To which degree does the C preprocessor regard integer literal suffixes?

C 2018 6.10.1 deals with conditional inclusion (#if and related directives and the defined operator). Paragraph 1 says:

The expression that controls conditional inclusion shall be an integer constant expression except that: identifiers (including those lexically identical to keywords) are interpreted as described below; and it may contain unary operator expressions of the form

defined identifier

or

defined ( identifier )

Integer constant expression is defined in 6.6 6:

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof or _Alignof operator.

That paragraph is for C generally, not just the preprocessor. So the expressions that can appear in #if directives are the same as the integer constant expressions that can appear generally in C. However, as stated in the quote above, sizeof and _Alignof are just identifiers to the preprocessor; they are not recognized as C operators. In particular, 6.10.1 4 tells us:

… After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0,…

So, wherever sizeof or _Alignof appears in a #if expression, it is replaced with 0. Thus, a #if expression can only have operands that are constants and defined expressions.
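As a small illustration of the identifier replacement rule, the sketch below records what a few #if expressions evaluate to. The names NOT_A_MACRO, UNDEFINED_IS_ZERO, KEYWORD_IS_ZERO, DEFINED_RESULT and the helper functions are made up for this example; none of them come from the standard.

```c
/* NOT_A_MACRO is a hypothetical name that is deliberately left undefined. */
#if NOT_A_MACRO == 0
#define UNDEFINED_IS_ZERO 1   /* taken: the identifier was replaced with 0 */
#else
#define UNDEFINED_IS_ZERO 0
#endif

/* To the preprocessor, the keyword int is just another identifier. */
#if int == 0
#define KEYWORD_IS_ZERO 1
#else
#define KEYWORD_IS_ZERO 0
#endif

/* defined is resolved before the replacement with 0, so it still works. */
#if defined(NOT_A_MACRO)
#define DEFINED_RESULT 1
#else
#define DEFINED_RESULT 0
#endif

/* Note: "#if sizeof(int) == 4" is not shown because it would turn into
   "0(0) == 4", which is a syntax error rather than a false condition. */

/* Helpers so the compile-time results can be inspected at run time. */
int undefined_is_zero(void) { return UNDEFINED_IS_ZERO; }
int keyword_is_zero(void)   { return KEYWORD_IS_ZERO; }
int defined_result(void)    { return DEFINED_RESULT; }
```

Calling the three helpers from main and printing their results should give 1, 1 and 0. Some compilers can warn about undefined identifiers in #if (GCC has -Wundef for this), but the evaluation itself is the same.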

Paragraph 4 goes on to say:

… The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>.…

6.6 is the section for constant expressions.

So, the compiler will accept integer suffixes in #if expressions, and that does not depend on the C implementation (for the suffixes required in the core C language; implementations could allow extensions). However, all the arithmetic will be performed using intmax_t or uintmax_t, and those do depend on the implementation. If your expressions do not depend on the width of integers above the minimum required¹, they should be evaluated the same in any C implementation.
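Here is a minimal sketch of those two points: suffixed constants are accepted, and the arithmetic is carried out in at least 64 bits. The macro and function names are invented for the illustration.

```c
/* In ordinary code, 2147483647 + 1 overflows a 32-bit int. In #if, both
   operands act as intmax_t (at least 64 bits), so the sum is representable. */
#if 2147483647 + 1 == 2147483648
#define WIDE_SUM_OK 1
#else
#define WIDE_SUM_OK 0
#endif

/* The standard suffixes are accepted; l and ll do not change the result
   because every signed operand is already treated as intmax_t. */
#if 1L + 1LL == 2
#define SUFFIXES_ACCEPTED 1
#else
#define SUFFIXES_ACCEPTED 0
#endif

/* Helpers so the compile-time results can be inspected at run time. */
int wide_sum_ok(void)       { return WIDE_SUM_OK; }
int suffixes_accepted(void) { return SUFFIXES_ACCEPTED; }
```

Both helpers should return 1 on any conforming implementation, since neither expression needs more than the guaranteed 64 bits.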

Additionally, paragraph 4 goes on to say there may be some variations with character constants and values, which I omit here as it is not relevant to this question.

Footnote

¹ intmax_t designates a signed type capable of representing any value of any signed integer type (7.20.1.5 1), and long long int is a signed type that must be at least 64 bits (5.2.4.2.1 1), so any conforming C implementation must provide 64-bit integer arithmetic in the preprocessor.


As I noted in a comment, this is defined in the C standard. Here's the complete text of §6.10.1 ¶4 (and the two footnotes):

C11 §6.10.1 Conditional inclusion

¶4 Prior to evaluation, macro invocations in the list of preprocessing tokens that will become the controlling constant expression are replaced (except for those macro names modified by the defined unary operator), just as in normal text. If the token defined is generated as a result of this replacement process or use of the defined unary operator does not match one of the two specified forms prior to macro replacement, the behavior is undefined. After all replacements due to macro expansion and the defined unary operator have been performed, all remaining identifiers (including those lexically identical to keywords) are replaced with the pp-number 0, and then each preprocessing token is converted into a token. The resulting tokens compose the controlling constant expression which is evaluated according to the rules of 6.6. For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t defined in the header <stdint.h>.¹⁶⁷ This includes interpreting character constants, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.¹⁶⁸ Also, whether a single-character character constant may have a negative value is implementation-defined.

¹⁶⁷ Thus, on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it would be unsigned in translation phase 7.

¹⁶⁸ Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

#if 'z' - 'a' == 25
if ('z' - 'a' == 25)

Section 6.6 is §6.6 Constant expressions, which details how constant expressions differ from the full expressions of §6.5 Expressions.

In effect, the preprocessor largely ignores the length suffixes (l and ll); a u suffix still makes a constant unsigned. A hexadecimal constant without a u suffix is unsigned only when its value does not fit in intmax_t. The results you show are to be expected on a machine where intmax_t and uintmax_t are 64-bit quantities. If the limits of intmax_t and uintmax_t were larger, some of the expressions might change.


  1. To which degree does the preprocessor regard integer literal suffixes? Or does it just ignore them?

The type suffixes of integer constants are not inherently meaningful to the preprocessor, but they are an inherent part of the corresponding preprocessing tokens, not separate. The standard has this to say about them:

A preprocessing number begins with a digit optionally preceded by a period (.) and may be followed by valid identifier characters and the character sequences e+, e-, E+, E-, p+, p-, P+, or P-.

Preprocessing number tokens lexically include all floating and integer constant tokens.

(C11 6.4.8/2-3; emphasis added)

For the most part, the preprocessor doesn't treat preprocessing tokens of this type any differently than any other. The exception is in the controlling expressions of #if directives, which are evaluated by performing macro expansion, replacing identifiers with 0, and then converting each preprocessing token into a token before evaluating the result according to C rules. Converting to tokens accounts for the type suffixes, yielding bona fide integer constants.

This does not necessarily produce results identical to those you would get from runtime evaluation of the same expressions, however, because

For the purposes of this token conversion and evaluation, all signed integer types and all unsigned integer types act as if they have the same representation as, respectively, the types intmax_t and uintmax_t.

(C2011, 6.10.1/4)
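To make the role of the suffixes concrete, here is a minimal sketch comparing the same subtraction with no suffix, with length suffixes, and with u. The macro and function names are hypothetical and exist only for this illustration.

```c
/* No suffix: both operands are signed, so 0 - 1 is -1. */
#if 0 - 1 < 0
#define PLAIN_IS_NEGATIVE 1
#else
#define PLAIN_IS_NEGATIVE 0
#endif

/* l and ll: still signed and still evaluated as intmax_t, so nothing changes. */
#if 0L - 1 < 0 && 0LL - 1 < 0
#define LONG_IS_NEGATIVE 1
#else
#define LONG_IS_NEGATIVE 0
#endif

/* u: the constant is unsigned (acting as uintmax_t), so the subtraction
   wraps around to UINTMAX_MAX, which is not less than 0. */
#if 0u - 1 < 0
#define UNSIGNED_IS_NEGATIVE 1
#else
#define UNSIGNED_IS_NEGATIVE 0
#endif

/* Helpers so the compile-time results can be inspected at run time. */
int plain_is_negative(void)    { return PLAIN_IS_NEGATIVE; }
int long_is_negative(void)     { return LONG_IS_NEGATIVE; }
int unsigned_is_negative(void) { return UNSIGNED_IS_NEGATIVE; }
```

The helpers should return 1, 1 and 0 respectively: only the u suffix changes the outcome.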

You go on to ask

  2. Are there any dependencies or different behaviors with different environments, e.g. different compilers, C vs. C++, 32 bit vs. 64 bit machine, etc.? I.e., what does the preprocessor's behavior depend on?

The only direct dependency is the implementation's definitions of intmax_t and uintmax_t. These are not directly tied to language choice or machine architecture, though there may be correlations with those.
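One way to observe that dependency is to let the preprocessor itself report on the width it is using. This is only a sketch; PP_ARITH_IS_64_BIT, MINUS_ONE_EQUALS_ALL_ONES and the helper functions are names made up here.

```c
#include <stdint.h>

/* UINTMAX_MAX is required to be usable in #if directives. The test is true
   exactly when uintmax_t (and so preprocessor arithmetic) is 64 bits wide. */
#if UINTMAX_MAX == 0xFFFFFFFFFFFFFFFF
#define PP_ARITH_IS_64_BIT 1
#else
#define PP_ARITH_IS_64_BIT 0
#endif

/* With 64-bit arithmetic this constant does not fit in intmax_t, so it is
   unsigned and -1 converts to the same all-ones value. With a wider intmax_t
   the constant would be signed and positive, and the test would be false.
   A compiler may warn that the left operand changes sign. */
#if -1 == 0xFFFFFFFFFFFFFFFF
#define MINUS_ONE_EQUALS_ALL_ONES 1
#else
#define MINUS_ONE_EQUALS_ALL_ONES 0
#endif

/* Helpers so the compile-time results can be inspected at run time. */
int pp_arith_is_64_bit(void)        { return PP_ARITH_IS_64_BIT; }
int minus_one_equals_all_ones(void) { return MINUS_ONE_EQUALS_ALL_ONES; }
```

The two helpers should always agree with each other: both return 1 where intmax_t is 64 bits, and both would return 0 on an implementation with a wider intmax_t.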

  3. Where is all that specified/documented?

In the respective languages' language specifications, of course. I've cited two of the more relevant sections of the C11 specification, and linked you to a late draft of that standard. (The current C is C18, but it hasn't changed in any of these regards.)


TLDR dumbed down version:

l and ll are effectively (not literally!) ignored in preprocessor conditionals (basically, everything is treated as if it had an ll suffix); u, however, is honored, just as for any other C integer constant!

After reading all the marvelous answers, I created some more examples that reveal some expected yet interesting behavior:

#include <stdio.h>

int main()
{
#if (1 - 2u > 0) // If one operand is unsigned, the result is unsigned.
                 // Usual implicit type conversion.
  printf("1 - 2u > 0\n");
#endif

#if (0 < 0xFFFFFFFFFFFFFFFF)
  printf("0 < 0xFFFFFFFFFFFFFFFF\n");
#endif

#if (-1 < 0)
  printf("-1 < 0\n");
#endif

#if (-1 < 0xFFFFFFFFFFFFFFFF)
  printf("-1 < 0xFFFFFFFFFFFFFFFF\n"); // nope
#elif (-1 > 0xFFFFFFFFFFFFFFFF)
  printf("-1 > 0xFFFFFFFFFFFFFFFF\n"); // nope, obviously
#endif

#if (-1 == 0xFFFFFFFFFFFFFFFF)
  printf("-1 == 0xFFFFFFFFFFFFFFFF (!!!)\n");
#endif
}

With this output:

1 - 2u > 0
0 < 0xFFFFFFFFFFFFFFFF
-1 < 0
-1 == 0xFFFFFFFFFFFFFFFF (!!!)