What is the rule for C to cast between short and int?

Any time an integer type is converted to a different integer type, it falls through a deterministic pachinko machine of rules dictated by the standard and, in one case, by the implementation.

The general rule for the integer promotions:

C99 6.3.1.1-p2

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

That said, let's look at your conversions. The signed short to unsigned int conversion is covered by the following, since the value being converted (-1) falls outside the unsigned int range:

C99 6.3.1.3-p2

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Which basically means "add UINT_MAX + 1". On your machine, UINT_MAX is 4294967295, so this becomes

-1 + 4294967295 + 1 = 4294967295
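
A quick way to see this in action (a minimal sketch, assuming a 32-bit unsigned int as on your machine):

#include <stdio.h>

int main(void)
{
    signed short b = -1;
    unsigned int u = b;  /* -1 + (UINT_MAX + 1) == UINT_MAX */
    printf("%u\n", u);   /* prints 4294967295 with a 32-bit unsigned int */
    return 0;
}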

Regarding your unsigned short to signed int conversion, that is covered by the regular value-preserving conversion. Specifically:

C99 6.3.1.3-p1

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

In other words, because the value of your unsigned short falls within the representable range of signed int, nothing special is done and the value is simply preserved.
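
For example (a minimal sketch, assuming a 16-bit unsigned short and a 32-bit signed int as in the question):

#include <stdio.h>

int main(void)
{
    unsigned short a = 0xFFFF;  /* 65535 fits comfortably in a 32-bit int */
    signed int s = a;           /* value preserved, no wrap-around */
    printf("%d\n", s);          /* prints 65535 */
    return 0;
}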

And finally, as mentioned in the general comment above, something special happens in your declaration of b:

signed short b = 0xFFFF;

The 0xFFFF in this case is an int (a signed type); its decimal value is 65535. However, that value is not representable by a signed short, so yet another conversion happens, one that perhaps you weren't aware of:

C99 6.3.1.3-p3

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

In other words, your implementation chose to store it as (-1), but you cannot rely on that on a different implementation.
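
You can see what your own implementation does with a sketch like this (the printed value is implementation-defined; -1 is merely the typical two's-complement result):

#include <stdio.h>

int main(void)
{
    signed short b = 0xFFFF;  /* 65535 does not fit in a 16-bit signed short */
    printf("%d\n", b);        /* implementation-defined; typically prints -1 */
    return 0;
}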


What's happening here is that the right-hand side of the assignment is first widened from 16 to 32 bits, and the conversion to the left-hand-side type happens only at the assignment itself. This means that if the right-hand side is signed, it is sign-extended when it's converted to 32 bits; if it's unsigned, it is simply zero-padded.
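
To illustrate (a sketch assuming 16-bit short and 32-bit int, as in the question):

#include <stdio.h>

int main(void)
{
    signed short ss = -1;      /* bit pattern 0xFFFF on a 16-bit short */
    unsigned short us = 0xFFFF;

    unsigned int from_signed = ss;    /* sign-extended:  0xFFFFFFFF */
    unsigned int from_unsigned = us;  /* zero-extended:  0x0000FFFF */

    printf("%08X %08X\n", from_signed, from_unsigned);  /* FFFFFFFF 0000FFFF */
    return 0;
}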

If you're careful with your casts then there shouldn't be any problem—but unless you're doing something super performance-intensive then the extra couple of bitwise operations shouldn't hurt anything.

On another note, if you're doing anything that assumes specific bit widths for the different integer types, you should really be explicit and use the types defined in stdint.h. I recently got bitten by this while porting (someone else's) code from *nix to Windows, as the Visual C++ compiler uses a different convention for integer sizes (LLP64) than any other x64 or POWER7 compiler I've used (LP64). In short, if you want 32 bits, you're better off saying so explicitly with a type like uint32_t.
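
For instance, something like this stays the same size under both LP64 and LLP64 (just a sketch to show the idea):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t mask = 0xFFFFFFFF;  /* exactly 32 bits wherever it exists */
    int64_t  big  = -1;          /* exactly 64 bits, even where long is 32 */

    printf("uint32_t: %zu bytes, int64_t: %zu bytes\n",
           sizeof mask, sizeof big);
    return 0;
}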


So this will always hold when such conversion happens in C? defined by C standard? – Jun

Yes, it should always hold. Relevant quotes from the C99 standard: "The integer promotions preserve value including sign." And, on the usual arithmetic conversions: "... the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands ..."
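
A trivial sketch of that guarantee in action:

#include <stdio.h>

int main(void)
{
    short s = -42;
    int r = s + 0;     /* s is promoted to int before the addition */
    printf("%d\n", r); /* prints -42: value and sign preserved */
    return 0;
}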


As stated in the question, assume 16-bit short and 32-bit int.

unsigned short a = 0xFFFF;

This initializes a to 0xFFFF, or 65535. The expression 0xFFFF is of type int; it's implicitly converted to unsigned short, and the value is preserved.

signed short b = 0xFFFF;

This is a little more complicated. Again, 0xFFFF is of type int. It's implicitly converted to signed short -- but since the value is outside the range of signed short the conversion cannot preserve the value.

Conversion of an integer to a signed integer type, when the value can't be represented, yields an implementation-defined value. In principle, the value of b could be anything between -32768 and +32767 inclusive. In practice, it will almost certainly be -1. I'll assume for the rest of this that the value is -1.

unsigned int u16tou32 = a;

The value of a is 0xFFFF, which is converted from unsigned short to unsigned int. The conversion preserves the value.

unsigned int s16tou32 = b;

The value of b is -1. It's converted to unsigned int, which clearly cannot store a value of -1. Conversion of an integer to an unsigned integer type (unlike conversion to a signed type) is fully defined by the language; the result is reduced modulo MAX + 1, where MAX is the maximum value of the unsigned type. In this case, the value stored in s16tou32 is -1 + (UINT_MAX + 1), which is UINT_MAX, or 0xFFFFFFFF.

signed int u16tos32 = a;

The value of a, 0xFFFF, is converted to signed int. The value is preserved.

signed int s16tos32 = b;

The value of b, -1, is converted to signed int. The value is preserved.

So the stored values are:

a == 0xFFFF (65535)
b == -1     (not guaranteed, but very likely)
u16tou32 == 0xFFFF (65535)
s16tou32 == 0xFFFFFFFF (4294967295)
u16tos32 == 0xFFFF (65535)
s16tos32 == -1
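
If you want to verify this on your own implementation, here's a sketch of a test program (assuming, as above, 16-bit short and 32-bit int; the value of b is implementation-defined):

#include <stdio.h>

int main(void)
{
    unsigned short a = 0xFFFF;
    signed short b = 0xFFFF;   /* implementation-defined; almost always -1 */

    unsigned int u16tou32 = a;
    unsigned int s16tou32 = b;
    signed int   u16tos32 = a;
    signed int   s16tos32 = b;

    printf("a        == %u\n", (unsigned int)a);  /* 65535 */
    printf("b        == %d\n", (int)b);           /* -1, most likely */
    printf("u16tou32 == %u\n", u16tou32);         /* 65535 */
    printf("s16tou32 == %u\n", s16tou32);         /* 4294967295 */
    printf("u16tos32 == %d\n", u16tos32);         /* 65535 */
    printf("s16tos32 == %d\n", s16tos32);         /* -1 */
    return 0;
}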

To summarize the integer conversion rules:

If the target type can represent the value, the value is preserved.

Otherwise, if the target type is unsigned, the value is reduced modulo MAX+1, which is equivalent to discarding all but the low-order N bits. Another way to describe this is that the value MAX+1 is repeatedly added to or subtracted from the value until you get a result that's in the range (this is actually how the C standard describes it). Compilers don't actually generate code to do this repeated addition or subtraction; they just have to get the right result.

Otherwise, the target type is signed and cannot represent the value; the conversion yields an implementation-defined value. In almost all implementations, the result discards all but the low-order N bits using a two's-complement representation. (C99 added a rule for this case, permitting an implementation-defined signal to be raised instead. I don't know of any compiler that does this.)
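
The unsigned rule applies to every unsigned type, not just unsigned int. A small sketch with unsigned char (assuming the usual 8-bit char, so MAX + 1 is 256):

#include <stdio.h>

int main(void)
{
    unsigned char c = 300;  /* out of range: reduced modulo 256 */
    printf("%d\n", c);      /* prints 44, i.e. 300 - 256 */
    return 0;
}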

Tags:

C