How to specify enum size in GCC?

As Matteo Italia's answer says, gcc lets you define a 64-bit enumeration type by specifying a 64-bit value for one of the members. For example:

enum some_enum {
    /* ... */
    max = 0x7fffffffffffffff
};

As for your use of 'mov', 'cmp', and so forth, there is no necessary correlation between the representation of a string literal like "mov" and the representation of a multi-character character constant like 'mov'.

The latter is legal (and supported by gcc), but its value is implementation-defined. The standard says that its type is always int, and gcc doesn't seem to have an extension that lets you override that. So if int is 4 bytes, then 'sysenter', if it's accepted at all, won't necessarily have the value you're looking for. gcc warns that such a constant is too long for its type and keeps only the low-order bytes, i.e. the last four characters. The value of the constant also seems to be consistent across big-endian and little-endian systems, which means that it won't consistently match the in-memory representation of a similar string literal.

For example, this program:

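#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s1 = "abcd";
    const char *s2 = "abcdefgh";
    unsigned u1, u2;

    /* Read the first sizeof(unsigned) bytes of each string as an unsigned.
       memcpy avoids the alignment and aliasing problems of *(unsigned*)s1. */
    memcpy(&u1, s1, sizeof u1);
    memcpy(&u2, s2, sizeof u2);

    printf("'abcd'     = 0x%x\n", (unsigned)'abcd');
    printf("'abcdefgh' = 0x%x\n", (unsigned)'abcdefgh');
    printf("*(unsigned*)s1 = 0x%x\n", u1);
    printf("*(unsigned*)s2 = 0x%x\n", u2);
    return 0;
}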

produces this output when compiled with gcc on a little-endian system (x86):

'abcd'     = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x64636261
*(unsigned*)s2 = 0x64636261

and this output on a big-endian system (SPARC):

'abcd'     = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x61626364
*(unsigned*)s2 = 0x61626364

So I'm afraid your idea of matching character constants like 'mov' against strings like "mov" isn't going to work. (Conceivably you could normalize the string representations to big-endian, but I wouldn't take that approach myself.)

The problem you're trying to solve is quickly mapping strings like "mov" to specific integer values that represent CPU instructions. You're right that a long sequence of strcmp() calls is going to be inefficient (though have you actually measured it and found that the speed is unacceptable?), but there are better ways. A hash table of some sort is probably the best. There are tools, such as GNU gperf, that generate perfect hash functions, so that a relatively cheap computation on the string gives you a unique integer value for each known name.

You won't be able to write the definitions of your enumeration values quite as conveniently, but once you have the right hash function you can write a program to generate the C source code for the enum type.

That's assuming that an enum is the best approach here; it might not be. If I were doing this, the central data structure would be a collection of structs, where each one contains the string name of the operator and whatever other information is associated with it. The hash function would map strings like "mov" to indices in this collection. (I'm being deliberately vague about what kind of "collection" to use; with the right hash function, it might be a simple array.) With this kind of solution, I don't think the 64-bit enum type is needed.


You could use a union type:

#include <stdint.h>

union some {
    enum { garbage1, garbage2 } a;
    int64_t dummy;   /* forces the union to occupy at least 64 bits */
};

Although the C99 standard specifies that an enum cannot be based on anything but an int (§6.7.2.2 ¶2) [1], gcc seems to follow the C++ idea that, if a value in an enum is bigger than an int, the enum can be based on a bigger integer type. This code works without problems for me on both x86 and x64:

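#include <stdio.h>

enum myEnum
{
    a=1234567891234567890LL
};

int main(void)
{
    enum myEnum e;
    /* sizeof yields size_t; cast so %u is correct, even under -ansi */
    printf("%u %u\n", (unsigned)sizeof(void *), (unsigned)sizeof(e));
    return 0;
}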

on x86 I get

4 8

and on x64 (on my machine) I get

8 8

However, when I ask for strict conformance to the standard, I get, as expected:

matteo@teodeb:~/cpp$ gcc -ansi -pedantic testenum.c
testenum.c:5:7: warning: use of C99 long long integer constant
testenum.c:5: warning: ISO C restricts enumerator values to range of ‘int’

  1. Actually, it's a bit more complicated; ¶4 specifies that the implementation is free to choose as "base type" any particular type that is "compatible with char, a signed integer type or an unsigned integer type", as long as it can represent all the elements of the enum.

    On the other hand, ¶2 specifies that each member of the enum must be representable as an int. So even if the implementation is free to base your enum on a gazillion-bit integer, the constants defined for it cannot be anything that can't be represented by an int. In practice, this means that the compiler won't base the enum on anything bigger than an int, but it may base it on something smaller if your values don't require the full range of int.

Thanks to @jons34yp for pointing out my initial mistake.
