C pointer to array declaration with bitwise and operator
_ctype_ is a pointer to a global array of 257 bytes. I don't know what _ctype_[0] is used for. _ctype_[1] through _ctype_[256] represent the character categories of characters 0, …, 255 respectively: _ctype_[c + 1] represents the category of the character c. This is the same thing as saying that _ctype_ + 1 points to an array of 256 characters where (_ctype_ + 1)[c] represents the category of the character c.
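A minimal sketch of that indexing relationship. It assumes a hypothetical 257-entry array demo_table in place of the real table (whose contents are not shown here) and a pointer demo_ctype_ playing the role of _ctype_. It only compares addresses, so the table contents don't matter:

```c
/* Hypothetical stand-in for the 257-byte array that _ctype_ points to:
   entry 0 is the extra slot, entries 1..256 describe characters 0..255. */
static char demo_table[257];

/* Plays the role of _ctype_: a pointer to the first element of the array. */
static const char *demo_ctype_ = demo_table;

/* Address of the entry for c, written as (_ctype_ + 1)[c]. */
static const char *entry_via_offset(int c)
{
    return &(demo_ctype_ + 1)[c];
}

/* Address of the same entry, written as _ctype_[c + 1]. */
static const char *entry_via_index(int c)
{
    return &demo_ctype_[c + 1];
}
```

Because a[i] is defined as *(a + i), both spellings denote *(_ctype_ + 1 + c). In particular, index -1 relative to _ctype_ + 1 lands on _ctype_[0].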
(_ctype_ + 1)[(unsigned char)_c] is not a declaration. It's an expression using the array subscript operator. It's accessing position (unsigned char)_c of the array that starts at (_ctype_ + 1).
The code casts _c to unsigned char. This cast is not strictly necessary: ctype functions take char values that have already been cast to unsigned char (char is signed on OpenBSD), so a correct call is char c; … iscntrl((unsigned char)c). The cast does, however, guarantee that there is no buffer overflow: if the application calls iscntrl with a value that is outside the range of unsigned char and isn't -1, this function returns a value which may not be meaningful but at least won't cause a crash or leak private data that happened to be stored just outside the array bounds. The value is even correct if the function is called as char c; … iscntrl(c), as long as c isn't -1.
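The effect of the (unsigned char) cast on the index can be sketched as follows. The helper lookup_index is hypothetical. It returns the position in the underlying 257-entry array that (_ctype_ + 1)[(unsigned char)c] would read, assuming 8-bit char as on OpenBSD:

```c
#include <limits.h>

/* The sketch assumes 8-bit bytes, so unsigned char holds 0..255. */
_Static_assert(CHAR_BIT == 8, "sketch assumes 8-bit char");

/* Hypothetical helper: index into the full 257-entry array that
   (_ctype_ + 1)[(unsigned char)c] reads. Converting to unsigned char
   reduces any int modulo 256, so the result is always in 1..256. */
static int lookup_index(int c)
{
    return (unsigned char)c + 1;
}
```

For example, on a platform where char is signed, a char holding the byte 0xE9 is negative, yet it maps to the same slot as 0xE9. An out-of-range int such as 1000 maps to slot 233 instead of running past the end of the array.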
The reason for the special case with -1 is that it's EOF. Many standard C functions that operate on a char, for example getchar, represent the character as an int value which is the char value wrapped to a positive range, and use the special value EOF == -1 to indicate that no character could be read. For functions like getchar, EOF indicates the end of the file, hence the name end-of-file. Eric Postpischil suggests that the code was originally just return _ctype_[_c + 1], and that's probably right: _ctype_[0] would be the value for EOF. This simpler implementation leads to a buffer overflow if the function is misused, whereas the current implementation avoids this as discussed above.
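The two versions can be contrasted in a sketch. Everything here is hypothetical: demo_table, demo_ctype_ and DEMO_C stand in for the real table, _ctype_ and _C, and demo_init fills in only the ASCII control characters (0 to 31 and 127) for illustration:

```c
#define DEMO_C 0x20  /* hypothetical control bit, same value as _C in these answers */

static char demo_table[257];                 /* hypothetical category table */
static const char *demo_ctype_ = demo_table; /* plays the role of _ctype_ */

/* Marks the ASCII control characters 0..31 and 127; entry 0 (EOF) stays 0. */
static void demo_init(void)
{
    for (int c = 0; c < 32; c++)
        demo_table[c + 1] = DEMO_C;
    demo_table[127 + 1] = DEMO_C;
}

/* Probable original form: only valid for -1 <= c <= 255.
   Any other argument indexes outside demo_table (undefined behavior). */
static int naive_iscntrl(int c)
{
    return demo_ctype_[c + 1] & DEMO_C;
}

/* Guarded form: EOF is answered without reading the table, and every other
   value is reduced to 0..255 by the cast, so the read stays in bounds. */
static int guarded_iscntrl(int c)
{
    return c == -1 ? 0 : ((demo_ctype_ + 1)[(unsigned char)c] & DEMO_C);
}
```

Both agree on EOF and on 0..255. Only guarded_iscntrl is safe for a misuse like guarded_iscntrl(256 + '\n'), which quietly reports the entry for '\n' rather than reading past the array.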
If v is the value found in the array, v & _C tests whether the bit 0x20 is set in v. The values in the array are masks of the categories that the character is in: _C is set for control characters, _U is set for uppercase letters, etc.
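A sketch of how such category masks combine, using hypothetical names. DEMO_U and DEMO_C use the bit values given in these answers (0x01 for uppercase, 0x20 for control). DEMO_S is an assumed white-space bit added only for illustration:

```c
/* Hypothetical category bits modeled on _U and _C; DEMO_S is assumed. */
enum {
    DEMO_U = 0x01,  /* uppercase letter */
    DEMO_S = 0x08,  /* white space (assumed value for this sketch) */
    DEMO_C = 0x20   /* control character */
};

/* Each entry is the OR of every category the character belongs to. */
static const unsigned char entry_tab = DEMO_C | DEMO_S;  /* '\t': control and space */
static const unsigned char entry_A   = DEMO_U;           /* 'A': uppercase (other bits omitted) */

/* v & DEMO_C keeps just the control bit, so it is non-zero iff that bit is set. */
static int has_control_bit(unsigned char v)
{
    return (v & DEMO_C) != 0;
}
```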
_ctype_ appears to be a restricted internal version of the symbol table, and I'm guessing the + 1 is there because they didn't bother storing index 0 of it, since that one isn't printable. Or possibly they are using a 1-indexed table instead of the 0-indexed tables customary in C.
The C standard dictates this for all ctype.h functions:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF.
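On the caller's side, this requirement is usually met by converting each char before the call. A minimal sketch using the standard <ctype.h>; count_control is a hypothetical helper name:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts control characters in a NUL-terminated string.
   Each char is converted to unsigned char before the call, so negative
   char values (possible where char is signed) never reach iscntrl. */
static size_t count_control(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (iscntrl((unsigned char)*s))
            n++;
    return n;
}
```

Passing *s directly would hand iscntrl a negative value for bytes above 0x7F wherever char is signed, which is undefined behavior unless the value happens to equal EOF.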
Going through the code step by step:
- int iscntrl(int _c): the int types are really characters, but all ctype.h functions are required to handle EOF, so they must be int.
- The check against -1 is a check against EOF, since it has the value -1.
- _ctype_ + 1 is pointer arithmetic to get the address of an array item.
- [(unsigned char)_c] is simply an array access of that array, where the cast is there to enforce the standard requirement of the parameter being representable as unsigned char. Note that char can actually hold a negative value, so this is defensive programming. The result of the array access is a single character from their internal symbol table.
- The & masking is there to get a certain group of characters from the symbol table. Apparently all characters with bit 5 set (mask 0x20) are control characters. There's no making sense of this without viewing the table.
- Anything with bit 5 set will return the value masked with 0x20, which is a non-zero value. This satisfies the requirement of the function returning non-zero in case of boolean true.
I'll start with step 3:
increment the address the undefined pointer points to by 1
The pointer is not undefined. It's just defined in some other compilation unit. That is what the extern keyword tells the compiler. So when all files are linked together, the linker will resolve the references to it.
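A single-file sketch of the declaration/definition split. The names are hypothetical: demo_ctype_ stands in for _ctype_, and the definition that would normally live in another compilation unit (for _ctype_, inside the C library) is placed at the end of the same file so the example compiles on its own:

```c
/* Declaration only: tells the compiler the object exists somewhere.
   In a real program this line would sit in a header such as ctype.h. */
extern const char *demo_ctype_;

/* Code can use the pointer before the compiler has seen its definition. */
static int demo_category(int c)
{
    return (demo_ctype_ + 1)[(unsigned char)c];
}

/* Definition: normally in a separate .c file, where the linker would connect
   the declaration above to it. Only '\n' is marked here (0x20 = control). */
static const char demo_storage[257] = { [1 + '\n'] = 0x20 };
const char *demo_ctype_ = demo_storage;
```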
So what does it point to?
It points to an array with information about each character. Each character has its own entry. An entry is a bitmap representation of characteristics for the character. For example: If bit 5 is set, it means that the character is a control character. Another example: If bit 0 is set, it means that the character is an uppercase letter.
So something like (_ctype_ + 1)['x'] will get the characteristics that apply to 'x'. Then a bitwise AND is performed to check whether bit 5 is set, i.e. whether it is a control character.
The reason for adding 1 is probably that the real index 0 is reserved for some special purpose.