Bitfield Pitfalls | OS/2 Museum

Some time ago I ran into a bug that had lain dormant for quite a while. The problem involved expressions where one of the operands is a bit-field.

To demonstrate the problem, I will present a reduced example:

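#include <stdio.h>
#include <inttypes.h>

typedef struct {
    uint32_t    uf1 : 12;
    uint32_t    uf2 : 12;
    uint32_t    uf3 : 8;
} BF;

int main( void )
{
    BF          bf;
    uint64_t    u1, u2;

    bf.uf1 = 0x7ff;
    bf.uf2 = ~bf.uf1;
    u1 = bf.uf1 << 20;
    u2 = bf.uf2 << 20;
    printf( "u1: %016" PRIX64 "\n", u1 );
    printf( "u2: %016" PRIX64 "\n", u2 );
    return( 0 );
}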

The troublesome behavior is demonstrated by the lines performing the left shift. We take a 12-bit wide bit-field, shift it left by 20 bits so that the high bit of the bit-field lines up with the high bit of uint32_t, and then convert the result to uint64_t.

The contents of u1 will be predictable. The contents of u2 perhaps not so much. Or more specifically, the resulting value of u2 depends entirely on who you ask.

First off, the problematic behavior shows up on platforms where int is 32 bits wide… which is only almost everything these days. It notably includes both 32-bit and 64-bit x86 platforms.

According to Microsoft, the result is as follows:

u1: 000000007FF00000
u2: 0000000080000000

According to GNU C and clang, the result is something else:

u1: 000000007FF00000
u2: FFFFFFFF80000000

Needless to say, that’s a fairly major difference.

Why oh Why?

It’s apparent that Microsoft considers the result of the shift operation to be an unsigned integer, which is zero-extended to 64 bits. On the other hand, gcc and clang consider the result of the shift operation to be a signed integer, which is then sign-extended to 64 bits.

To understand what is happening, one needs to answer a seemingly trivial question: What is the type of a bit-field?

Note “seemingly”. The problem is that the C language standard (in particular talking about C99 here) provides at best unclear and at worst contradictory answers. First let’s see how bit-fields are defined:

A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.

C99 section 6.7.2.1 paragraph 4

That seems pretty unambiguous, doesn’t it? In our case, uint32_t maps to unsigned int, so according to 6.7.2.1 the bit field shall have the type unsigned int. Further down there’s the following:

A bit-field is interpreted as a signed or unsigned integer type consisting of the specified number of bits. […]

C99 section 6.7.2.1 paragraph 9

Note that the text says how a bit-field is interpreted, not what its type is, which may or may not be significant.

Now we have to look at the part of the C standard which defines how types are promoted in expressions.

The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type whose integer conversion rank is less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

C99 section 6.3.1.1 paragraph 2

The term “original type” does a lot of work there.

There is no ambiguity when, say, converting uint16_t to int (on a platform where int has 32 bits or more). The int type can clearly represent all values of the original type (typically uint16_t is equivalent to unsigned short), therefore the value is converted to (signed) int. But with bit-fields, it comes down to what precisely the “original type” is.

Note that the wording of the Standard was changed in C11, likely to address exactly this kind of problem:

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int.

C11 section 6.3.1.1 paragraph 2 (excerpt)

Bit-fields are explicitly called out and according to the new wording, it is much clearer that a bit-field such as unsigned int uf1 : 31; should be promoted to (signed) int on a platform where int is 32 bits wide.

Two Schools of Thought

In older C standards, things were genuinely unclear. Some compiler writers looked at a declaration such as unsigned int uf2 : 12; and said, okay, the “original” type of the uf2 bit-field is clearly unsigned int. Therefore, when evaluating an expression, the type of the bit-field remains unsigned.

That is the thinking notably followed by Microsoft’s compilers.

Other compiler writers looked at the same declaration and said, alright, we have an integer with twelve (unsigned) bits, which means that a signed integer can represent all values of the original type. Therefore, when evaluating an expression, the type of uf2 gets promoted to (signed) int.

That line of thought is followed by gcc and clang, among others.

Known Problem

Needless to say, others have run into this issue in the past. An early mention of a closely related problem (bit-field initialization) is defect report #120 filed against C89 in 1993. The committee response states: Subclause 6.5.2.1 states “A bit-field is interpreted as an integral type consisting of the specified number of bits.” Thus the type of object1.bit and object2.bit can be informally described as unsigned int : 1.

It is apparent that at least to some committee members, the clause “a bit-field is interpreted as an integral type consisting of the specified number of bits” meant that the type of a bit-field is an integral type consisting of the specified number of bits.

The issue was further discussed by Joseph Myers in 2007 in WG14 document N1260, well after the C99 standard was published. Clearly even C11 did not fully resolve the problems, and further discussion with even more bit-field pitfalls followed in 2022 in WG14 document N2958. The document notes that existing implementations do not always agree, precisely because using bit-fields can trigger several poorly specified edge cases.

Compiler Survey

To get a better sense of the situation, I tested a number of current and historic common PC compilers, trying to check how they deal with the problematic code I ran into.

Note that for older 32-bit compilers that offer no long long type or similar (such as Microsoft’s __int64 type), there is no equivalent way to trigger the problem because int is the largest integer type already. But there is a good alternative: Converting a bit-field to double. That triggers the same process of first promoting the bit-field to either int or unsigned int and then to double.

The equivalent problem does exist for 16-bit compilers. The long type is wider than int, therefore an analogous situation can be triggered, e.g. like this:

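/* For 16-bit C compilers */
#include <stdio.h>

typedef struct {
    unsigned    uf1 : 6;
    unsigned    uf2 : 6;
    unsigned    uf3 : 4;
} BF;

int main( void )
{
    BF                  bf;
    unsigned long       u1, u2;

    bf.uf1 = 0x1f;
    bf.uf2 = ~bf.uf1;
    u1 = bf.uf1 << 10;
    u2 = bf.uf2 << 10;
    printf( "u1: %08lX\n", u1 );
    printf( "u2: %08lX\n", u2 );
    return( 0 );
}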

All 16-bit compilers I tried produced the same result:

u1: 00007C00
u2: 00008000

In other words, the bit-field type always remained unsigned during promotions.

The 16-bit compilers tested included: Microsoft C 5.0, 6.0, and Visual C++ 1.52; various versions of Watcom C; Borland C++ 3.1 and 4.0; Digital Mars 8.38.

The situation was more interesting for 32-bit compilers. For compilers that do not support long long, I used a variant with the double type:

/* For 32-bit C compilers */
#include <stdio.h>

typedef struct {
    unsigned    uf1 : 12;
    unsigned    uf2 : 12;
    unsigned    uf3 : 8;
} BF;

int main( void )
{
    BF                  bf;
    double              d1, d2;
    unsigned            u1, u2;

    bf.uf1 = 0x7ff;
    bf.uf2 = ~bf.uf1;
    d1 = bf.uf1 << 20;
    d2 = bf.uf2 << 20;
    u1 = bf.uf1 << 20;
    u2 = bf.uf2 << 20;
    printf( "d1 (u1): %lf (%X)\n", d1, u1 );
    printf( "d2 (u1): %lf (%X)\n", d2, u2 );
    return( 0 );
}

Most DOS/Windows compilers keep the behavior of the 16-bit compilers and produce the following:

d1 (u1): 2146435072.000000 (7FF00000)
d2 (u1): 2147483648.000000 (80000000)

That includes: Borland C++ 5.5.1; Microsoft C/C++ from version 9.0 (Visual C++ 2.0, 1994) to at least version 19.41 (2025); numerous versions of Watcom C; Digital Mars 8.38.

A notable exception was IBM’s compiler. VisualAge C++ 3.5 on Windows, as well as VisualAge C++ 3.0 on OS/2, produce the following:

d1 (u1): 2146435072.000000 (7FF00000)
d2 (u1): -2147483648.000000 (80000000)

Since the VisualAge C++ 3.5 compiler also supports 64-bit integers, and even has inttypes.h, I was able to test the first example as well:

u1: 000000007FF00000
u2: FFFFFFFF80000000

IBM clearly takes the bit-field width into account, and an unsigned bit-field narrower than int gets promoted to a signed type.

MinGW gcc 3.4.5 produces the same result as VisualAge C++, that is, unsigned bit-fields get promoted to int if their width allows it.

Current versions of gcc and clang behave the same as the old gcc 3.4.5. This may cause trouble when writing portable code, because Microsoft and non-Microsoft compilers deliver different results.

By my reading of the current C standards, bit-field types should be considered to have the specified width in addition to the underlying type (int, unsigned int, etc.). That affects integer promotions.

Microsoft’s compilers do match Microsoft’s own documentation which explicitly states the following: Bit fields have the same semantics as the integer type. A bit field is used in expressions in exactly the same way as a variable of the same base type would be used. It doesn’t matter how many bits are in the bit field.

Microsoft may be unwilling to change the existing behavior because it would lead to the worst kind of “quiet change”: code that previously was and still is conforming suddenly produces different results.

Summary

Although bit-fields have been part of the C language since the 1970s, their precise semantics were not fully defined in the ANSI C89 standard, and the C99 revision didn’t bring much improvement either.

Although C11 and later revisions did bring improvements, some ambiguities still remained as late as 2022. To be fair, some of those were brought about by expanding the capabilities of bit-fields, such as allowing wider integer types to be used as the basis of bit-fields.

The practical consequence is that when using bit-fields in some unusual contexts (and a few not so unusual), different compilers produce different results. Since no warnings are produced, this may lead to unpleasant surprises for programmers or end users.

🕒 **Posted on**: 1774123015

🌟 **Want more?** Click here for more info! 🌟