I have been busy lately, but from now on I will try to post at least twice a month. Today I want to write about an interesting issue I ran into at work, related to endianness and memory alignment. I knew the concept of endianness, but I had never faced a bug caused by it. However, 2 months ago (yeah, I should have posted this earlier, but you all know that I am a procrastinator), we detected a failure in one of our test cases for the ODBC driver that connects to the iSeries. The test case was failing on the 32-bit PowerPC architecture only; the 32-bit x86 test was successful.
So I logged on to the test server LPAR via ssh and started debugging the problem. I bless gdb and vim, because I did not have to install anything else to debug and develop on site. Finally, I found the problem in code similar to this:
class ConvertHandleToObject {
    /* lots of code here */
    union {
        void* void_ptr;
        TYPE1* type1_ptr;
        TYPE2* type2_ptr;
        TYPE3* type3_ptr;
        struct {
            unsigned free:1;
            unsigned next:31;
        };
    };
};
The “free” member was used to denote that the object of class ConvertHandleToObject was available to be used/returned as a handle. But why share memory space with the actual pointer to the handle object? And, more importantly, why does it work on x86 but not on PowerPC?
The reason behind it is memory alignment, combined with endianness.
To understand memory alignment we must first introduce “aligned memory access” and “unaligned memory access”. An aligned memory access is when the processor fetches a data object of size N stored at a memory address that is a multiple of N. That is, if we want to access a 32-bit integer (4 bytes) in an aligned fashion, the integer must be stored at a memory address that is a multiple of 4 (0x04, 0x08, etc.). That is also why accesses to “char” values are always aligned (sizeof(char) == 1). An unaligned access is the opposite: fetching a data object of size N stored at a memory address that is NOT a multiple of N, like fetching a 32-bit integer at memory address 0x03. Alignment is important because if the data in your program is aligned, access to it will be faster. Fortunately for us, compilers take care of aligning data. Let’s see an example:
struct aligned {
    long long_data;
    char byte_data;
    int integer_data;
};
This is a classic example of memory alignment. Someone might say that a structure like that has a size of sizeof(long) + sizeof(char) + sizeof(int), that is, 9 bytes on a common 32-bit architecture. However, if you print sizeof(struct aligned) you will most likely get 12 bytes. So where do those 3 extra bytes come from? The compiler added 3 bytes of padding to align the start address of the “integer_data” member. Let’s say a structure like this is stored at address 0x00: long_data starts at 0x00 and byte_data at 0x04. If the compiler ignored alignment requirements, integer_data would start at 0x05, and we would have an unaligned memory access every time we read integer_data. Instead, the compiler inserts 3 padding bytes so that integer_data starts at 0x08, and the struct is equivalent to:
struct aligned {
    long long_data;
    char byte_data;
    char padding[3];
    int integer_data;
};
If you compare the size of this struct with the size of the previous one, you will find that both are 12 bytes long.
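The padding described above can be observed directly with sizeof and offsetof. The following is only a small sketch: the helper names padding_before_integer_data and print_aligned_layout are made up for this post, and the exact numbers depend on the target ABI (on a 64-bit Linux build, for example, long is usually 8 bytes, so the total size differs from the 32-bit case, although the gap before integer_data is typically still 3 bytes).

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstdio>   // std::printf

struct aligned {
    long long_data;
    char byte_data;
    int  integer_data;
};

// Number of padding bytes the compiler placed between byte_data and
// integer_data (hypothetical helper, not part of the original code).
constexpr std::size_t padding_before_integer_data() {
    return offsetof(aligned, integer_data)
         - (offsetof(aligned, byte_data) + sizeof(char));
}

// Prints the layout the compiler chose; values depend on the target ABI.
void print_aligned_layout() {
    // integer_data must start at an address suitable for an int.
    assert(offsetof(aligned, integer_data) % alignof(int) == 0);

    std::printf("sizeof(struct aligned)      = %zu\n", sizeof(aligned));
    std::printf("offset of long_data         = %zu\n", offsetof(aligned, long_data));
    std::printf("offset of byte_data         = %zu\n", offsetof(aligned, byte_data));
    std::printf("offset of integer_data      = %zu\n", offsetof(aligned, integer_data));
    std::printf("padding before integer_data = %zu\n", padding_before_integer_data());
}
```

Calling print_aligned_layout() from main on the machine you care about shows where the compiler actually put each member.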
Let’s go back to our buggy union. A union takes the alignment of its most strictly aligned member. In our union that is any of the pointers (4 bytes on a 32-bit architecture), so the union’s memory will be aligned to a multiple of 4 bytes. Those unions were initialized with free = 1, but nowhere in the code was free ever set to 0. The programmer of this code assumed that any address aligned to a multiple of 4 bytes has its last bit set to 0 (in fact its last two bits), so whenever a valid pointer was assigned to any of the other union members, the least significant bit of the least significant byte would be zero. Why? Well, for the last byte, multiples of 4 in binary are:
00000100 ( 4 )
00001000 ( 8 )
00001100 ( 12 )
00010000 ( 16 )
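The code worked on little endian. Confusing code, but it worked. However, once it was moved to PowerPC (big endian), the byte and bit assigned to free were no longer the least significant ones but the most significant ones. How bit-fields are laid out is implementation-defined, but compilers for big-endian targets typically allocate the first bit-field starting from the most significant bit. So the code breaks: even after a valid address has been assigned to one of the pointers in the union, free can still be 1 (the object still looks free) whenever the most significant bit of that address is 1.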
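To see which bit of the stored value the compiler assigns to free on a given platform, a small experiment like the following can help. It is a sketch under a few assumptions: a std::uint32_t stands in for the 32-bit pointer so that it also builds on 64-bit hosts, std::memcpy replaces the union to keep the type punning well defined, and the names FreeAndNext, free_after_storing, free_is_lowest_bit, free_is_highest_bit and looks_free_after_assignment are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>  // std::memcpy

// Same bit-fields as in the union, without the pointer members.
struct FreeAndNext {
    unsigned free : 1;
    unsigned next : 31;
};
static_assert(sizeof(FreeAndNext) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit unsigned int");

// Stores a 32-bit "address" over the bit-fields, as assigning one of the
// union's pointers would, and returns what `free` reads afterwards.
unsigned free_after_storing(std::uint32_t address) {
    FreeAndNext fields;
    std::memcpy(&fields, &address, sizeof fields);
    return fields.free;
}

// True if `free` overlays the least significant bit of the stored value
// (what is typically seen on little-endian x86).
bool free_is_lowest_bit() { return free_after_storing(0x00000001u) == 1u; }

// True if `free` overlays the most significant bit of the stored value
// (what is typically seen on big-endian PowerPC).
bool free_is_highest_bit() { return free_after_storing(0x80000000u) == 1u; }

// Would storing this aligned address leave the slot marked as free by accident?
bool looks_free_after_assignment(std::uint32_t address) {
    assert(address % 4 == 0);  // the original code relied on 4-byte alignment
    return free_after_storing(address) == 1u;
}
```

On a platform where free_is_lowest_bit() is true, no 4-byte-aligned address can leave free set. Where free_is_highest_bit() is true, every address at or above 0x80000000 does.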
Even with this in mind, I still don’t quite understand 2 things.
1. The failure was present only in RedHat, not in SuSE. I don’t remember distribution and kernel versions.
2. The failure was present only when launching 2 or more threads.
Somehow, memory allocated on SuSE did not hit this issue. Without threads, RedHat allocations did not hit the issue either. Here are some examples of the memory addresses.
Good:
00010000 00000011 11001000 01100000 ( 0x1003C860 )
00010000 00000011 10110010 11011000 ( 0x1003B2D8 )
Bad:
11110110 01100000 00100110 10111000 ( 0xF66026B8 )
Failures started when memory addresses started with F. On little endian, as we can see, every address ended in a 0 bit, so the least significant bit was 0 (setting “free” to zero), but on big endian the most significant bit of the bad address was 1, causing the failure.
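The two bits in question can be checked with plain integer arithmetic on the sample addresses. This is just a worked example; lowest_bit, highest_bit and check_sample_addresses are names made up here, and the addresses are the ones listed above.

```cpp
#include <cassert>
#include <cstdint>

// Bit that `free` overlays when the first bit-field is the least significant bit.
constexpr unsigned lowest_bit(std::uint32_t address) { return address & 1u; }

// Bit that `free` overlays when the first bit-field is the most significant bit.
constexpr unsigned highest_bit(std::uint32_t address) { return (address >> 31) & 1u; }

void check_sample_addresses() {
    const std::uint32_t good = 0x1003B2D8u;
    const std::uint32_t bad  = 0xF66026B8u;

    // Both addresses are 4-byte aligned, so the lowest bit is always 0.
    assert(lowest_bit(good) == 0u);
    assert(lowest_bit(bad) == 0u);

    // Only the address starting with F has its highest bit set.
    assert(highest_bit(good) == 0u);
    assert(highest_bit(bad) == 1u);
}
```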
Conclusion: C/C++ are low-level languages, and you can usually do funny stuff with the hardware. But if it is not necessary, don’t do it. The fix was easy: move the free member out of the union so that it no longer depends on the memory address, and set free = 0 when the handle is in use.
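A minimal sketch of that kind of fix could look like the following. This is not the actual driver code: the HandleSlot name, the assign/release helpers and the idea that next is the index of the next free slot are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical, simplified version of the fixed class.
struct HandleSlot {
    union {
        void*    void_ptr;  // valid while the slot is in use
        unsigned next;      // assumed meaning: index of the next free slot
    };
    bool free;              // lives outside the union, independent of any address bits

    HandleSlot() : free(true) { void_ptr = nullptr; }

    // Mark the slot as used explicitly instead of relying on pointer bits.
    void assign(void* object) {
        assert(object != nullptr);
        void_ptr = object;
        free = false;
    }

    // Return the slot to the free list.
    void release(unsigned next_free) {
        next = next_free;
        free = true;
    }
};
```

The flag costs a little extra memory per object, but it behaves the same regardless of endianness, bit-field layout, or where the allocator places the handle objects.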