thecodingidiot.com

The Overflow

Two flags on gcc change everything.

gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
    sort.c -o sort

-fsanitize=address (AddressSanitizer, "asan") instruments every load and store the binary makes. Each allocation is wrapped in poisoned guard bytes; every memory access is checked against a shadow map of what is currently valid. The cost is a binary that runs roughly 2× slower and uses about 3× the memory. The benefit is that any out-of-bounds access, use-after-free, or double-free makes the program abort immediately, with a labelled report.

-fsanitize=undefined (UndefinedBehaviorSanitizer, "ubsan") catches a different family of bugs: the ones C says are undefined behaviour even though the program may seem to "work." Examples include signed integer overflow, shifting by an amount equal to or greater than the width of the type, dereferencing a null pointer, and (in Clang's implementation) calling a function through a wrong-typed pointer.

Together they cover most of what valgrind does for memory safety (reads of uninitialised memory are the main exception), and add the undefined-behaviour layer on top. They run as part of the binary itself rather than as an outside observer, which makes them faster and lets them give richer error messages.

Running the sanitised binary

Recompile with both flags, then run on the same input as before:

gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
    sort.c -o sort
./sort test.txt

The program does not get to print the sorted output. It aborts:

=================================================================
==54321==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x602000000036 at pc 0x4011d4 bp 0x7ffd5e2a3010 sp 0x7ffd5e2a27c0
WRITE of size 7 at 0x602000000036 thread T0
    #0 0x4011d3 in strcpy (.../sort+0x4011d3)
    #1 0x401445 in main /home/you/f05-practice/sort.c:46
    #2 0x7f4a2ee29d8f in __libc_start_main (.../libc.so.6+0x29d8f)
    #3 0x401094 in _start (.../sort+0x401094)
 
0x602000000036 is located 0 bytes after 6-byte region [0x602000000030,0x602000000036)
allocated by thread T0 here:
    #0 0x432e8d in malloc (.../sort+0x432e8d)
    #1 0x401407 in main /home/you/f05-practice/sort.c:45
 
SUMMARY: AddressSanitizer: heap-buffer-overflow strcpy

That is more information than valgrind gave us, in a more direct form.

Reading the report

The header tells you the kind of bug, heap-buffer-overflow, and the direction: WRITE of size 7. We tried to write seven bytes into something.

The first stack trace is where the bad write happened:

#0 strcpy
#1 main /home/you/f05-practice/sort.c:46

strcpy was called from sort.c line 46, which is:

strcpy(copy, buf);

The second stack trace is where the memory came from:

0x602000000036 is located 0 bytes after 6-byte region
allocated by thread T0 here:
#0 malloc
#1 main /home/you/f05-practice/sort.c:45

Six bytes were allocated, on sort.c:45. Seven bytes were written to it on the next line. We wrote one byte past the end of an allocation.

The bug

Look at lines 45 and 46:

copy = malloc(strlen(buf));
strcpy(copy, buf);

buf here holds banana — the first line of test.txt, six characters long. strlen(buf) returns 6. We allocate six bytes. Then strcpy copies banana plus the trailing \0 that terminates every C string — seven bytes — into a six-byte block. The seventh byte goes one past the end. asan caught the very first overflow and aborted before the program could read the next line.

A C string is its visible characters plus the null terminator. strlen returns the visible length and not the storage length. Whenever you allocate space to copy a string, you allocate strlen(s) + 1 bytes — never strlen(s).

The fix is a single + 1:

copy = malloc(strlen(buf) + 1);
strcpy(copy, buf);

Recompile and rerun. The program prints sorted output and exits clean:

gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
    sort.c -o sort
./sort test.txt
apple
banana
cherry

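No abort, no error report. Run it once more under valgrind for good measure, this time built without the sanitiser flags, since the two tools are not designed to be used together: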

gcc -Wall -Wextra -g sort.c -o sort
valgrind --leak-check=full ./sort test.txt

valgrind now prints the line every C programmer wants to see at the end of a session:

==12345== All heap blocks were freed -- no leaks are possible
==12345==
==12345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Three bugs found, three bugs fixed. The program is correct.

A word on ubsan

The -fsanitize=undefined flag has been switched on all along, but it has had nothing to report: sort has no signed integer arithmetic, no bit shifts, no pointer-to-pointer casts. If you want to see what UBSan catches, write a one-liner like:

int main(void) { int x = 2147483647; x = x + 1; return x; }

Compile with -fsanitize=undefined and run. On a platform where int is 32 bits, 2147483647 is INT_MAX, so UBSan will print the file and line and report a signed integer overflow. The same flag catches integer division by zero, shifts past a type's width, null pointer dereferences, and a long list of other things C calls "undefined behaviour", meaning the compiler is free to do whatever it likes when they happen.

You will hit ubsan's territory the moment you start manipulating integers at the bit level (c-tier), or doing pointer arithmetic (c-tier and r-tier). Leave both sanitisers on by default during development. They are one of the best gifts modern toolchains give C programmers.

Where you are

The single-file sort.c works. It compiles with -Wall -Wextra -g, runs with the right output, is leak-free, and is overflow-free under both checkers. It is also still a single .c file that does too many things.

The next page splits it into a multi-file project with a Makefile. That is what building C means in any project bigger than a single source file.