Two flags on gcc change everything.
gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
sort.c -o sort
-fsanitize=address (AddressSanitizer, "asan") instruments every
load and store the binary makes. Each allocation is wrapped in
poisoned guard bytes; every memory access is checked against a
shadow map of what is currently valid. The cost is a binary that
runs roughly 2× slower and uses about 3× the memory. The benefit
is that any out-of-bounds access, use-after-free, or double-free
makes the program abort immediately, with a labelled report.
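To make the shadow-map idea concrete, here is a toy model of the per-access check, loosely following the scheme described in the AddressSanitizer design notes: one signed shadow byte describes each aligned 8-byte granule of memory. The name access_is_valid is hypothetical, and the model assumes an access of 1 to 8 bytes that stays inside a single granule. The real check is emitted inline by the compiler, so treat this as a sketch of the idea and not as asan's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an asan-style check (hypothetical helper).
 * shadow ==  0 : all 8 bytes of the granule are valid
 * shadow 1..7  : only the first `shadow` bytes are valid
 * shadow <  0  : the whole granule is poisoned (guard bytes, freed memory)
 * Assumes 1 <= size <= 8 and that the access does not cross a granule. */
bool access_is_valid(int8_t shadow, uint64_t addr, size_t size)
{
    if (shadow == 0)
        return true;
    /* Offset, within the granule, of the last byte the access touches. */
    int last = (int)(addr & 7) + (int)size - 1;
    return last < shadow;
}
```

With a shadow value of 6, which is what a six-byte allocation would get in this model, an access to the byte at offset 5 passes and an access to the byte at offset 6 fails. That second case is a one-past-the-end access of the kind asan reports.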
-fsanitize=undefined (UndefinedBehaviorSanitizer, "ubsan") catches
a different family of bugs: the ones C says are undefined behaviour
even though the program may seem to "work." Examples include signed
integer overflow, shifting by an amount equal to or greater than the
width of the type, dereferencing a null pointer, and calling a
function through a pointer of the wrong type.
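As an illustration of one item on that list, the sketch below guards a left shift so the shift count never reaches the width of the type. The name checked_shl is hypothetical, and the width calculation assumes unsigned int has no padding bits, which holds on mainstream platforms.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: shifts value left by count bits if that is defined.
 * Returns false, leaving *result untouched, when count is too large. */
bool checked_shl(unsigned int value, unsigned int count, unsigned int *result)
{
    /* Assumes no padding bits in unsigned int. */
    const unsigned int width = (unsigned int)(sizeof(unsigned int) * CHAR_BIT);
    if (count >= width)
        return false;           /* shifting by >= width is undefined */
    *result = value << count;   /* unsigned shift: high bits are discarded */
    return true;
}
```

A caller checks the return value before using the result, for example `if (checked_shl(1u, n, &mask)) { ... }`.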
Together they cover most of what valgrind does for memory safety
(reads of uninitialised memory are the main exception), and
add the undefined-behaviour layer on top. They run as part of the
binary itself rather than as an outside observer, which makes them
faster and lets them give richer error messages.
Running the sanitised binary
Recompile with both flags, then run on the same input as before:
gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
sort.c -o sort
./sort test.txt
The program does not get to print the sorted output. It aborts:
=================================================================
==54321==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x602000000036 at pc 0x4011d4 bp 0x7ffd5e2a3010 sp 0x7ffd5e2a27c0
WRITE of size 7 at 0x602000000036 thread T0
#0 0x4011d3 in strcpy (.../sort+0x4011d3)
#1 0x401445 in main /home/you/f05-practice/sort.c:46
#2 0x7f4a2ee29d8f in __libc_start_main (.../libc.so.6+0x29d8f)
#3 0x401094 in _start (.../sort+0x401094)
0x602000000036 is located 0 bytes after 6-byte region [0x602000000030,0x602000000036)
allocated by thread T0 here:
#0 0x432e8d in malloc (.../sort+0x432e8d)
#1 0x401407 in main /home/you/f05-practice/sort.c:45
SUMMARY: AddressSanitizer: heap-buffer-overflow strcpy
That is more information than valgrind gave us, in a more direct
form.
Reading the report
The header tells you the kind of bug — heap-buffer-overflow — and
the direction — WRITE of size 7. We tried to write seven bytes
into something.
The first stack trace is where the bad write happened:
#0 strcpy
#1 main /home/you/f05-practice/sort.c:46
strcpy was called from sort.c line 46, which is:
strcpy(copy, buf);
The second stack trace is where the memory came from:
0x602000000036 is located 0 bytes after 6-byte region
allocated by thread T0 here:
#0 malloc
#1 main /home/you/f05-practice/sort.c:45
Six bytes were allocated, on sort.c:45. Seven bytes were written
to it on the next line. We wrote one byte past the end of an
allocation.
The bug
Look at lines 45 and 46:
copy = malloc(strlen(buf));
strcpy(copy, buf);
buf here holds banana, the first line of test.txt, six
characters long. strlen(buf) returns 6. We allocate six bytes.
Then strcpy copies banana plus the trailing \0 that
terminates every C string — seven bytes — into a six-byte block.
The seventh byte goes one past the end. asan caught the very first
overflow and aborted before the program could read the next line.
A C string is its visible characters plus the null terminator.
strlen returns the visible length and not the storage length.
Whenever you allocate space to copy a string, you allocate
strlen(s) + 1 bytes — never strlen(s).
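One way to make the rule hard to forget is to put it in a single helper. The sketch below is a minimal version; copy_string is a hypothetical name and is not part of sort.c. POSIX offers strdup for the same job, but it is not part of C99.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s, or NULL if malloc fails.
 * The caller owns the result and must free() it. */
char *copy_string(const char *s)
{
    size_t size = strlen(s) + 1;   /* visible characters plus the '\0' */
    char *copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, size);         /* copies the terminator as well */
    return copy;
}
```

Because the size is computed once and used for both the allocation and the copy, the two cannot drift apart.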
The fix is to add + 1 to the allocation size:
copy = malloc(strlen(buf) + 1);
strcpy(copy, buf);
Recompile and rerun. The program prints sorted output and exits clean:
gcc -Wall -Wextra -g -std=c99 -fsanitize=address -fsanitize=undefined \
sort.c -o sort
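./sort test.txt
apple
banana
cherry
No abort, no error report. Run it once more under valgrind for
good measure, recompiling without the sanitiser flags first, since
valgrind and asan do not work together on the same binary: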
gcc -Wall -Wextra -g sort.c -o sort
valgrind --leak-check=full ./sort test.txt
valgrind now prints the line every C programmer wants to see at
the end of a session:
==12345== All heap blocks were freed -- no leaks are possible
==12345==
==12345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Three bugs found, three bugs fixed. The program is correct.
A word on ubsan
The -fsanitize=undefined flag has been on all along, but it has had
nothing to report: sort has no signed integer arithmetic, no bit
shifts, no pointer-to-pointer casts. UBSan never spoke up. If you
want to see what it catches, write a one-liner like:
int main(void) { int x = 2147483647; x = x + 1; return x; }
Compile with -fsanitize=undefined and run. UBSan will print the
file and line and report a signed integer overflow.
The same flag catches division by zero, shifts by a type's width or
more, null-pointer dereferences, and a long list of other things C
calls "undefined behaviour", meaning the compiler is free to do
whatever it likes when they happen.
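Once UBSan has pointed at an overflow, the usual repair is to test the operands before doing the arithmetic. A minimal sketch of that pattern, with checked_add as a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *result and returns true,
 * or returns false without touching *result if the sum would overflow. */
bool checked_add(int a, int b, int *result)
{
    /* Compare against the limits first, so the overflowing sum is never computed. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}
```

Called with 2147483647 and 1, the operands from the one-liner above, the function returns false instead of executing undefined behaviour.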
You will hit ubsan's territory the moment you start manipulating integers at the bit level (c-tier), or doing pointer arithmetic (c-tier and r-tier). Leave both sanitisers on by default during development. They are among the best gifts modern toolchains give C programmers.
Where you are
The single-file sort.c works. It compiles with -Wall -Wextra -g, runs with the right output, is leak-free, and is overflow-free
under both checkers. It is also still a single .c file that does
too many things.
The next page splits it into a multi-file project with a Makefile. That is what building C means in any project bigger than a single source file.