The program works on test.txt. Try it on a filename that does not
exist:

./sort missing.txt
Segmentation fault (core dumped)

A segmentation fault (SIGSEGV) happens when a program tries to
read or write memory it is not allowed to touch. The kernel kills
the process and the shell prints the message above, which gives no
clue where in the code it happened. That is what gdb is for.
Running under gdb
gdb is the GNU debugger. Launch it on the binary:

gdb ./sort

It opens an interactive prompt:

(gdb)

This is gdb's own shell. From here you control the program: tell
it when to start, where to pause, what to print, when to continue.
The five commands we need to find a crash:
| Command | What it does |
|---|---|
| run (r) | Start the program. Arguments after run go to the program. |
| backtrace (bt) | Show the chain of function calls that led to where the program is now. Most useful at a crash. |
| frame N (f N) | Switch the prompt's context to frame N from the backtrace, so print and other commands act on that frame's variables. |
| print (p) | Print the value of a variable or expression. |
| quit (q) | Exit gdb. |
gdb has many more commands: break to pause at a chosen line,
step and next to walk a running program one line at a time,
continue to resume after a breakpoint, and dozens more. The
c-tier returns to them when you need to step through code that
misbehaves without crashing. For a segfault, the five above are
enough.
To reproduce the crash, the only command we need right now is run. Tell gdb
to start the program with the same argument we used at the shell:

(gdb) run missing.txt

The program runs until it crashes, and gdb catches the crash
mid-flight:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e1a234 in _IO_fgets () from /lib/x86_64-linux-gnu/libc.so.6

Two things to notice. The signal is SIGSEGV: the kernel
explicitly told the process it touched bad memory. And the function
gdb shows is _IO_fgets, deep inside libc. We did not crash in
our code; we crashed in a standard-library call our code made.
The backtrace
backtrace (or bt for short) shows the chain of function calls
that led to the current location:
(gdb) bt
#0 0x00007ffff7e1a234 in _IO_fgets () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x000055555555525e in main (argc=2, argv=0x7fffffffe148) at sort.c:31

The numbered lines are the call stack. Frame #0 is where the crash
happened, and each higher number is the function that called the
one below it. The frame we care about is #1, because that is our code:
sort.c:31 is the line that made the call to fgets.
Switch to that frame to look at the source:

(gdb) frame 1
#1 0x000055555555525e in main (argc=2, argv=0x7fffffffe148) at sort.c:31
31 while (fgets(buf, sizeof(buf), in)) {

This is the line in our code. We called fgets(buf, sizeof(buf), in),
and it crashed inside the libc implementation. So either buf is
bad, sizeof(buf) is bad (it is not: it is a compile-time constant), or in is
bad. Let us print them:
(gdb) print buf
$1 = '\000' <repeats 4095 times>
(gdb) print in
$2 = (FILE *) 0x0

buf is fine: fgets is allowed to write into uninitialised
memory; that is what we are asking it to do. in is 0x0, which is NULL.
The bug
Look back at the code:
in = fopen(argv[1], "r");

What does fopen do when the file cannot be opened? Linux has a
built-in answer to questions like that: the man command. Type
man <name> and you get the manual page for any standard tool or
library function. Pages live in numbered sections: section 1 is
shell commands (man 1 ls), section 3 is the C standard library
(man 3 fopen), and there are a handful of others. From here on
you will reach for man constantly as you write C — every
standard-library function has its own page describing what it
takes, what it returns, and what can go wrong.
man opens the page in a pager — less, the same one f01/03
introduced you to alongside vim, with the same vi-style keys: j
and k to scroll line by line, Space to page forward, / to
search for a word, and q to quit. The same keys work in any
pager you meet on Linux, including the one git log opens.
Run man 3 fopen now and scroll to the RETURN VALUE section:

Upon successful completion fopen()... return a FILE pointer.
Otherwise, NULL is returned and errno is set to indicate the
error.

There is the answer. fopen returns NULL when it cannot open
the file: wrong path, no permission, too many files already open,
and so on. We never checked the return value. We handed a NULL
FILE * straight to fgets, which tried to read from it as if it
were a real file, and the program crashed.
This is a mistake every C programmer makes at least once. The fix is a
short check right after the call:

in = fopen(argv[1], "r");
if (!in) {
    fprintf(stderr, "cannot open %s\n", argv[1]);
    return (1);
}

!in is shorthand for "the pointer is NULL." If it is, print a
sensible error and exit with status 1.
Quit gdb with quit (or just hit Ctrl+D) and recompile:
gcc -Wall -Wextra -g sort.c -o sort
./sort missing.txt

Now the program prints cannot open missing.txt and exits with
status 1 instead of crashing. Run it on test.txt again and it still
sorts correctly.
One bug down, two to go.
What you actually learned
gdb is a giant tool with hundreds of commands. The handful you
just used is enough to handle most crashes:

- gdb ./binary to enter.
- run <args> to start.
- When it crashes, bt to see where.
- frame N to switch to your code's frame.
- print <var> to inspect what went wrong.
- quit when done.
You will use this same loop every time something segfaults for the
rest of your career. The c-tier covers gdb again with breakpoints
and stepping when you need them. For now, the whole skill is this: the
program crashed, gdb showed you the line, and you fixed it.