The single-file sort.c works. It also mixes three unrelated
concerns in one place: parsing command-line arguments, opening
files, and managing a dynamic array of strings. As soon as the next
program needs the same dynamic-array behaviour, that code has to
move out of sort.c and into a place a second file can include.
That is the point at which a C project stops being one file.
What goes where
The dynamic-array logic — the char ** lines plus its grow / append
/ sort / print / free routines — is reusable. It does not care that
it is being used by a sort program. It belongs in its own
translation unit:
- lines.h declares the API: a lines_t struct and five functions for working on it.
- lines.c implements those functions.
- sort.c becomes the program-specific glue: argv handling, file reading, calling into the lines API.
- A Makefile ties them together.
The header
Create lines.h:
#ifndef LINES_H
#define LINES_H

#include <stddef.h>

typedef struct {
    char **data;
    size_t count;
    size_t capacity;
} lines_t;

void lines_init(lines_t *lines);
int lines_append(lines_t *lines, const char *line);
void lines_sort(lines_t *lines);
void lines_print(const lines_t *lines);
void lines_free(lines_t *lines);

#endif

The #ifndef LINES_H ... #endif block is an include guard. It
keeps the header from being processed twice within a single
translation unit, which happens as soon as one .c file pulls it in
both directly and through another header. Without a guard, the
second pass defines lines_t again and the compile fails with a
redefinition error. Every header you write should have one;
the convention is <FILENAME>_H.
<stddef.h> is where size_t is declared. We include it here
because the struct uses it.
A struct is a C type that bundles several named values into a
single thing. The shape typedef struct { ... } name; is the
standard idiom for declaring one: it defines the struct and gives
it the type name name in one step, so the rest of the code can
write lines_t lines; rather than the struct lines lines; that a
tagged struct without a typedef would require.
The lines_t struct collects the three things that used to be
loose variables in main: the array, the count, and the capacity.
Bundling them into a struct makes "the lines collection" a single
thing the rest of the code can pass around. Every function in the
API takes a lines_t * — a pointer to one — and reads or modifies
its fields. To reach a field through a pointer, C uses a
dedicated operator: ->.
So lines->count reads the count field, and lines->data = NULL
overwrites the data field. When you have a struct value directly
rather than a pointer to one, use . instead:
/* value in hand — dot */
lines_t lines;
lines.count = 0;
/* pointer — arrow */
lines_t *p = &lines;
p->count = 0; /* same as (*p).count */

p->count and (*p).count are identical: -> is shorthand for
dereferencing and then reading the field. Every function in this API
takes a lines_t *, so every field access below uses ->.
The form used here is the anonymous typedef: typedef struct { } name;. When a struct must refer to its own type inside its definition
— a node pointing to the next node of the same type — the definition
needs a tag to name itself before the typedef is complete:
typedef struct s_name { struct s_name *next; } t_name;. You will
see that tagged form in the c-tier.
The five function declarations are the API. The implementations
will live in lines.c; any file that wants to use them only needs
to #include "lines.h".
The implementation
Create lines.c:
#include "lines.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void lines_init(lines_t *lines)
{
    lines->data = NULL;
    lines->count = 0;
    lines->capacity = 0;
}

int lines_append(lines_t *lines, const char *line)
{
    char **grown;
    size_t new_capacity;
    char *copy;

    if (lines->count == lines->capacity) {
        if (lines->capacity == 0)
            new_capacity = 16;
        else
            new_capacity = lines->capacity * 2;
        grown = realloc(lines->data, new_capacity * sizeof(char *));
        if (!grown)
            return (-1); /* 0 = success, -1 = failure; mirrors Unix syscall convention */
        lines->data = grown;
        lines->capacity = new_capacity;
    }
    copy = malloc(strlen(line) + 1);
    if (!copy)
        return (-1);
    strcpy(copy, line);
    lines->data[lines->count++] = copy;
    return (0);
}

static int cmp_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return (strcmp(*sa, *sb));
}

void lines_sort(lines_t *lines)
{
    qsort(lines->data, lines->count, sizeof(char *), cmp_strings);
}

void lines_print(const lines_t *lines)
{
    size_t i;

    for (i = 0; i < lines->count; i++)
        printf("%s\n", lines->data[i]);
}

void lines_free(lines_t *lines)
{
    size_t i;

    for (i = 0; i < lines->count; i++)
        free(lines->data[i]);
    free(lines->data);
    lines->data = NULL;
    lines->count = 0;
    lines->capacity = 0;
}

A few things worth noticing.
#include "lines.h" is the same #include you have used for
standard headers, but with quotes instead of angle brackets.
With gcc, quotes mean "look in the directory of the including file
first, then fall back to the system include paths"; angle brackets
mean "look in the system include paths" only. Use quotes for
headers you wrote yourself.
lines_append now checks realloc and malloc for failure — they
return NULL if the system runs out of memory — and propagates a
-1 back to the caller. The single-file version did not bother;
the moment code lives in a library, callers expect a sensible
return value.
The convention, 0 for success and -1 for failure, mirrors how many
Unix system calls report errors: open, close, read, and write all
return -1 when they fail. Callers test the result with != 0 rather
than == -1, so the check stays correct even if a later version
returns other nonzero error codes.
cmp_strings is static because it is only used inside lines.c
and there is no reason for the rest of the program to see it.
static at file scope means "private to this translation unit."
lines_free walks the array freeing each string before freeing
the array itself — the same fix we made when valgrind caught the
leak, now living inside the function that owns the lifecycle.
Sort, the glue
sort.c shrinks to just the program-specific parts:
#include "lines.h"
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    FILE *in;
    lines_t lines;
    char buf[4096];

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return (1);
    }
    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return (1);
    }
    lines_init(&lines);
    while (fgets(buf, sizeof(buf), in)) {
        buf[strcspn(buf, "\n")] = '\0';
        if (lines_append(&lines, buf) != 0) {
            fprintf(stderr, "out of memory\n");
            fclose(in);
            lines_free(&lines);
            return (1);
        }
    }
    fclose(in);
    lines_sort(&lines);
    lines_print(&lines);
    lines_free(&lines);
    return (0);
}

Half as much code as before, and every line is about this
program. The dynamic-array machinery is gone; it lives in
lines.c now.
Compiling by hand
Before the Makefile, build the project the slow way once. There are two compile steps and one link step:
gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -c lines.c -o lines.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

-c tells gcc to compile a .c file into a .o object file
without trying to link a complete program. The third command takes
both object files and links them into the executable.
This is what every C build does under the hood: compile each .c
to a .o, then link the .o files. Splitting the work means changing
one source file only requires recompiling that file, not the whole
project.
./sort test.txt

Same sorted output. The program still works.
The Makefile
Typing those three commands every time you change a file gets old
fast. make exists to automate it. A Makefile declares your
project's rules and make figures out the rest.
Create Makefile:
NAME = sort
CC = gcc
CFLAGS = -Wall -Wextra -g -std=c99
SRCS = sort.c lines.c
OBJS = $(SRCS:.c=.o)

all: $(NAME)

$(NAME): $(OBJS)
	$(CC) $(CFLAGS) -o $(NAME) $(OBJS)

%.o: %.c lines.h
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -f $(OBJS)

fclean: clean
	rm -f $(NAME)

re: fclean all

.PHONY: all clean fclean re

There is a lot in this file. Take it section by section.
Variables. The first five lines define variables — NAME,
CC, CFLAGS, SRCS, OBJS. Anywhere else in the file, $(NAME)
expands to sort, $(CC) to gcc, and so on. $(SRCS:.c=.o) is
a substitution reference — it takes SRCS and replaces every
.c suffix with .o, giving us sort.o lines.o automatically.
Rules. Each rule has the shape:
target: prerequisites
	recipe

The recipe must be indented with a single tab, not spaces; this is
one of the few places in any Unix tool where it has to be a tab.
When you run make target, make checks whether target is missing or
any of the prerequisites are newer than it; if so, it runs the
recipe.
all: $(NAME) means "the all target depends on the sort
target." When you run make with no arguments, make builds the
first target it sees — all — which causes it to build $(NAME)
in turn.
The link step:
$(NAME): $(OBJS)
	$(CC) $(CFLAGS) -o $(NAME) $(OBJS)

sort depends on sort.o and lines.o. To produce sort, run
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o. The recipe is
exactly the third command we ran by hand.
The pattern rule:
%.o: %.c lines.h
	$(CC) $(CFLAGS) -c $< -o $@

The % is a wildcard. This rule says: for any file named
something like foo.o, build it from foo.c (and also depend on
lines.h, so a header change rebuilds every object file). $< is "the
first prerequisite" (the .c file). $@ is "the target" (the
.o file). This one rule replaces the two gcc -c invocations
we did by hand.
Phony targets. clean, fclean, and re do not build
files — they delete things, or rebuild from scratch. The
.PHONY: all clean fclean re line tells make that these targets
are not real files. Without it, if a file called clean ever
appeared, make clean would think there was nothing to do.
Using it
make builds the project:
make
gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -c lines.c -o lines.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

make clean deletes the object files:
make clean

make fclean deletes everything make produced:
make fclean

make re does a clean rebuild from scratch:
make re

And running make after editing one source file only recompiles
what is necessary:
touch sort.c
make
gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

lines.c did not change, so lines.o is reused. That is the whole
point of make.
Verifying with the tester
If you have not already, clone the companion repo, copy test.sh
into your working directory, and run it:
git clone https://github.com/thecodingidiot-com/f05-building-c.git
cp f05-building-c/test.sh .
bash test.sh

Six checks should pass: make builds cleanly, ./sort produces
correctly sorted output, ./sort returns exit code 1 on a missing
file and with no arguments, the binary is clean under valgrind,
and a sanitiser rebuild also runs clean.
The single file you started with is now a project.