thecodingidiot.com

A Real Project

The single-file sort.c works. It also mixes three unrelated concerns in one place: parsing command-line arguments, opening files, and managing a dynamic array of strings. As soon as the next program needs the same dynamic-array behaviour, that code has to move out of sort.c and into a place a second file can include.

That is the point at which a C project stops being one file.

What goes where

The dynamic-array logic — the char ** lines plus its grow / append / sort / print / free routines — is reusable. It does not care that it is being used by a sort program. It belongs in its own translation unit:

  • lines.h declares the API: a lines_t struct and five functions for working on it.
  • lines.c implements those functions.
  • sort.c becomes the program-specific glue: argv handling, file reading, calling into the lines API.
  • A Makefile ties them together.

The header

Create lines.h:

#ifndef LINES_H
#define LINES_H
 
#include <stddef.h>
 
typedef struct {
    char    **data;
    size_t  count;
    size_t  capacity;
}   lines_t;
 
void    lines_init(lines_t *lines);
int     lines_append(lines_t *lines, const char *line);
void    lines_sort(lines_t *lines);
void    lines_print(const lines_t *lines);
void    lines_free(lines_t *lines);
 
#endif

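The #ifndef LINES_H ... #endif block is an include guard. It keeps the header's contents from being processed twice within a single translation unit, which happens as soon as one .c file pulls in the same header by two routes, for example directly and again through another header. Without the guard, the second copy redefines lines_t and the compile fails. Every header you write should have one; the convention is <FILENAME>_H.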
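To see the guard at work inside one translation unit, the sketch below pastes the same guarded block twice, which is roughly what the preprocessor sees when a header is included twice. The names POINT_H, point_t and guarded_sum are made up for this illustration.

```c
/* First copy: POINT_H is not defined yet, so this block is compiled. */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
}   point_t;

#endif

/* Second copy: POINT_H is already defined, so the preprocessor skips
   everything up to #endif and point_t is not defined a second time. */
#ifndef POINT_H
#define POINT_H

typedef struct {
    int x;
    int y;
}   point_t;

#endif

/* Hypothetical helper that uses the type, to show it exists exactly once. */
int guarded_sum(void)
{
    point_t p = {3, 4};

    return (p.x + p.y);
}
```

Removing the guard lines from either copy makes the compiler reject the second typedef as a conflicting redefinition.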

<stddef.h> is where size_t is declared. We include it here because the struct uses it.

A struct is a C type that bundles several named values into a single thing. The shape typedef struct { ... } name; is a common idiom for declaring one: it defines the struct and gives it the type name name in one step, so the rest of the code can write lines_t lines; rather than the longer struct lines lines; that a struct declared with only a tag would require.

The lines_t struct collects the three things that used to be loose variables in main: the array, the count, and the capacity. Bundling them into a struct makes "the lines collection" a single thing the rest of the code can pass around. Every function in the API takes a lines_t * — a pointer to one — and reads or modifies its fields. To reach a field through a pointer, C uses a dedicated operator: ->.

So lines->count reads the count field, and lines->data = NULL overwrites the data field. When you have a struct value directly rather than a pointer to one, use . instead:

/* value in hand — dot */
lines_t  lines;
lines.count = 0;
 
/* pointer — arrow */
lines_t  *p = &lines;
p->count = 0;   /* same as (*p).count */

p->count and (*p).count are identical — -> is shorthand for dereferencing and then reading the field. Every function in this API takes a lines_t *, so every field access below uses ->.
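The reason the API takes pointers rather than struct values can be shown with a small sketch. The counter_t type and both functions are hypothetical stand-ins for lines_t and its routines.

```c
#include <stddef.h>

/* Hypothetical struct, standing in for lines_t. */
typedef struct {
    size_t  count;
}   counter_t;

/* Receives a copy of the struct: the increment changes the copy only,
   and the caller's struct is left untouched. */
size_t bump_copy(counter_t c)
{
    c.count++;
    return (c.count);
}

/* Receives a pointer: the increment lands in the caller's struct. */
void bump_in_place(counter_t *c)
{
    c->count++;
}
```

A function like lines_append has to change the caller's count and data, so it needs the pointer form.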

The form used here is the anonymous typedef: typedef struct { } name;. When a struct must refer to its own type inside its definition — a node pointing to the next node of the same type — the definition needs a tag to name itself before the typedef is complete: typedef struct s_name { struct s_name *next; } t_name;. You will see that tagged form in the c-tier.
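As a minimal sketch of the tagged form, here is a singly linked list node that points to another node of its own type. The names s_node, t_node and list_length are illustrative and not part of the lines API.

```c
#include <stddef.h>

/* The tag s_node lets the struct refer to itself before the
   typedef name t_node exists. */
typedef struct s_node {
    int             value;
    struct s_node   *next;
}   t_node;

/* Hypothetical helper: count the nodes by following next until NULL. */
size_t list_length(const t_node *head)
{
    size_t  n = 0;

    while (head) {
        n++;
        head = head->next;
    }
    return (n);
}
```

Writing t_node *next inside the braces would not compile, because the name t_node is only introduced once the closing brace has been reached.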

The five function declarations are the API. The implementations will live in lines.c; any file that wants to use them only needs to #include "lines.h".

The implementation

Create lines.c:

#include "lines.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
 
void lines_init(lines_t *lines)
{
    lines->data = NULL;
    lines->count = 0;
    lines->capacity = 0;
}
 
int lines_append(lines_t *lines, const char *line)
{
    char    **grown;
    size_t  new_capacity;
    char    *copy;
 
    if (lines->count == lines->capacity) {
        if (lines->capacity == 0)
            new_capacity = 16;
        else
            new_capacity = lines->capacity * 2;
        grown = realloc(lines->data, new_capacity * sizeof(char *));
        if (!grown)
            return (-1); /* 0 = success, -1 = failure; mirrors Unix syscall convention */
        lines->data = grown;
        lines->capacity = new_capacity;
    }
    copy = malloc(strlen(line) + 1);
    if (!copy)
        return (-1);
    strcpy(copy, line);
    lines->data[lines->count++] = copy;
    return (0);
}
 
static int cmp_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return (strcmp(*sa, *sb));
}
 
void lines_sort(lines_t *lines)
{
    qsort(lines->data, lines->count, sizeof(char *), cmp_strings);
}
 
void lines_print(const lines_t *lines)
{
    size_t i;
 
    for (i = 0; i < lines->count; i++)
        printf("%s\n", lines->data[i]);
}
 
void lines_free(lines_t *lines)
{
    size_t i;
 
    for (i = 0; i < lines->count; i++)
        free(lines->data[i]);
    free(lines->data);
    lines->data = NULL;
    lines->count = 0;
    lines->capacity = 0;
}

A few things worth noticing.

#include "lines.h" is the same #include you have used for standard headers, but with quotes instead of angle brackets. With quotes, gcc looks first in the directory of the file doing the including and then falls back to the system include paths; with angle brackets it searches only the system include paths. Use quotes for headers you wrote yourself.

lines_append now checks realloc and malloc for failure — they return NULL if the system runs out of memory — and propagates a -1 back to the caller. The single-file version did not bother; the moment code lives in a library, callers expect a sensible return value.

The convention, 0 for success and -1 for failure, follows Unix system calls such as read and write, which return -1 on error, as do many other POSIX functions. Callers test the result with != 0, so the check stays correct even if a later version of the function reports other nonzero error codes.
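One detail of the growth step deserves a closer look: the result of realloc goes into a temporary before it replaces the old pointer. The sketch below isolates that pattern for a plain int buffer, using the same 0 / -1 return convention. The function grow_ints is hypothetical and is not part of lines.c.

```c
#include <stdlib.h>

/* Hypothetical helper: double an int buffer, or start it at 16 slots.
   Returns 0 on success, -1 on failure. */
int grow_ints(int **data, size_t *capacity)
{
    size_t  new_capacity;
    int     *grown;

    if (*capacity == 0)
        new_capacity = 16;
    else
        new_capacity = *capacity * 2;
    /* Keep the result in a temporary: if realloc fails it returns NULL
       and leaves the old block allocated, so *data must not be
       overwritten until the call is known to have succeeded. */
    grown = realloc(*data, new_capacity * sizeof(int));
    if (!grown)
        return (-1);
    *data = grown;
    *capacity = new_capacity;
    return (0);
}
```

Assigning the result of realloc straight to *data would lose the only pointer to the old block on failure, which turns an out-of-memory condition into a leak as well.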

cmp_strings is static because it is only used inside lines.c and there is no reason for the rest of the program to see it. static at file scope means "private to this translation unit."
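The pointer types inside the comparator are easy to get wrong, so here is the same pattern in isolation: a private comparator plus one public wrapper. The names cmp_words and sort_words are made up for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements. The
   elements here are char *, so each argument is really a char **
   and has to be dereferenced once before strcmp can use it. */
static int cmp_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;

    return (strcmp(*wa, *wb));
}

/* Hypothetical wrapper: the only function other files could call. */
void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof(*words), cmp_words);
}
```

Because cmp_words is static, another .c file in the same program could define its own cmp_words without a link-time clash.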

lines_free walks the array freeing each string before freeing the array itself — the same fix we made when valgrind caught the leak, now living inside the function that owns the lifecycle.
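lines_free has strings to free because lines_append stores its own copy of each line rather than the caller's pointer. The sketch below pulls that copying step out into a hypothetical helper, copy_line, to show why the copy matters when the caller reuses its buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper doing what lines_append does for each line:
   allocate room for the text plus its terminator and copy it.
   Returns NULL on allocation failure; the caller must free() the result. */
char *copy_line(const char *line)
{
    char    *copy;

    copy = malloc(strlen(line) + 1);
    if (!copy)
        return (NULL);
    strcpy(copy, line);
    return (copy);
}
```

If the array stored the caller's pointer instead, every entry would end up pointing at the same read buffer, and freeing the entries would be an error because they were never allocated with malloc.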

Sort, the glue

sort.c shrinks to just the program-specific parts:

#include "lines.h"
#include <stdio.h>
#include <string.h>
 
int main(int argc, char **argv)
{
    FILE    *in;
    lines_t lines;
    char    buf[4096];
 
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return (1);
    }
    in = fopen(argv[1], "r");
    if (!in) {
        fprintf(stderr, "cannot open %s\n", argv[1]);
        return (1);
    }
    lines_init(&lines);
    while (fgets(buf, sizeof(buf), in)) {
        buf[strcspn(buf, "\n")] = '\0';
        if (lines_append(&lines, buf) != 0) {
            fprintf(stderr, "out of memory\n");
            fclose(in);
            lines_free(&lines);
            return (1);
        }
    }
    fclose(in);
    lines_sort(&lines);
    lines_print(&lines);
    lines_free(&lines);
    return (0);
}

Half as much code as before, and every line is about this program. The dynamic-array machinery is gone; it lives in lines.c now.
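The line buf[strcspn(buf, "\n")] = '\0'; in the read loop is a compact way to strip the newline that fgets leaves in the buffer. A minimal sketch of the same idiom as a standalone function, here given the hypothetical name chomp:

```c
#include <string.h>

/* Hypothetical helper: cut the string at its first newline, if any. */
void chomp(char *s)
{
    /* strcspn returns the number of characters before the first '\n',
       or the full length if there is none, so the write below lands
       either on the newline or on the existing terminator. */
    s[strcspn(s, "\n")] = '\0';
}
```

The idiom is safe on a last line that has no trailing newline, because in that case it only rewrites the terminator that is already there.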

Compiling by hand

Before the Makefile, build the project the slow way once. There are two compile steps and one link step:

gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -c lines.c -o lines.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

-c tells gcc to compile a .c file into a .o object file without trying to link a complete program. The third command takes both object files and links them into the executable.

This is what every C build does under the hood: compile each .c to a .o, then link the .os. Splitting the work means changing one source file only requires recompiling that file, not the whole project.

./sort test.txt

Same sorted output. The program still works.

The Makefile

Typing those three commands every time you change a file gets old fast. make exists to automate it. A Makefile declares your project's rules and make figures out the rest.

Create Makefile:

NAME    = sort
CC      = gcc
CFLAGS  = -Wall -Wextra -g -std=c99
SRCS    = sort.c lines.c
OBJS    = $(SRCS:.c=.o)
 
all: $(NAME)
 
$(NAME): $(OBJS)
	$(CC) $(CFLAGS) -o $(NAME) $(OBJS)
 
%.o: %.c lines.h
	$(CC) $(CFLAGS) -c $< -o $@
 
clean:
	rm -f $(OBJS)
 
fclean: clean
	rm -f $(NAME)
 
re: fclean all
 
.PHONY: all clean fclean re

There is a lot in this file. Take it section by section.

Variables. The first five lines define variables — NAME, CC, CFLAGS, SRCS, OBJS. Anywhere else in the file, $(NAME) expands to sort, $(CC) to gcc, and so on. $(SRCS:.c=.o) is a substitution reference — it takes SRCS and replaces every .c suffix with .o, giving us sort.o lines.o automatically.

Rules. Each rule has the shape:

target: prerequisites
	recipe

The recipe must be indented with a single tab, not spaces; this is one of the few places in any Unix tool where it has to be a tab. When you run make target, make checks whether target is missing or whether any of the prerequisites are newer than it; in either case, it runs the recipe.
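The rebuild decision can be approximated in a few lines of C, assuming a POSIX system where stat() is available. This is a rough sketch of the idea only: the function needs_rebuild is hypothetical, it handles a single prerequisite, and real make does considerably more.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if target should be rebuilt from
   prereq, 0 if it is up to date, -1 if prereq itself is missing. */
int needs_rebuild(const char *target, const char *prereq)
{
    struct stat t;
    struct stat p;

    if (stat(prereq, &p) != 0)
        return (-1);            /* nothing to build from */
    if (stat(target, &t) != 0)
        return (1);             /* target does not exist yet */
    /* Rebuild when the prerequisite was modified after the target. */
    return (p.st_mtime > t.st_mtime);
}
```

Under this model, touch sort.c gives sort.c a newer modification time than sort.o, which is enough to trigger a recompile.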

all: $(NAME) means "the all target depends on the sort target." When you run make with no arguments, make builds the first target it sees — all — which causes it to build $(NAME) in turn.

The link step:

$(NAME): $(OBJS)
	$(CC) $(CFLAGS) -o $(NAME) $(OBJS)

sort depends on sort.o and lines.o. To produce sort, run gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o. The recipe is exactly the third command we ran by hand.

The pattern rule:

%.o: %.c lines.h
	$(CC) $(CFLAGS) -c $< -o $@

The % is a wildcard. This rule says: for any file named something like foo.o, build it from foo.c (and also depend on lines.h so a header change rebuilds the world). $< is "the first prerequisite" (the .c file). $@ is "the target" (the .o file). This one rule replaces the two gcc -c invocations we did by hand.

Phony targets. clean, fclean, and re do not build files — they delete things, or rebuild from scratch. The .PHONY: all clean fclean re line tells make that these targets are not real files. Without it, if a file called clean ever appeared, make clean would think there was nothing to do.

Using it

make builds the project:

make
gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -c lines.c -o lines.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

make clean deletes the object files:

make clean

make fclean deletes everything make produced:

make fclean

make re does a clean rebuild from scratch:

make re

And running make after editing one source file only recompiles what is necessary:

touch sort.c
make
gcc -Wall -Wextra -g -std=c99 -c sort.c -o sort.o
gcc -Wall -Wextra -g -std=c99 -o sort sort.o lines.o

lines.c did not change, so lines.o is reused. That is the whole point of make.

Verifying with the tester

If you have not already, clone the companion repo, copy test.sh into your working directory, and run it:

git clone https://github.com/thecodingidiot-com/f05-building-c.git
cp f05-building-c/test.sh .
bash test.sh

Six checks should pass: make builds cleanly, ./sort produces correctly sorted output, ./sort returns exit code 1 on a missing file and with no arguments, the binary is clean under valgrind, and a sanitiser rebuild also runs clean.

The single file you started with is now a project.