We need a sort utility. The spec is short: take a filename, read
its lines, sort them alphabetically, print them. Five tools later
we will have a clean version of this. Right now we just want it to
work.
Open sort.c in your editor and follow along. The first version is
a single file, top to bottom.
The skeleton
Every C program starts the same way: includes, then main. Sort
needs three standard headers:
#include <stdio.h>
#include <stdlib.h>
#include <string.h><stdio.h> for printing and reading files, <stdlib.h> for
dynamic memory and sorting, <string.h> for string handling.
Now main. We have not used argc or argv before — these are
how a program receives its command-line arguments. argv is an
array of strings: argv[0] is the program name, argv[1] is the
first thing the user typed after it, and so on. argc is how many
of those entries there are.
int main(int argc, char **argv)char ** is a pointer to a pointer to a char. For now, read it
as the C syntax for array of strings — each string is a char *,
and the array holding them is char **. The c-tier covers what
pointers really are. Right here, argv is just a list of strings.
If the user did not pass a filename, complain and exit:
if (argc < 2) {
fprintf(stderr, "usage: %s <file>\n", argv[0]);
return (1);
}fprintf(stderr, ...) writes to standard error rather than standard
output. Error messages belong on stderr so they do not get mixed
into the program's actual output, which is going to stdout.
Opening the file
fopen opens a file and returns a FILE * — a pointer the rest of
the standard I/O functions use to know which file we mean. The
second argument is the mode; "r" is read.
in = fopen(argv[1], "r");That is it for setup. Onward.
Reading the lines
We need somewhere to put the lines as we read them. Sort cannot know how many lines a file has until it has read them all, so a fixed-size array will not do. We need a dynamic array — one that grows as we add to it.
The C standard library has three calls for this:
malloc(n)allocates a freshn-byte block of memory and returns a pointer to it.realloc(p, n)resizes an existing block tonbytes.free(p)releases a block when you are done with it.
All three live in <stdlib.h>. They are the heart of dynamic
memory in C, and the c-tier covers them in detail. For sort we
just need to know what they do.
For sort, the dynamic array is an array of strings — char **,
the same syntax argv uses. We will start it empty and grow it
as we read. We also need a fixed-size buffer to read each line
into before copying it somewhere permanent.
C wants every variable declared at the top of the function, so we
collect all of main's variables — including the FILE *in we
used in the line above — into one block. size_t is an unsigned
integer type from <stddef.h> used for sizes and array indices.
FILE *in;
char **lines;
char *copy;
size_t count;
size_t capacity;
size_t new_capacity;
size_t i;
char buf[4096];
lines = NULL;
count = 0;
capacity = 0;count is how many lines we have so far. capacity is how much
room the array currently has. lines is the array itself,
initially nothing — NULL is C's "this pointer points at no
memory yet."
Before the code, the shape of the algorithm:
The read loop uses fgets, which reads one line at a time:
while (fgets(buf, sizeof(buf), in)) {
buf[strcspn(buf, "\n")] = '\0';fgets returns NULL when the file ends, so the while exits
naturally. The line it reads keeps its trailing newline, and we
do not want that in the sorted output, so we strip it: strcspn
finds the offset of the first newline in buf, and we overwrite
that newline with \0 (the C string terminator).
Inside the loop, grow the array if it is full, then copy the line into a fresh allocation:
if (count == capacity) {
if (capacity == 0)
new_capacity = 16;
else
new_capacity = capacity * 2;
lines = realloc(lines, new_capacity * sizeof(char *));
capacity = new_capacity;
}
copy = malloc(strlen(buf));
strcpy(copy, buf);
lines[count++] = copy;
}Doubling the capacity each time the array fills is a standard
trick — it keeps the amortised cost of growing the array small.
realloc(NULL, ...) is equivalent to malloc, which is why
starting lines as NULL and reusing the same line works for the
very first allocation.
malloc(strlen(buf)) reserves a fresh block the size of the line.
strcpy(copy, buf) copies the line text into it. lines[count++] = copy stores the new pointer at the end of the array, then advances
count by one. lines[i] is C's array indexing — the same syntax
you would use on a fixed-size array.
Once the loop exits, close the file:
fclose(in);Sorting
The standard library has qsort for sorting any array. It takes
four arguments: the array, the number of elements, the size of each
element, and a comparison function. The comparator takes two
const void * arguments and returns negative, zero, or positive
depending on whether the first should sort before, equal to, or
after the second.
For sorting strings, the comparator is short — call strcmp and
return what it returns:
static int cmp_strings(const void *a, const void *b)
{
const char *const *sa = a;
const char *const *sb = b;
return (strcmp(*sa, *sb));
}The array elements are char * — strings. qsort does not know
that; it just hands the comparator two generic pointers and asks
for an order. The two-line cast in cmp_strings tells the compiler
what those pointers really point at, and *sa and *sb then read
the strings out — the * operator means "the thing this pointer
points at." Once we have the two strings, strcmp does the actual
comparison.
That is more new C than the rest of the file put together: the
* operator, void *, function pointers. The c-tier comes back to
all three. Right here, the shape above is the standard pattern for
sorting strings in C and works the same way every time. Type it,
move on.
The call itself is one line:
qsort(lines, count, sizeof(char *), cmp_strings);Printing and cleaning up
Print every line to stdout:
for (i = 0; i < count; i++)
printf("%s\n", lines[i]);lines[i] is the string at position i in the array — the same
indexing as any array. We added the \n back because we stripped
the original.
And free what we allocated:
free(lines);
return (0);The whole thing
Put it all together:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static int cmp_strings(const void *a, const void *b)
{
const char *const *sa = a;
const char *const *sb = b;
return (strcmp(*sa, *sb));
}
int main(int argc, char **argv)
{
FILE *in;
char **lines;
char *copy;
size_t count;
size_t capacity;
size_t new_capacity;
size_t i;
char buf[4096];
if (argc < 2) {
fprintf(stderr, "usage: %s <file>\n", argv[0]);
return (1);
}
in = fopen(argv[1], "r");
lines = NULL;
count = 0;
capacity = 0;
while (fgets(buf, sizeof(buf), in)) {
buf[strcspn(buf, "\n")] = '\0';
if (count == capacity) {
if (capacity == 0)
new_capacity = 16;
else
new_capacity = capacity * 2;
lines = realloc(lines, new_capacity * sizeof(char *));
capacity = new_capacity;
}
copy = malloc(strlen(buf));
strcpy(copy, buf);
lines[count++] = copy;
}
fclose(in);
qsort(lines, count, sizeof(char *), cmp_strings);
for (i = 0; i < count; i++)
printf("%s\n", lines[i]);
free(lines);
return (0);
}Compile and run it on a tiny input:
printf 'banana\ncherry\napple\n' > test.txt
gcc sort.c -o sort
./sort test.txtYou should see:
apple
banana
cherryLooks fine. Ship it.