thecodingidiot.com


Processes

Before a pipe can connect two programs, both programs must exist as running processes. This page covers the three system calls that make that happen: fork, execve, and waitpid.

What is a process

Every running program is a process. The kernel assigns it a process ID (PID), an independent address space, and a set of file descriptors. When you run ./pipeline, the shell forks a child process — a copy of itself — and that child calls execve to replace its memory image with pipeline.

Processes form a tree. Every process has a parent. When a parent exits before its child, the child is reparented to PID 1 (the init process). When a child exits before its parent, the kernel keeps the child's exit status in a table until the parent reads it — that is a zombie process. waitpid is how the parent reads and clears it.

Inspecting processes

The kernel exposes every running process through /proc — a virtual filesystem mounted at boot. Nothing in /proc lives on disk; the kernel generates each entry on demand when you read it. Every process gets a directory at /proc/<PID>/.

ls /proc/$$

$$ expands to the current shell's PID. The directory contains files the kernel generates on demand:

Entry    Contents
status   Name, state, PID, PPID, memory usage
fd/      One symlink per open file descriptor
maps     Memory regions: address ranges, permissions, backing files

cat /proc/$$/status
ls -l /proc/$$/fd

ps -p $$ shows the same process in a human-readable table. It is the same ps from f01/08, and it gets its data by reading /proc. top and htop do the same across all processes. All three are interfaces over the same virtual filesystem.

The fd/ directory becomes relevant starting on the next page. After fork and dup2, you can inspect /proc/<PID>/fd/ on a running child to see exactly which file descriptors are open and what they point to. c05/08 uses strace -f to make the same transitions visible as system calls in real time.
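
The same information is available from inside a program. The sketch below assumes a Linux-style /proc and reads the symlink for one descriptor of the calling process through /proc/self/fd/. The helper name fd_target is hypothetical and is not part of pipeline.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the target of /proc/self/fd/<fd> into buf.
 * Returns the target's length, or -1 if the link cannot be read
 * (closed descriptor, or a system without a Linux-style /proc).
 */
int fd_target(int fd, char *buf, size_t size)
{
    char    link[64];
    ssize_t len;

    if (size == 0)
        return (-1);
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    len = readlink(link, buf, size - 1);  /* readlink does not add '\0' */
    if (len < 0)
        return (-1);
    buf[len] = '\0';
    return ((int)len);
}
```

For a descriptor that refers to a pipe end, the target has the form pipe:[inode number]; for a regular file it is the file's path.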

fork

Unix has no primitive to create a process from nothing. The only way to start a new program is to split an existing process in two, then have the child replace itself. Every command you type in a shell — ls, grep, cat — starts that way: the shell forks, and the child calls execve to become the new program. That is the pattern this chapter builds. For the pipeline, it happens once per command in the chain.

fork creates an exact copy of the calling process. Both the parent and the child return from fork — in the parent, fork returns the child's PID; in the child, it returns zero. On failure it returns -1 and no child is created.

#include <stdio.h>   /* perror */
#include <stdlib.h>  /* exit */
#include <unistd.h>  /* fork */
 
pid_t pid;
 
pid = fork();
if (pid < 0) {
    perror("fork");
    exit(1);
}
if (pid == 0) {
    /* child: pid == 0 */
} else {
    /* parent: pid == child's PID */
}

The child inherits a copy of the parent's memory, file descriptor table, and signal dispositions. Because it is a copy, changes to memory in the child do not affect the parent. The copied descriptors, however, refer to the same open files inside the kernel: parent and child share each open file description, including its file offset. For a pipe, this means the reader sees EOF only after every process holding the write end has closed it, so if both parent and child hold that descriptor, both must close it.
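
The copy semantics of memory can be checked directly. In this minimal sketch the child overwrites a local variable and exits, and the parent then inspects its own copy. The function name is illustrative, and waitpid is covered later on this page.

```c
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the parent's copy of `value` after the child has modified its
 * own copy, or -1 on error. The function name is illustrative.
 */
int value_after_child_write(void)
{
    int     value;
    pid_t   pid;

    value = 1;
    pid = fork();
    if (pid < 0)
        return (-1);
    if (pid == 0) {
        value = 2;   /* changes only the child's copy */
        _exit(0);    /* exit without flushing stdio buffers copied from the parent */
    }
    if (waitpid(pid, NULL, 0) < 0)
        return (-1);
    return (value);  /* still 1 in the parent */
}
```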

execve

execve replaces the current process's memory image with a new program. If it succeeds, it never returns — the code after the call is gone. If it fails, it returns -1 and the process continues.

#include <stdio.h>   /* perror */
#include <stdlib.h>  /* exit */
#include <unistd.h>  /* execve */
 
char *argv[] = { "ls", "-l", NULL };
char *envp[] = { NULL };
 
execve("/bin/ls", argv, envp);
perror("execve");   /* only reached on failure */
exit(127);

Three arguments:

  1. path — the absolute or relative path to the executable
  2. argv — a NULL-terminated array of argument strings; argv[0] is conventionally the program name
  3. envp — a NULL-terminated array of KEY=VALUE strings; pass the process's own environ to preserve the environment (a NULL envp is not portable and does not inherit anything)
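
As a sketch of the third argument, the function below passes the global environ to execve so the new program sees the caller's environment. The function name and the choice of /bin/sh as the target are illustrative assumptions. It uses fork from the previous section and waitpid, which is covered later on this page.

```c
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* the calling process's environment */

/*
 * Runs `/bin/sh -c script` in a child that inherits the parent's
 * environment. Returns the child's exit code, or -1 on error.
 * The function name and the use of /bin/sh are illustrative choices.
 */
int run_sh_with_environ(const char *script)
{
    char    *argv[] = { "sh", "-c", (char *)script, NULL };
    pid_t   pid;
    int     status;

    pid = fork();
    if (pid < 0)
        return (-1);
    if (pid == 0) {
        execve("/bin/sh", argv, environ);
        _exit(127);  /* only reached if execve failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return (-1);
    return (WEXITSTATUS(status));
}
```

With the empty envp from the listing above, a script that tests a variable set by the parent would see it as unset; with environ it sees the parent's value.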

Exiting with 127 after a failed execve follows the shell convention for "command not found" (shells use 126 when the file is found but cannot be executed). pipeline uses 127 for every exec failure.

execve does not search PATH: it uses the path exactly as given, whether absolute or relative to the current directory. The shell knows ls means /bin/ls because it searches $PATH, the variable introduced in f01/09.

PATH is a colon-separated list of directories:

/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

To find a command, split PATH on : and check each directory. Joining a directory and a command name into a path takes one allocation of tci_strlen(dir) + 1 + tci_strlen(cmd) + 1 bytes (both lengths, the '/' separator, and the terminating '\0'), followed by two tci_strcpy calls:

#include <stdlib.h>
#include <unistd.h>
#include "libtciutil.h"
 
static char *join_path(const char *dir, const char *cmd)
{
    char    *full;
    size_t  dlen;
    size_t  clen;
 
    dlen = tci_strlen(dir);
    clen = tci_strlen(cmd);
    full = malloc(dlen + 1 + clen + 1);  /* +1 for '/', +1 for '\0' */
    if (!full)
        return (NULL);
    tci_strcpy(full, dir);
    full[dlen] = '/';
    tci_strcpy(full + dlen + 1, cmd);
    return (full);
}
 
char    *find_cmd(char *cmd)
{
    char    **dirs;
    char    *path;
    char    *full;
    int     i;
 
    path = getenv("PATH");
    if (!path)
        return (NULL);
    dirs = tciu_split(path, ':');
    if (!dirs)
        return (NULL);
    i = 0;
    full = NULL;
    while (dirs[i]) {
        full = join_path(dirs[i], cmd);
        if (full && access(full, X_OK) == 0)
            break;          /* found: keep full and stop searching */
        free(full);
        full = NULL;
        i++;
    }
    i = 0;                  /* free every entry, then the array itself */
    while (dirs[i])
        free(dirs[i++]);
    free(dirs);
    return (full);
}

access(path, X_OK) returns 0 if the file exists and is executable — no need to attempt the exec to check. If no directory in PATH contains the command, find_cmd returns NULL and the caller exits 127.
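
For readers who want to try the search without libtciutil, here is a minimal libc-only sketch of the same idea. It is not the course's implementation: the name find_cmd_libc is hypothetical, the PATH string is passed in as a parameter so it can be tested, and empty PATH entries are simply skipped.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical libc-only counterpart of find_cmd: returns a malloc'd
 * path to the first executable named cmd in the colon-separated
 * path_list, or NULL if none is found. The caller frees the result.
 */
char *find_cmd_libc(const char *cmd, const char *path_list)
{
    const char  *start;
    const char  *end;
    size_t      dlen;
    size_t      size;
    char        *full;

    start = path_list;
    while (start && *start) {
        end = strchr(start, ':');
        dlen = end ? (size_t)(end - start) : strlen(start);
        if (dlen > 0) {
            size = dlen + 1 + strlen(cmd) + 1;  /* dir + '/' + cmd + '\0' */
            full = malloc(size);
            if (!full)
                return (NULL);
            snprintf(full, size, "%.*s/%s", (int)dlen, start, cmd);
            if (access(full, X_OK) == 0)
                return (full);
            free(full);
        }
        start = end ? end + 1 : NULL;  /* move past the ':' or stop */
    }
    return (NULL);
}
```

A typical call would be find_cmd_libc("ls", getenv("PATH")), after checking that getenv did not return NULL.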

A minimal exec wrapper

Put the PATH search and exec together in exec.c:

#include <stdio.h>   /* perror */
#include <unistd.h>
#include <stdlib.h>
#include "libtciutil.h"
 
static char *join_path(const char *dir, const char *cmd)
{
    char    *full;
    size_t  dlen;
    size_t  clen;
 
    dlen = tci_strlen(dir);
    clen = tci_strlen(cmd);
    full = malloc(dlen + 1 + clen + 1);  /* +1 for '/', +1 for '\0' */
    if (!full)
        return (NULL);
    tci_strcpy(full, dir);
    full[dlen] = '/';
    tci_strcpy(full + dlen + 1, cmd);
    return (full);
}
 
static char *find_in_path(char *cmd)
{
    char    **dirs;
    char    *path;
    char    *full;
    int     i;
 
    path = getenv("PATH");
    if (!path)
        return (NULL);
    dirs = tciu_split(path, ':');
    if (!dirs)
        return (NULL);
    i = 0;
    full = NULL;
    while (dirs[i] && !full) {
        full = join_path(dirs[i], cmd);
        if (full && access(full, X_OK) != 0) {
            free(full);
            full = NULL;
        }
        i++;
    }
    i = 0;  /* restart from 0 — the search loop may have stopped mid-array */
    while (dirs[i])
        free(dirs[i++]);
    free(dirs);
    return (full);
}
 
void    exec_cmd(char **argv)
{
    char    *path;
    char    *envp[] = { NULL };  /* stripped env — programs that read PATH or HOME will fail */
 
    if (!argv || !argv[0])
        exit(1);
    path = find_in_path(argv[0]);
    if (!path) {
        tci_printf("pipeline: command not found: %s\n", argv[0]);
        exit(127);
    }
    execve(path, argv, envp);
    perror(path);   /* execve failed */
    free(path);
    exit(127);
}

exec_cmd never returns on success. The caller forks first, then calls exec_cmd in the child. In the parent, execution continues after the fork.

waitpid

waitpid suspends the calling process until the specified child exits. It fills in a status integer that encodes how the child terminated.

#include <stdio.h>     /* perror */
#include <sys/wait.h>
 
int     status;
pid_t   result;
 
result = waitpid(pid, &status, 0);  /* 0: block until child exits */
if (result < 0)
    perror("waitpid");

WIFEXITED(status) is true when the child exited normally. In that case, WEXITSTATUS(status) extracts the exit code — the same value as $? in the shell from f01/08, accessed here in C instead of the shell.

if (WIFEXITED(status))  /* false if the child was killed by a signal */
    tci_printf("exit code: %d\n", WEXITSTATUS(status));

Calling waitpid(-1, &status, 0) waits for any child — useful when you have forked several children and want to collect them all.
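
A minimal sketch of that collection loop, assuming the children do nothing but exit: the parent keeps calling waitpid(-1, ...) until it fails with ECHILD, which means no children are left. The function name is illustrative.

```c
#include <errno.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks up to `count` children that exit immediately, then collects
 * every one with waitpid(-1, ...). Returns how many were reaped.
 * The function name is illustrative.
 */
int spawn_and_reap(int count)
{
    int     reaped;
    int     status;
    int     i;
    pid_t   pid;

    i = 0;
    while (i < count) {
        pid = fork();
        if (pid < 0)
            break;       /* stop forking, but still reap what was started */
        if (pid == 0)
            _exit(i);    /* child: exit code is its index */
        i++;
    }
    reaped = 0;
    while (1) {
        pid = waitpid(-1, &status, 0);
        if (pid > 0)
            reaped++;
        else if (errno == EINTR)
            continue;    /* interrupted by a signal: retry */
        else
            break;       /* ECHILD: no children left */
    }
    return (reaped);
}
```

The order in which children are collected is not guaranteed; the return value of waitpid tells the parent which child each status belongs to.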

In g01b/05, SDL_PollEvent replaced blocking read — it returned immediately with or without an event. waitpid has the same switch: pass WNOHANG as the third argument and it returns immediately, yielding 0 if no child has finished yet. The pattern is the same as the game loop — non-blocking check, act on what is ready, continue.
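
The non-blocking variant can be sketched as follows. The function forks a child that exits with a given code and polls it with WNOHANG until waitpid reports a result. The function name and the polls counter are illustrative, and a real loop would do useful work between checks instead of spinning.

```c
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that exits with `code`, then polls it with WNOHANG
 * instead of blocking. Returns the child's exit code, or -1 on error.
 * *polls receives the number of checks that found the child still running.
 */
int poll_child(int code, int *polls)
{
    pid_t   pid;
    pid_t   result;
    int     status;

    *polls = 0;
    pid = fork();
    if (pid < 0)
        return (-1);
    if (pid == 0)
        _exit(code);
    while ((result = waitpid(pid, &status, WNOHANG)) == 0) {
        (*polls)++;  /* child still running: a real loop would do other work here */
    }
    if (result < 0 || !WIFEXITED(status))
        return (-1);
    return (WEXITSTATUS(status));
}
```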

A single-command test

Update main.c to fork one child and exec the command from argv:

#include <stdio.h>   /* perror */
#include <unistd.h>
#include <sys/wait.h>
#include "libtciutil.h"
 
void    exec_cmd(char **argv);
 
int main(int argc, char **argv)
{
    pid_t   pid;
    int     status;
 
    if (argc < 2) {
        tci_printf("usage: ./pipeline cmd [args]\n");
        return (1);
    }
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return (1);
    }
    if (pid == 0)
        exec_cmd(argv + 1);  /* child: exec argv[1] with remaining args */
    if (waitpid(pid, &status, 0) < 0) {  /* status is undefined if waitpid fails */
        perror("waitpid");
        return (1);
    }
    if (WIFEXITED(status))
        return (WEXITSTATUS(status));
    return (1);
}

Build and test:

make re
./pipeline ls -l
./pipeline echo hello world
./pipeline nonexistent_command
echo $?

ls -l lists the directory. echo hello world prints the arguments. nonexistent_command exits with code 127.

The binary forks, execs, and waits. That is the complete process lifecycle. The next page connects two of these with a pipe.