The previous page forked a child and exec'd a single command. This page
connects two commands with a kernel pipe. Three system calls make it
happen: pipe, dup2, and close. The third is deceptively
important — every call to close is load-bearing. Miss one and the
program hangs indefinitely.
pipe
pipe creates a kernel buffer and returns a file descriptor for each
end:
fd[0]: the read end
fd[1]: the write end
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
#include <unistd.h>     /* pipe */

int fd[2];

if (pipe(fd) < 0) {
    perror("pipe");
    exit(1);
}
/* fd[0]: read end, fd[1]: write end */

The buffer is in the kernel: no file on disk, no filesystem path.
Bytes written to fd[1] queue up in the kernel's buffer and can be
read from fd[0]. That is all the | operator in f01/06 is: the shell
calls pipe, forks two children, and wires fd[1] to the left command's
stdout and fd[0] to the right command's stdin.
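To see the queue behaviour in isolation, here is a minimal single-process sketch: it writes a short message into fd[1], closes the write end, and reads the same bytes back from fd[0]. The helper name pipe_roundtrip is hypothetical and not part of this project. It assumes the message is small enough to fit in the kernel buffer, so the write cannot block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push msg through a fresh pipe and read it back.
 * Returns the number of bytes read into out, or -1 on error.
 * Assumes msg is short enough to fit in the kernel buffer. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsize)
{
    int     fd[2];
    ssize_t n;

    if (pipe(fd) < 0)
        return (-1);
    if (write(fd[1], msg, strlen(msg)) < 0) {
        close(fd[0]);
        close(fd[1]);
        return (-1);
    }
    close(fd[1]);                   /* done writing */
    n = read(fd[0], out, outsize);  /* bytes come back in the order written */
    close(fd[0]);
    return (n);
}
```

Calling pipe_roundtrip("hello", buf, sizeof buf) should return 5 and leave the five bytes of hello in buf. No file is created at any point.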
The buffer has a capacity: 65536 bytes by default on Linux. If the writer fills
it faster than the reader drains it, write blocks. If the buffer is
empty, read blocks — until bytes arrive, or until every process
holding the write end has closed it.
That last clause is the mechanism behind EOF on a pipe.
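The capacity can be measured rather than taken on faith. The sketch below puts the write end in non-blocking mode, so a full pipe makes write fail with EAGAIN instead of blocking, and counts how many bytes the kernel accepts before that happens. The helper name pipe_capacity and the 4096-byte chunk size are choices made for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill a fresh pipe without blocking and report
 * how many bytes it accepted. Returns -1 on error. */
long pipe_capacity(void)
{
    int     fd[2];
    int     flags;
    char    chunk[4096];
    long    total;
    ssize_t n;

    if (pipe(fd) < 0)
        return (-1);
    /* O_NONBLOCK: a full pipe fails with EAGAIN instead of blocking. */
    flags = fcntl(fd[1], F_GETFL);
    if (flags < 0 || fcntl(fd[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd[0]);
        close(fd[1]);
        return (-1);
    }
    memset(chunk, 'x', sizeof(chunk));
    total = 0;
    while ((n = write(fd[1], chunk, sizeof(chunk))) > 0)
        total += n;
    if (n < 0 && errno != EAGAIN)
        total = -1;                 /* a real error, not "pipe is full" */
    close(fd[0]);
    close(fd[1]);
    return (total);
}
```

On a Linux system with default settings the result should match the figure quoted above. Other systems, or a pipe resized by the administrator, may report a different number.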
This is why a sysadmin can dump a production database and compress it in a single command:
pg_dump mydb | gzip > backup.sql.gz

pg_dump writes rows to fd[1] as it reads them from the database.
gzip reads from fd[0] as the bytes arrive, compresses them, and
writes to the output file.
The uncompressed data never touches disk; it passes straight through the kernel buffer. Without the pipe you would need to dump first (write ten gigabytes), then compress (read ten, write five): fifteen gigabytes on disk at the peak instead of five, and two full passes over the data.
The pipe turns it into one. The | gzip half is what this page
builds; the > that saves the compressed output to disk is the next
page. Before either can work without hanging, one rule must hold: every
process closes every pipe end it does not use.
The close rule
Each end of the pipe is an open file description in the kernel, and
the kernel keeps a reference count of the file descriptors that point
to it. fork copies every descriptor the parent has open, which raises
each count by one: after the call, both parent and child hold a copy
of each end of the pipe. The write end now has a reference count of two.
EOF appears on the read end only when the write end's reference count
drops to zero. If any process holds fd[1] open — even a process that
never writes a single byte — the reader never sees EOF. It blocks
forever.
The rule follows from the mechanism: after forking, every process closes every pipe end it does not use, before doing anything else.
Read it as a checklist: for each fork, trace every open copy of
fd[1]. Count them. Make sure every copy the writer does not need is
closed before the parent calls waitpid. If any stray copy is left
open, the reader never unblocks.
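The reference-count behaviour can be observed in a single process, with no risk of a hang. In this sketch, dup stands in for the extra copy of the write end that fork would create, and the read end is set non-blocking so that "no data yet" shows up as EAGAIN instead of a blocked read. The helper name eof_needs_all_writers_closed is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if an empty pipe reports EOF only after
 * the last write-end descriptor is closed, 0 otherwise, -1 on error. */
int eof_needs_all_writers_closed(void)
{
    int     fd[2];
    int     extra;
    int     flags;
    int     err_before;
    char    c;
    ssize_t before;
    ssize_t after;

    if (pipe(fd) < 0)
        return (-1);
    extra = dup(fd[1]);             /* second reference, as fork would create */
    flags = fcntl(fd[0], F_GETFL);
    if (extra < 0 || flags < 0
        || fcntl(fd[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        if (extra >= 0)
            close(extra);
        close(fd[0]);
        close(fd[1]);
        return (-1);
    }
    close(fd[1]);                   /* one writer closed, one still open */
    before = read(fd[0], &c, 1);    /* -1 with EAGAIN: empty, but not EOF */
    err_before = errno;
    close(extra);                   /* last writer closed: count is zero */
    after = read(fd[0], &c, 1);     /* 0: EOF */
    close(fd[0]);
    return (before == -1 && err_before == EAGAIN && after == 0);
}
```

With a blocking read end, the first read would be the hang described above: the pipe is empty, one write descriptor is still open, so the kernel keeps waiting for data that never comes.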
dup2
The child in the diagram above does not write to fd[1] directly.
It redirects its own stdout to the pipe, then exec's a command that
writes to stdout — without knowing a pipe is involved. This is what
makes the Unix pipeline composable: ls, grep, wc are all
oblivious to whether their stdout is a terminal, a file, or a pipe.
dup2(oldfd, newfd) makes newfd refer to the same open file as
oldfd. If newfd is already open, it is closed first.
dup2(fd[1], STDOUT_FILENO);   /* stdout now writes to the pipe */
close(fd[1]);                 /* dup2 made a copy at slot 1; fd[1] is redundant */

After dup2(fd[1], STDOUT_FILENO), both fd[1] and STDOUT_FILENO
(1) point to the write end. That means the write end's reference count
is back up. Close fd[1] immediately so only slot 1 remains.
The exec'd command writes to fd 1 as always. It has no knowledge
of the pipe.
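The redirection can be tried without fork or exec. The sketch below points fd 1 at a pipe, performs an ordinary write to STDOUT_FILENO, restores the original stdout, and then reads what was captured. The helper name capture_fd1 is hypothetical, and saving stdout with dup so it can be restored is an addition for this demonstration only; the pipeline children never need to restore anything because they exec.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write msg to fd 1 while fd 1 refers to a pipe,
 * then read it back. Returns bytes captured, or -1 on error.
 * Assumes msg is short enough to fit in the kernel buffer. */
ssize_t capture_fd1(const char *msg, char *out, size_t outsize)
{
    int     fd[2];
    int     saved;
    ssize_t n;

    if (pipe(fd) < 0)
        return (-1);
    saved = dup(STDOUT_FILENO);             /* remember the real stdout */
    if (saved < 0 || dup2(fd[1], STDOUT_FILENO) < 0) {
        if (saved >= 0)
            close(saved);
        close(fd[0]);
        close(fd[1]);
        return (-1);
    }
    close(fd[1]);                           /* slot 1 is now the only write end */
    /* This call knows nothing about the pipe; it just writes to fd 1. */
    n = write(STDOUT_FILENO, msg, strlen(msg));
    dup2(saved, STDOUT_FILENO);             /* restore; closes the write end */
    close(saved);
    if (n >= 0)
        n = read(fd[0], out, outsize);
    close(fd[0]);
    return (n);
}
```

Nothing appears on the terminal during the call: the bytes written to fd 1 land in the pipe and come back through out.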
This is the same mechanism as > in f01/06.
When the shell runs ls > file.txt, it calls dup2 to point fd 1
at the file, then exec's ls. The > and | operators are both
dup2 calls dressed up as syntax — and you will add > and < to
the pipeline on the next page.
Two-process pipe test
Connect two commands with a single pipe. The pipe must exist before either fork so that both children inherit it, and the parent must fork both children before it waits on either.
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
#include <unistd.h>
#include <sys/wait.h>
#include "libtciutil.h"

void exec_cmd(char **argv);

static void run_two(char **cmd1, char **cmd2)
{
    int     fd[2];
    pid_t   pid1;
    pid_t   pid2;
    int     status;

    if (pipe(fd) < 0) {
        perror("pipe");
        exit(1);
    }
    pid1 = fork();
    if (pid1 < 0) { perror("fork"); exit(1); }
    if (pid1 == 0) {
        close(fd[0]);                   /* writer does not read */
        dup2(fd[1], STDOUT_FILENO);     /* stdout → pipe write end */
        close(fd[1]);                   /* redundant copy closed */
        exec_cmd(cmd1);
        _exit(127);                     /* reached only if exec_cmd returns */
    }
    pid2 = fork();
    if (pid2 < 0) { perror("fork"); exit(1); }
    if (pid2 == 0) {
        close(fd[1]);                   /* reader does not write */
        dup2(fd[0], STDIN_FILENO);      /* stdin → pipe read end */
        close(fd[0]);                   /* redundant copy closed */
        exec_cmd(cmd2);
        _exit(127);                     /* reached only if exec_cmd returns */
    }
    /* parent owns neither end */
    close(fd[0]);
    close(fd[1]);
    waitpid(pid1, &status, 0);
    waitpid(pid2, &status, 0);
}

Both forks happen before the parent waits on either child. If the
parent waited for cmd1 before forking cmd2, nothing would drain the
pipe while cmd1 ran: as soon as its output exceeded the buffer's
capacity, write would block and the pipeline would hang.
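For experimenting outside this project, here is a self-contained sketch of the same wiring. It calls execvp directly instead of exec_cmd and needs no libtciutil.h. The names spawn_on_pipe and pipe_two are hypothetical. Returning the second command's exit status is a choice made for this sketch, mirroring how a shell reports the status of the last command in a pipeline; run_two itself discards the status.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that runs argv with one standard
 * stream wired to the pipe. use_fd is the end the child keeps, other_fd
 * the end it closes, target is STDIN_FILENO or STDOUT_FILENO. */
static pid_t spawn_on_pipe(char **argv, int use_fd, int other_fd, int target)
{
    pid_t pid;

    pid = fork();
    if (pid != 0)
        return (pid);               /* parent, or -1 if fork failed */
    close(other_fd);                /* the close rule: drop the unused end */
    if (dup2(use_fd, target) < 0)
        _exit(126);
    close(use_fd);                  /* redundant copy closed */
    execvp(argv[0], argv);
    perror(argv[0]);                /* only reached if execvp failed */
    _exit(127);
}

/* Runs cmd1 | cmd2. Returns cmd2's exit status, or -1 on error. */
int pipe_two(char **cmd1, char **cmd2)
{
    int     fd[2];
    pid_t   pid1;
    pid_t   pid2;
    int     status;

    if (pipe(fd) < 0)
        return (-1);
    pid1 = spawn_on_pipe(cmd1, fd[1], fd[0], STDOUT_FILENO);
    pid2 = (pid1 < 0) ? -1
        : spawn_on_pipe(cmd2, fd[0], fd[1], STDIN_FILENO);
    close(fd[0]);                   /* parent owns neither end */
    close(fd[1]);
    if (pid1 > 0)
        waitpid(pid1, NULL, 0);
    if (pid2 < 0)
        return (-1);
    if (waitpid(pid2, &status, 0) < 0 || !WIFEXITED(status))
        return (-1);
    return (WEXITSTATUS(status));
}
```

With cmd1 set to { "echo", "hello", NULL } and cmd2 set to { "grep", "-q", "hello", NULL }, pipe_two should return 0, because grep exits with 0 when it finds a match. Both children are created before either waitpid, and the parent closes both ends before waiting, exactly as in run_two.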
Update main.c to call run_two:
int main(int argc, char **argv)
{
    char *cmd1[] = { "ls", NULL };
    char *cmd2[] = { "wc", "-l", NULL };

    (void)argc;
    (void)argv;
    run_two(cmd1, cmd2);
    return (0);
}

Build and test:
make re
./pipeline

Expected: a single number, the count of entries ls produced,
as counted by wc -l. The same result as ls | wc -l in the shell.
Swap in a different pair to confirm:
char *cmd1[] = { "cat", "/etc/passwd", NULL };
char *cmd2[] = { "grep", "root", NULL };
run_two(cmd1, cmd2);

Expected: every line from /etc/passwd that contains root. The pipe carries
bytes between two unrelated programs; neither knows the other exists.
The next page adds file redirects: an input file feeds cmd1's stdin,
an output file captures cmd2's stdout.