Unix was designed around a single idea: programs should do one thing well, and they should be composable. A program that reads text and outputs text can be connected to any other program that does the same. The pipe operator is how you connect them. The redirect operators are how you connect programs to files.
Standard streams
Every process has three standard streams:
- stdin (0) — where it reads input from
- stdout (1) — where it writes normal output
- stderr (2) — where it writes error messages
By default, stdin is your keyboard and stdout/stderr are your terminal. Pipes and redirects change where these streams go.
The pipe operator
cat /etc/passwd | grep 'bash'
The | operator connects the stdout of the left command to the stdin of the right command. cat reads the file and writes it to stdout; grep reads that stdout as its stdin and filters it. The two processes run concurrently.
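To try this without depending on what your own /etc/passwd contains, the sketch below feeds grep a few passwd-style lines from printf instead of cat. The account names and shells are invented for illustration.

```shell
#!/bin/sh
# Three invented lines in /etc/passwd format (name:pw:uid:gid:info:home:shell).
# printf writes them to its stdout; the pipe hands them to grep's stdin.
matches=$(printf '%s\n' \
    'alice:x:1000:1000::/home/alice:/bin/bash' \
    'bob:x:1001:1001::/home/bob:/bin/zsh' \
    'carol:x:1002:1002::/home/carol:/bin/bash' | grep 'bash')

printf '%s\n' "$matches"   # the alice and carol lines; bob uses zsh
```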
Build longer pipelines:
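cat /var/log/syslog | grep 'ERROR' | sort | uniq -c | sort -rn
Reading left to right: read the log, keep only ERROR lines, sort them so identical lines sit next to each other, collapse each run of identical lines into one line prefixed with its count (uniq only compares adjacent lines, which is why the first sort is needed), then sort numerically by that count, highest first. Five programs, none of which knows about the others, produce a useful result together.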
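/var/log/syslog is not present on every system, so this sketch runs the same five-stage pipeline on a small invented log held in a variable. The log lines are made up; only the pipeline comes from the text above.

```shell
#!/bin/sh
# An invented log standing in for /var/log/syslog.
log='ERROR disk full
INFO service started
ERROR disk full
ERROR timeout
WARN slow response
ERROR disk full'

# Keep ERROR lines, group identical ones, count each group, most frequent first.
report=$(printf '%s\n' "$log" | grep 'ERROR' | sort | uniq -c | sort -rn)
printf '%s\n' "$report"   # 3 ERROR disk full, then 1 ERROR timeout
```

The width of the padding before each count differs between uniq implementations. Dropping the first sort would report "ERROR disk full" twice, as a run of 2 and a run of 1, because the third occurrence is not adjacent to the first two.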
Pipes are concurrent, not sequential
When you write cmd1 | cmd2, both processes start at the same time.
cmd1 writes to its stdout; cmd2 reads from its stdin as the data
arrives. Neither waits for the other to finish. This is not a detail
— it changes what pipes are useful for.
Watch what happens with a large counter:
seq 1 1000000 | head -3
seq 1 1000000 would print a million numbers. head -3 exits after printing three lines, which closes the read end of the pipe. The next time seq writes, the kernel sends it SIGPIPE and it stops. The whole command returns in milliseconds, printing three lines.
If pipes were sequential — left finishes, then right starts — this would grind through a million numbers before printing anything. It does not. The two processes run in parallel and the pipeline stops as soon as the consumer is satisfied.
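You can watch seq being cut short. This sketch groups seq with an echo that records seq's exit status in a temporary file. A status above 128 means the process was killed by a signal; SIGPIPE is signal 13, so shells typically report 141. If your environment ignores SIGPIPE, seq gets a write error instead and exits with a small nonzero status.

```shell
#!/bin/sh
dir=$(mktemp -d)

# The braces run seq, then echo, as one unit on the left of the pipe.
# echo records seq's exit status in a file, not in the pipe.
{ seq 1 1000000; echo "$?" > "$dir/status"; } | head -n 3 > "$dir/first"

first=$(cat "$dir/first")
seq_status=$(cat "$dir/status")
rm -r "$dir"

printf '%s\n' "$first"               # 1, 2, 3
echo "seq exit status: $seq_status"  # typically 141 = 128 + 13 (SIGPIPE)
```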
The practical consequence: pipes handle data of any size. A log file with ten million lines processes the same way as one with ten. A stream that never ends — a live log, a download in progress, sensor output — can be filtered and processed in real time without ever writing the full content to disk first.
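The standard yes command writes "y" lines forever, which makes it the simplest never-ending stream. If pipes were sequential this command would never finish; because both sides run concurrently, it prints three lines and exits.

```shell
#!/bin/sh
# yes never stops on its own; head stops after three lines, and yes is
# killed by SIGPIPE on its next write into the now-closed pipe.
answers=$(yes | head -n 3)
printf '%s\n' "$answers"   # y, y, y
```

The same shape works on a live file: tail -f app.log | grep ERROR (app.log is a placeholder name) keeps printing matching lines as they are appended until you press Ctrl-C.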
You will build the internals of this in c05: fork(), pipe(), and dup2(), the system calls that | is made of. By then it will be obvious why the two sides have to run concurrently.
This is the Unix philosophy: small programs, each doing one job, connected by streams. It is also the mental model behind most stream-processing systems you will encounter in your career.
Output redirection
ls -la > listing.txt
The > operator redirects stdout to a file, creating the file if it does not exist and truncating[1] it first if it does. If listing.txt already existed, its previous contents are gone.
ls -la >> listing.txt
The >> operator appends to the file instead of truncating it.
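A quick way to see the difference is to write to the same file three times. This sketch works in a throwaway directory from mktemp -d; log.txt is just an illustrative name.

```shell
#!/bin/sh
dir=$(mktemp -d)

echo first  >  "$dir/log.txt"   # > creates the file
echo second >  "$dir/log.txt"   # > truncates it: "first" is gone
echo third  >> "$dir/log.txt"   # >> appends after "second"

contents=$(cat "$dir/log.txt")
printf '%s\n' "$contents"       # second, third
rm -r "$dir"
```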
echo "build started: $(date)" >> build.log
$(...) runs a command and substitutes its output inline. This is command substitution, and you will use it constantly in shell scripts.
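Command substitution is not limited to date: any command's stdout, including a whole pipeline's, can be spliced into a string or stored in a variable. The shell strips trailing newlines from the captured output, which is why the result fits neatly inside a sentence. The strings below are arbitrary.

```shell
#!/bin/sh
# $(...) runs the inner pipeline and replaces itself with its stdout,
# minus any trailing newlines.
shout=$(echo "pipes" | tr '[:lower:]' '[:upper:]')
message="compose with $shout"
echo "$message"        # compose with PIPES

# A substitution can also capture a count: three lines in, "3" out.
# tr -d ' ' removes the padding some wc implementations add.
count=$(printf 'a\nb\nc\n' | wc -l | tr -d ' ')
echo "lines: $count"   # lines: 3
```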
Input redirection
sort < names.txt
The < operator feeds a file to a program as its stdin. sort normally reads from stdin; this feeds it names.txt without using cat. The result is the same as cat names.txt | sort, but with one fewer process.
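The equivalence is easy to check. This sketch writes a few invented names to a temporary names.txt and sorts it both ways.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' carol alice bob > "$dir/names.txt"   # invented sample names

redirected=$(sort < "$dir/names.txt")   # the shell opens the file as sort's stdin
piped=$(cat "$dir/names.txt" | sort)    # an extra cat process feeds the same bytes
rm -r "$dir"

printf '%s\n' "$redirected"             # alice, bob, carol
[ "$redirected" = "$piped" ] && echo "same output"
```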
Redirecting stderr
./program 2> errors.log
The 2> operator redirects stderr (file descriptor 2) to a file. Normal output still goes to the terminal.
./program > output.log 2> errors.log # stdout and stderr to separate files
./program > output.log 2>&1 # both to the same file
./program 2>/dev/null # discard all errors
/dev/null is a special file that discards everything written to it. When you do not care about error output, send it there.
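Since ./program is a placeholder, this sketch defines a hypothetical shell function named program that writes one line to stdout and one to stderr, then runs it under each of the three redirections above.

```shell
#!/bin/sh
# Hypothetical stand-in for ./program: one line on stdout, one on stderr.
program() {
    echo "normal output"
    echo "something went wrong" >&2
}

dir=$(mktemp -d)
program > "$dir/output.log" 2> "$dir/errors.log"   # separate files
program > "$dir/all.log" 2>&1                      # both in one file
quiet=$(program 2>/dev/null)                       # stderr discarded

out=$(cat "$dir/output.log")
err=$(cat "$dir/errors.log")
all=$(cat "$dir/all.log")
rm -r "$dir"

echo "output.log: $out"
echo "errors.log: $err"
echo "quiet run:  $quiet"
```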
A practical example
You have a directory of C source files. You want to find every file
that calls malloc, save the results, and also see them on screen:
grep -rn 'malloc' src/ 2>/dev/null | tee results.txt
tee reads stdin and writes it to both stdout and a file simultaneously. The 2>/dev/null suppresses any permission errors from directories grep cannot enter.
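To run the example without a real project, this sketch builds a tiny invented src/ tree in a temporary directory. The file names and contents are made up; they only need to contain (or lack) the word malloc. results.txt is placed outside src/ so grep never searches its own output while tee is still writing it.

```shell
#!/bin/sh
dir=$(mktemp -d)
mkdir "$dir/src"
printf 'int main(void) {\n    char *p = malloc(16);\n    free(p);\n}\n' > "$dir/src/main.c"
printf 'void noop(void) {}\n' > "$dir/src/util.c"

# Matches go to the screen (captured here) and to results.txt at the same time.
shown=$(grep -rn 'malloc' "$dir/src" 2>/dev/null | tee "$dir/results.txt")
saved=$(cat "$dir/results.txt")
rm -r "$dir"

printf '%s\n' "$shown"   # <dir>/src/main.c:2:    char *p = malloc(16);
```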
Each file a process has open is identified by a small number: a file descriptor[2]. The kernel
hands out the lowest number not already in use. When a process starts, three are already open:
0 is stdin, 1 is stdout, 2 is stderr. When you type > file.txt, you
are telling the shell to point file descriptor 1 at file.txt instead
of the terminal. The program never knows the difference — it still
writes to fd 1. Only where fd 1 points has changed.
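On Linux you can ask a program where its fd 1 points, which makes the redirection visible. This sketch assumes a Linux /proc filesystem, where /proc/self/fd/N is a symbolic link to whatever descriptor N of the reading process refers to; it will not work on macOS or other systems without /proc.

```shell
#!/bin/sh
# Linux-only: readlink reports the target of its own descriptor 1.
dir=$(mktemp -d)

readlink /proc/self/fd/1                      # e.g. /dev/pts/0, or pipe:[...] if piped
readlink /proc/self/fd/1 > "$dir/where.txt"   # the shell pointed fd 1 at where.txt

where=$(cat "$dir/where.txt")
echo "fd 1 pointed at: $where"                # a path ending in /where.txt
rm -r "$dir"
```

readlink does nothing special in the second call. It writes to fd 1 both times; only the shell's setup differs.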
2>&1 reads as: "point fd 2 at whatever fd 1 currently points at."
The order matters, because the shell applies redirections left to right. > file.txt 2>&1 sends stdout to the file, then
points stderr at the same file. 2>&1 > file.txt first points stderr at wherever stdout points at that moment (the terminal),
then points stdout at the file, so stderr stays on the terminal.
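Both orderings can be compared side by side. The sketch uses a hypothetical function, two_streams, that writes one line to each stream. For the second ordering it runs inside $(...), where the original stdout is the substitution's capture pipe; that pipe plays the role of the terminal, so whatever "stays on the terminal" ends up in the captured variable.

```shell
#!/bin/sh
# Hypothetical helper: one line to stdout, one to stderr.
two_streams() {
    echo "to stdout"
    echo "to stderr" >&2
}

dir=$(mktemp -d)

# > file 2>&1: fd 1 -> a.txt, then fd 2 -> wherever fd 1 points now (a.txt).
two_streams > "$dir/a.txt" 2>&1

# 2>&1 > file: fd 2 -> wherever fd 1 points now (the capture pipe),
# then fd 1 -> b.txt. Only stderr reaches the capture.
escaped=$(two_streams 2>&1 > "$dir/b.txt")

a=$(cat "$dir/a.txt")
b=$(cat "$dir/b.txt")
rm -r "$dir"

echo "a.txt: $a"           # both lines
echo "b.txt: $b"           # to stdout
echo "escaped: $escaped"   # to stderr
```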
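In c05/04 you will call dup2(fd, 1) directly: the system call that makes a specific descriptor number (here 1) refer to the
same open file as fd. The > operator is the shell doing exactly that on your behalf: it opens the file, dup2()s the new
descriptor onto 1, and then runs the program.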