thecodingidiot.com

The Unix Philosophy

Unix was designed around a single idea: programs should do one thing well, and they should be composable. A program that reads text and outputs text can be connected to any other program that does the same. The pipe operator is how you connect them. The redirect operators are how you connect programs to files.

Standard streams

Every process has three standard streams:

  • stdin (0) — where it reads input from
  • stdout (1) — where it writes normal output
  • stderr (2) — where it writes error messages

By default, stdin is your keyboard and stdout/stderr are your terminal. Pipes and redirects change where these streams go.

The pipe operator

cat /etc/passwd | grep 'bash'

| connects the stdout of the left command to the stdin of the right command. cat reads the file and writes it to stdout; grep reads that stdout as its stdin and filters it. The two processes run concurrently.

Build longer pipelines:

cat /var/log/syslog | grep 'ERROR' | sort | uniq -c | sort -rn

Reading left to right: read the log, keep only the lines containing ERROR, sort them so identical lines end up adjacent, count each run of duplicates, then sort by count in descending order. Five processes, none of which knows about the others, producing a useful result together.
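To watch the counting pipeline work without depending on a real syslog, the sketch below feeds it a few made-up log lines. The log_lines function, its messages, and the variable top are invented for illustration.

```shell
# Hypothetical sample data standing in for /var/log/syslog.
log_lines() {
    printf '%s\n' \
        'ERROR disk full' \
        'INFO started' \
        'ERROR timeout' \
        'ERROR disk full' \
        'ERROR disk full'
}

# Same stages as the pipeline above: filter, sort, count duplicates, order by count.
log_lines | grep 'ERROR' | sort | uniq -c | sort -rn

# Keep only the most frequent line; awk drops the padding that uniq -c adds.
top=$(log_lines | grep 'ERROR' | sort | uniq -c | sort -rn | head -n 1 | awk '{ print $1, $2, $3, $4 }')
echo "most frequent: $top"
```

The first sort matters: uniq only merges lines that are next to each other, so unsorted input would be undercounted.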

Pipes are concurrent, not sequential

When you write cmd1 | cmd2, both processes start at the same time. cmd1 writes to its stdout; cmd2 reads from its stdin as the data arrives. Neither waits for the other to finish. This is not a detail — it changes what pipes are useful for.

Watch what happens with a large counter:

seq 1 1000000 | head -3

seq 1 1000000 on its own would print a million numbers. head -3 exits after reading three lines, which closes the read end of the pipe. The next time seq tries to write, the kernel sends it SIGPIPE and it stops. The whole command returns in milliseconds, printing three lines.

If pipes were sequential — left finishes, then right starts — this would grind through a million numbers before printing anything. It does not. The two processes run in parallel and the pipeline stops as soon as the consumer is satisfied.

The practical consequence: pipes handle data of any size. A log file with ten million lines processes the same way as one with ten. A stream that never ends — a live log, a download in progress, sensor output — can be filtered and processed in real time without ever writing the full content to disk first.
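A producer that never finishes makes the point concrete. This is a minimal sketch using yes, which prints y forever, in place of seq. The variable names are illustrative.

```shell
# yes never stops on its own; head ends the pipeline after three lines.
first_three=$(yes | head -n 3)
echo "$first_three"

# Count the lines that made it through the pipe.
line_count=$(printf '%s\n' "$first_three" | wc -l | tr -d ' ')
echo "lines received: $line_count"
```

If the left side had to finish before the right side started, this command would never return.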

You will build the internals of this in c05 using fork(), pipe(), and dup2(), the system calls that | is made of. By then it will be obvious why the two sides have to run concurrently.

This is the Unix philosophy. It is also the mental model behind every stream-processing system you will encounter in your career.

Output redirection

ls -la > listing.txt

> redirects stdout to a file, truncating[1] the file first. If listing.txt exists, its previous contents are gone.

ls -la >> listing.txt

>> appends to the file instead of truncating.

echo "build started: $(date)" >> build.log

$(...) runs a command and substitutes its output inline. This is command substitution — you will use it constantly in shell scripts.
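The difference between > and >> is easiest to see side by side. The sketch below works in a throwaway directory created with mktemp -d (assumed to be available on your system). The file name and variable names are illustrative.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
log="$workdir/build.log"

echo "first"  > "$log"     # creates the file
echo "second" > "$log"     # truncates it: "first" is gone
echo "third" >> "$log"     # appends after "second"

cat "$log"
contents=$(cat "$log")     # command substitution captures the file's text

rm -r "$workdir"
```

Only second and third survive, because the second > emptied the file before writing.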

Input redirection

sort < names.txt

< feeds a file as stdin to a program. sort normally reads from stdin; this feeds it names.txt without using cat. The result is the same as cat names.txt | sort, but with one fewer process.
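To check that the two forms really agree, this sketch creates a small names.txt in a temporary directory and compares both results. The sample names and variable names are made up for the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'carol' 'alice' 'bob' > "$workdir/names.txt"

# The shell opens names.txt and hands it to sort as stdin.
with_redirect=$(sort < "$workdir/names.txt")

# Same result, but cat runs as an extra process.
with_cat=$(cat "$workdir/names.txt" | sort)

echo "$with_redirect"
[ "$with_redirect" = "$with_cat" ] && echo "identical output"

rm -r "$workdir"
```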

Redirecting stderr

./program 2> errors.log

2> redirects stderr (file descriptor 2) to a file. Normal output still goes to the terminal.

./program > output.log 2> errors.log    # stdout and stderr to separate files
./program > output.log 2>&1             # both to the same file
./program 2>/dev/null                   # discard everything written to stderr

/dev/null is a special file that discards everything written to it. When you do not care about error output, send it there.
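You can try these redirects without a real ./program. In the sketch below, a shell function named program stands in for it. The function, its messages, and the variable names are hypothetical.

```shell
# Hypothetical stand-in for ./program: one line to stdout, one to stderr.
program() {
    echo "result: 42"
    echo "warning: low disk" >&2    # >&2 is shorthand for 1>&2: send this line to stderr
}

workdir=$(mktemp -d)

# stdout and stderr go to separate files.
program > "$workdir/output.log" 2> "$workdir/errors.log"
out=$(cat "$workdir/output.log")
err=$(cat "$workdir/errors.log")

# With stderr discarded, only stdout is left to capture.
quiet=$(program 2>/dev/null)

echo "output.log: $out"
echo "errors.log: $err"

rm -r "$workdir"
```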

A practical example

You have a directory of C source files. You want to find every file that calls malloc, save the results, and also see them on screen:

grep -rn 'malloc' src/ 2>/dev/null | tee results.txt

tee reads stdin and writes it to both stdout and a file. The 2>/dev/null applies only to grep and discards its error messages, such as permission errors from directories it cannot enter.
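Here is the same command run against a tiny made-up source tree, so you can confirm that the screen and the file receive identical lines. The two .c files and the variable names are invented for the example.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
printf 'int *p = malloc(4);\n' > "$workdir/src/a.c"
printf 'int x = 0;\n'          > "$workdir/src/b.c"

# on_screen captures what tee prints to stdout; tee also writes it to results.txt.
on_screen=$(grep -rn 'malloc' "$workdir/src" 2>/dev/null | tee "$workdir/results.txt")
in_file=$(cat "$workdir/results.txt")

echo "$on_screen"
match_count=$(printf '%s\n' "$on_screen" | wc -l | tr -d ' ')

rm -r "$workdir"
```

Only a.c matches, so both the captured output and results.txt hold a single line with the file name, line number, and matching text.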

Every open file has a number: a file descriptor[2]. The kernel gives each newly opened file the lowest number not already in use. When a process starts, three are already open: 0 is stdin, 1 is stdout, 2 is stderr. When you type > file.txt, you are telling the shell to point file descriptor 1 at file.txt instead of the terminal. The program never knows the difference: it still writes to fd 1. Only where fd 1 points has changed.
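Descriptors above 2 work the same way. As a rough sketch, exec with only a redirection opens a descriptor in the current shell, which lets you see that fd 3 is just another numbered slot. The file name notes.txt and the variable names are illustrative.

```shell
workdir=$(mktemp -d)

# Open descriptor 3 for writing to a file (0, 1, and 2 are already taken).
exec 3> "$workdir/notes.txt"

echo "written through fd 3" >&3   # for this command, fd 1 points where fd 3 points
echo "still on fd 1"              # unaffected: goes to the terminal

# Close descriptor 3 again.
exec 3>&-

notes=$(cat "$workdir/notes.txt")
echo "notes.txt: $notes"

rm -r "$workdir"
```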

2>&1 reads as: "point fd 2 at whatever fd 1 currently points at." The order matters. > file.txt 2>&1 sends stdout to the file, then points stderr at the same file. 2>&1 > file.txt redirects stderr to the terminal first (where stdout still points), then points stdout at the file — stderr stays on the terminal.
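The ordering rule is worth testing once yourself. In this sketch, a hypothetical function named both writes one line to each stream. The second case runs inside $(...), where "current stdout" is the capture pipe rather than the terminal, so the variable leaked collects what would otherwise have landed on your screen.

```shell
# Hypothetical command that writes one line to each stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)

# stdout goes to the file first, then stderr is pointed at the same file.
both > "$workdir/right.txt" 2>&1
right=$(cat "$workdir/right.txt")

# stderr is pointed at the current stdout first; only then does stdout move to the file.
leaked=$(both 2>&1 > "$workdir/wrong.txt")
wrong=$(cat "$workdir/wrong.txt")

echo "right.txt holds: $right"
echo "wrong.txt holds: $wrong"
echo "escaped the file: $leaked"

rm -r "$workdir"
```

right.txt ends up with both lines. wrong.txt gets only the stdout line, and the stderr line escapes to wherever stdout pointed before the file redirect.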

In c05/04 you will call dup2(fd, 1) directly. That system call duplicates a file descriptor into a specific slot, closing whatever was open there before. The > operator is the shell doing this on your behalf: it opens the file, then duplicates the new descriptor onto fd 1.

Footnotes

  1. Truncation - Wikipedia

  2. File descriptor - Wikipedia