The Path to Logging Pro: Part 2. Everything is a file


In Part 1 we worked with this command:

python program.py 2>&1 | tee output.txt

We said that 2>&1 makes stderr point to the same place as stdout, so both streams travel through the pipe and tee captures all of it. But there’s a question hiding in that explanation: what does it mean for two streams to “point to the same place”? Where exactly is that place?

To answer that, we need to pull on the thread a bit further.

One interface for everything

Linux has a design principle that goes back to the earliest days of Unix: everything is a file.

This doesn’t mean everything is literally stored on your hard drive. It means something more interesting: the kernel — the core of the operating system, the program that manages your hardware and system resources — handles very different things and exposes them all through the same interface. Text files, process memory, network connections, hardware devices… they all speak the same language. And that language is file descriptors.

Why does this matter? Because when everything speaks the same language, you can combine tools in ways nobody anticipated. A pipe between two commands can print a PDF to a network printer. You can monitor system memory with cat. A process can inspect its own resources by reading a directory…
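Here's a small Python sketch of that shared language. I'm using Python because the series started from `python program.py`, and `write_everywhere` is just a made-up name for illustration. It uses `os.write()` and `os.read()`, Python's thin wrappers around the `write()` and `read()` system calls, on three targets: a regular file, a pipe, and /dev/null. Only the way each descriptor is obtained differs. The calls that move the bytes are identical.

```python
import os
import tempfile

def write_everywhere(data: bytes) -> dict:
    """Send the same bytes through os.write() to three different kinds of resources."""
    results = {}

    # 1. A regular file on disk (a temporary one, deleted when closed)
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)  # rewind so we can read it back
        results["file"] = os.read(fd, len(data))

    # 2. A pipe: a kernel buffer with a write end and a read end
    read_end, write_end = os.pipe()
    os.write(write_end, data)
    os.close(write_end)
    results["pipe"] = os.read(read_end, len(data))
    os.close(read_end)

    # 3. /dev/null: a character device that discards everything it receives
    null_fd = os.open(os.devnull, os.O_WRONLY)
    results["/dev/null"] = os.write(null_fd, data)  # number of bytes accepted
    os.close(null_fd)

    return results

print(write_everywhere(b"hello\n"))
# → {'file': b'hello\n', 'pipe': b'hello\n', '/dev/null': 6}
```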

See it for yourself

The most direct place to observe this is /proc. It’s a special directory — it doesn’t live on your disk, the kernel generates it on the fly in memory — that exposes information about the system and running processes as if they were ordinary text files.¹

Let’s try something simple. Open a terminal and run this:

$ sleep 100 &
[1] 47823

sleep 100 is a command that does nothing for 100 seconds — we’re using it just to have a live process to experiment with. The & at the end tells the shell to run it in the background, so you get your terminal prompt back immediately.

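The number that appears, 47823 in our example, is the PID (Process ID): the identifier the kernel assigns to every process, a bit like a social security number. No two running processes share a PID at the same time, although the kernel can hand a number out again once its previous owner has exited. Every program running on your machine has one.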

Now, with that process alive, look at what file descriptors it has open:

$ ls -la /proc/47823/fd
total 0
dr-x------ 2 abel abel  0 Apr  4 17:00 .
dr-xr-xr-x 9 abel abel  0 Apr  4 17:00 ..
lrwx------ 1 abel abel 64 Apr  4 17:00 0 -> /dev/pts/0
lrwx------ 1 abel abel 64 Apr  4 17:00 1 -> /dev/pts/0
lrwx------ 1 abel abel 64 Apr  4 17:00 2 -> /dev/pts/0

A few things are happening here at once.

The numbers 0, 1 and 2 are exactly the stdin, stdout and stderr from Part 1 — now you can see them as real entries in a directory, like any other file.

Each one points to /dev/pts/0. That’s your terminal — a character device representing the terminal session you have open. All three descriptors point to the same place because sleep inherited them from the shell that launched it: when the shell created the child process, it handed down its own descriptors. The process didn’t choose anything; it simply received them.²
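You can run the same experiment from code. This Python sketch assumes Linux with /proc mounted, and `fd_targets` is a hypothetical helper. It resolves each entry in /proc/<pid>/fd with `os.readlink()`. It then starts `sleep 100` as a child and compares the child's descriptors 0, 1 and 2 with the parent's. If the child inherited them, each pair should match.

```python
import os
import subprocess

def fd_targets(pid="self"):
    """Map each open descriptor of a process to what it points at, via /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except FileNotFoundError:
            # listdir() briefly holds its own descriptor, which is closed by now
            continue
    return targets

# Same experiment as above: start `sleep 100` and compare its 0, 1, 2 with ours.
child = subprocess.Popen(["sleep", "100"])
try:
    ours, theirs = fd_targets(), fd_targets(child.pid)
    for fd in (0, 1, 2):
        print(fd, ours.get(fd), theirs.get(fd))
finally:
    child.kill()
    child.wait()
```

Run from a terminal, both columns should show the same /dev/pts entry. With output redirected, you'll see something like `pipe:[<inode>]` or a file path instead, but it should still be the same value on both sides.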

Notice something subtler: sleep has no idea it’s connected to a terminal. It only knows it has three file descriptors. What’s on the other side is the kernel’s business. That separation between “I write to a descriptor” and “what’s actually there” is the key to everything that comes later in this series, and it also answers the question we opened with. When you write 2>&1, you’re asking the shell to make FD 2 a copy of FD 1, so both end up pointing wherever FD 1 points. The shell does this with the dup2() system call, just before it starts the program. The process doesn’t move; the plumbing changes.
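To see the plumbing behind `python program.py 2>&1 | tee output.txt`, here's a rough Python sketch of what a shell sets up. It is an approximation for illustration, not bash's actual code, and `run_with_stderr_merged` is a hypothetical name. The steps are:

1. Create a pipe.
2. Fork a child.
3. In the child, point FD 1 at the pipe's write end with `os.dup2()`.
4. Make FD 2 a copy of FD 1.
5. Exec the program.

The parent then reads the other end of the pipe, playing tee's role. The order matters. Bash connects the pipe before it applies the command's own redirections, so by the time 2>&1 runs, FD 1 already points at the pipe.

```python
import os
import sys

def run_with_stderr_merged(argv):
    """Rough sketch of `argv 2>&1 | reader`: return everything argv writes to FD 1 and FD 2."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child. First the pipe: FD 1 now refers to the pipe's write end.
        os.dup2(write_end, 1)
        # Then 2>&1: FD 2 becomes a copy of whatever FD 1 refers to right now.
        os.dup2(1, 2)
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(argv[0], argv)  # replaces this process; FDs 0, 1, 2 carry over
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: plays the role of tee, reading until every writer has closed.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    os.waitpid(pid, 0)
    return b"".join(chunks)

demo = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"
print(run_with_stderr_merged([sys.executable, "-c", demo]))
```

Both lines come back through the single pipe, but they may arrive in either order. When stdout points at a pipe, Python buffers it more aggressively than stderr.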

Take a moment to explore /proc

While you’re here, take five minutes to poke around. /proc is a direct window into the kernel and it rewards curiosity:

# Information about your CPU
$ cat /proc/cpuinfo

# System memory usage
$ cat /proc/meminfo

# Information about the process reading the file (here, cat itself; use /proc/$$/status for your shell)
$ cat /proc/self/status

You don’t need to understand everything you see. The point is to internalize that this is real kernel data, exposed as plain text you can read with cat — the same tool you’d use to read a config file or a log.
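Because it's plain text, any language can read it too. Here's a minimal Python sketch of a parser for the "Key: value" files above. `read_proc_fields` is a hypothetical helper, and it assumes one key per line, so it isn't meant for /proc/cpuinfo, which repeats its keys once per CPU. It also shows what /proc/self really means: it resolves to whichever process opens it, not to your shell.

```python
import os

def read_proc_fields(path):
    """Parse a /proc file made of 'Key:  value' lines into a dict of strings."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:  # skip any line without a colon
                fields[key.strip()] = value.strip()
    return fields

meminfo = read_proc_fields("/proc/meminfo")
print("MemTotal:", meminfo["MemTotal"])  # the total, followed by "kB"

# /proc/self resolves to whichever process opens it: here, this Python interpreter.
status = read_proc_fields("/proc/self/status")
print(status["Pid"] == str(os.getpid()))  # True
```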

A socket is a file descriptor too

So far we’ve seen files on disk, the /proc pseudo-filesystem, and a terminal device. All file descriptors. A network connection works exactly the same way.

I have a Brother MFC-L2710DW printer at home connected over wifi. This printer exposes a direct print port — port 9100 — that accepts raw data. When I open a connection to that port, the kernel hands me back a file descriptor. As always.

Which means I can do something like this:

$ cat example.pdf | nc -q 0 192.168.1.35 9100

And the PDF comes out printed.

Let’s break down what’s happening at each step:

cat example.pdf reads the bytes from the file and writes them to its stdout — a file descriptor.
The pipe | connects that stdout to the stdin of nc — another file descriptor.
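nc (netcat) opens a TCP socket to the printer’s IP and sends everything it reads from its stdin over that connection, which is one more file descriptor. The -q 0 flag tells nc to quit as soon as its stdin reaches end-of-file; not every netcat variant supports it.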

Three completely different resources — a file on disk, a pipe, a TCP connection — and all three behave the same way from each process’s point of view: they’re file descriptors that bytes flow through.
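You can check this without a printer. The following Python sketch is illustrative only: a local listener on 127.0.0.1, on a port the OS picks, stands in for the printer's port 9100, and `send_through_socket_fd` is a hypothetical name. It opens a TCP connection, asks /proc what the kernel calls that descriptor, and pushes bytes through it with plain `os.write()`, the same call used for files and pipes.

```python
import os
import socket

def send_through_socket_fd(payload: bytes):
    """Push bytes through a TCP socket's descriptor with os.write(); return (received, fd_target)."""
    # A local listener on a port the OS picks stands in for the printer's port 9100.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _ = server.accept()
            with conn:
                fd = client.fileno()
                # Ask /proc what this descriptor is: Linux reports "socket:[<inode>]".
                target = os.readlink(f"/proc/self/fd/{fd}")
                # The same os.write() used for files and pipes; loop in case of a short write.
                view = memoryview(payload)
                while view:
                    view = view[os.write(fd, view):]
                client.shutdown(socket.SHUT_WR)  # tell the other side we're done sending
                received = b""
                while chunk := conn.recv(4096):
                    received += chunk
    return received, target

print(send_through_socket_fd(b"pretend this is a PDF\n"))
```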

Note: this works because this specific printer accepts raw PDF on port 9100. Many printers expect other formats like PostScript or PCL and would need conversion first. If you try this with a different printer and get pages full of garbled text, that’s why.

Why this matters for logging

Here’s where everything ties back to the series.

When an application writes a log, it’s not doing anything special. It’s writing bytes to a file descriptor. The application doesn’t know, or care, whether those bytes end up in a text file on disk, in a socket pointing at a centralized log system, or in /dev/null: from its side, it’s the same write() call every time.

That indifference is precisely what makes modern logging architecture possible. You can redirect an application’s logs to a file without touching its code. You can ship them to a remote aggregator without the process knowing it exists. You can silence them entirely without changing anything in the application.

All of these are just different ways of wiring up file descriptors. The process writes to its own, and the system decides what’s on the other end.
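Here's a minimal Python sketch of that idea. The "service" is hypothetical: three lines that log with Python's `logging` module, which sends records to stderr when `basicConfig()` is given no filename. The parent decides where FD 2 goes, and the service's code never changes.

```python
import subprocess
import sys
import tempfile

# Hypothetical "service": it logs with Python's logging module and knows nothing
# about where the logs end up. basicConfig() without a filename logs to stderr (FD 2).
SERVICE_CODE = (
    "import logging\n"
    "logging.basicConfig(level=logging.INFO, format='%(message)s')\n"
    "logging.info('service started')\n"
)

def run_service(stderr_target):
    """Run the unchanged service with its FD 2 wired to stderr_target."""
    subprocess.run([sys.executable, "-c", SERVICE_CODE], stderr=stderr_target, check=True)

# Destination 1: a log file on disk.
with tempfile.TemporaryFile() as log_file:
    run_service(log_file)
    log_file.seek(0)
    print(log_file.read())  # b'service started\n'

# Destination 2: nowhere. Same service, same code, silenced.
run_service(subprocess.DEVNULL)
```

`subprocess` also accepts a raw integer descriptor here. Passing a connected socket's `fileno()` would therefore send the same lines over the network, which is the remote-aggregator case in miniature.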

In the next post we’ll see how systemd takes advantage of exactly this to capture and manage logs from every service running on your machine.

¹ The /proc filesystem is documented in the Linux kernel manual: https://www.kernel.org/doc/html/latest/filesystems/proc.html
² This file descriptor inheritance is part of how the fork() system call works, the mechanism Linux uses to create new processes. You can read more in the POSIX spec: https://pubs.opengroup.org/onlinepubs/9699919799/functions/fork.html
