Effective handling of the SIGPIPE informational signal

Even very commonly used programs can handle pipe signals inappropriately; examples are given below.
Here we discuss considerations for effective handling of closed pipes.

Pipelines as a functional concept

Shell pipelines are a functional programming concept, supporting functional composition, lazy evaluation and so on. One benefit of composing stateless functions/filters is implicit multi-core support, along with simplified support for distributed processing. The discussions below focus on the mechanisms that support lazy evaluation.

Signaling to support lazy evaluation

Consider for example sort(head(gen(),10)) in a traditional language, which is equivalent to gen | head -n10 | sort in UNIX shell.
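To make the "traditional language" side of that comparison concrete, here is a minimal Python sketch. The gen() and head() functions are hypothetical stand-ins for the pipeline stages, and the values produced are arbitrary; the point is only that an endless producer is safe to compose, because head() stops pulling after 10 items.

```python
from itertools import islice

def gen():
    """Hypothetical endless producer, standing in for `gen` in the pipeline."""
    n = 0
    while True:
        yield (n * 7) % 10  # arbitrary values, chosen only for illustration
        n += 1

def head(iterable, count):
    """Lazily take the first `count` items, like `head -n`."""
    return islice(iterable, count)

# Equivalent in spirit to: gen | head -n10 | sort
# gen() never finishes on its own; it is simply not asked for an 11th item.
result = sorted(head(gen(), 10))
```

Within a single process the consumer just stops asking for items. Between separate processes there is no such direct control, which is the gap that SIGPIPE fills.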

To support lazy evaluation, there needs to be a way to get gen() to stop producing. That could be thought of as back pressure in the pipe, and is achieved in UNIX through the use of SIGPIPE, which will by default terminate the writer to a closed pipe.
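The default behaviour can be observed with a small Python sketch along the following lines. GEN_SCRIPT and take_lines() are hypothetical names used only for illustration. One assumption to note is that Python itself ignores SIGPIPE at startup, so the child script explicitly restores the default disposition to behave like a typical UNIX filter.

```python
import signal
import subprocess
import sys

# Hypothetical endless producer. Python ignores SIGPIPE by default,
# so the default disposition is restored to mimic an ordinary filter.
GEN_SCRIPT = (
    "import itertools, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "for i in itertools.count():\n"
    "    print(i, flush=True)\n"
)

def take_lines(count):
    """Read `count` lines from the producer, then close the pipe.

    Returns the lines read and the producer's exit status.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", GEN_SCRIPT],
        stdout=subprocess.PIPE,
        text=True,
    )
    lines = [proc.stdout.readline().rstrip("\n") for _ in range(count)]
    proc.stdout.close()   # the reader goes away, like head exiting
    status = proc.wait()  # the producer's next write raises SIGPIPE
    return lines, status
```

On a POSIX system, the status returned by take_lines(10) should be -signal.SIGPIPE, meaning the producer was terminated by the signal rather than exiting by itself.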

Note the reason SIGPIPE is generated, rather than write() just failing with EPIPE, is that programs are simpler when they don't have to handle EPIPE specifically, thus leveraging implicit logic from this functional paradigm. SIGPIPE also allows a closed pipe to be distinguished from other I/O problems (write errors), i.e. the parent can tell that a child terminated due to SIGPIPE, and will usually not diagnose an error in that case.
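A parent process might make that distinction roughly as follows. This is only a sketch: classify_exit() is a hypothetical helper, it assumes return codes in the style of Python's subprocess module (negative values meaning "killed by that signal"), and it also treats the 128 + signal number status that shells conventionally report as a closed pipe, which could misclassify a program that deliberately exits with that value.

```python
import signal

def classify_exit(returncode):
    """Classify a subprocess-style return code.

    Negative values mean the child was killed by that signal number.
    """
    if returncode == 0:
        return "success"
    # -SIGPIPE is how subprocess reports death by SIGPIPE;
    # 128 + SIGPIPE is the status shells conventionally report for it.
    if returncode in (-signal.SIGPIPE, 128 + signal.SIGPIPE):
        return "closed pipe"
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"error exit {returncode}"
```

A caller would then typically stay quiet for "success" and "closed pipe", and only diagnose the remaining cases.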

Incorrect handling of SIGPIPE

If programs catch SIGPIPE themselves because the implicit handling isn't appropriate, then they must be sure to handle this informational signal appropriately. A few examples where this isn't done in commonly used programs are:

Cases where SIGPIPE is not sufficient

Intermittent sources

Relying on SIGPIPE is not ideal for intermittent sources. For example, cat | grep -m1 exit will only exit when you type another line after typing "exit", since cat only notices the closed pipe on its next write. This manifests itself in practice with, for example, tail -f log | grep -m1 'major error' && action_major_error. Since tail can hang around forever, and -f is often processing intermittent input, tail should really have extra support for detecting the pipe going away in a timely manner, using poll(POLLHUP) or a similar mechanism.
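The kind of check meant here can be sketched in Python as below. The reader_gone() helper is hypothetical, and the exact flags reported vary between systems: on Linux a write end whose reader has closed reports POLLERR, so both POLLERR and POLLHUP are tested. Requesting POLLRDBAND where available is an assumption made for portability.

```python
import os
import select

def reader_gone(fd):
    """Return True if the read end of the pipe behind `fd` has been closed.

    Nothing is written, so the check works even when there is no
    output pending and no SIGPIPE is raised.
    """
    poller = select.poll()
    # POLLERR and POLLHUP are reported whatever events are requested.
    # POLLRDBAND is requested only as a portability assumption.
    poller.register(fd, getattr(select, "POLLRDBAND", 0))
    hangup = select.POLLERR | select.POLLHUP
    return any(events & hangup for _, events in poller.poll(0))
```

A tail -f style loop could call reader_gone(sys.stdout.fileno()) each time it wakes up with no new input, and exit once it returns True, rather than waiting for its next write to fail.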

Multiple outputs

The tee command writes to multiple outputs, which doesn't map well to the implicit handling of SIGPIPE on any particular output. To improve the situation, a -p option was added to tee, which will continue with the other outputs in the presence of SIGPIPE; other possible actions can be selected with the more fine-grained --output-error option.
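The idea of carrying on with the remaining outputs can be sketched as follows. This is not how tee itself is implemented; write_all() and tee_chunks() are hypothetical helpers, and the sketch relies on Python ignoring SIGPIPE by default, so that a closed pipe shows up as BrokenPipeError (EPIPE) on the write.

```python
import os

def write_all(fd, data):
    """Write all of `data` to `fd`, retrying after partial writes."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]

def tee_chunks(chunks, fds):
    """Copy each chunk to every output, dropping outputs whose reader left.

    Assumes SIGPIPE is ignored (Python's default), so a closed pipe is
    reported as BrokenPipeError instead of killing the process.
    Returns the outputs that were still open at the end.
    """
    live = list(fds)
    for chunk in chunks:
        for fd in list(live):
            try:
                write_all(fd, chunk)
            except BrokenPipeError:
                live.remove(fd)  # this reader is gone; keep the others
        if not live:
            break  # nobody left to write to
    return live
```

Here an output whose reader has gone is silently dropped; errors other than a closed pipe still propagate to the caller.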

While on the subject of tee, it's worth mentioning the >(...) shell construct that is often used with it, for example gen | tee >(process_1) | process_2. Since tee is just writing to "files" and not managing forked children itself, it can't distinguish whether process_1 exited due to an error or had simply finished processing, i.e. tee will just get an EPIPE error in both cases. This is a limitation of the >(...) shell construct rather than of tee, with the consequence that, for robust handling of errors, the commands within the >(...) construct need to consume all the data presented.

© Mar 5 2015