Effective handling of the SIGPIPE informational signal
Even very commonly used programs, examples of which are given below, can handle pipe signals inappropriately.
Here we discuss considerations for effective handling of closed pipes.
Pipelines as a functional concept
Shell pipelines are a functional programming concept, supporting function composition, lazy evaluation and so on. One benefit of composing stateless functions/filters is implicit multi-core support, along with simplified distributed processing. The discussion below focuses on the mechanisms that support lazy evaluation.
Signaling to support lazy evaluation
Consider for example sort(head(gen(),10)) in a traditional language, which is equivalent to gen | head -n10 | sort in UNIX shell.
To support lazy evaluation, there needs to be a way to get gen() to stop producing. That could be thought of as back pressure in the pipe, and is achieved in UNIX through the use of SIGPIPE, which will by default terminate the writer to a closed pipe.
Note the reason SIGPIPE is generated, rather than just an EPIPE error from write(), is that it saves programs from having to handle EPIPE specifically, thus leveraging implicit logic from this functional paradigm. SIGPIPE also allows a pipe close to be distinguished from other I/O problems (write errors). I.e. the parent can tell that a child terminated due to SIGPIPE, and usually won't diagnose an error in that case.
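As a minimal sketch of this back pressure, the following uses a hypothetical endless gen function standing in for gen() above. It assumes the shell was started with SIGPIPE at its default disposition. The `|| break` fallback only matters if SIGPIPE happens to be ignored, in which case echo fails with EPIPE instead.

```shell
#!/bin/sh
# gen is a hypothetical endless generator: it counts upwards forever.
gen() {
    i=0
    while :; do
        i=$((i + 1))
        # With the default SIGPIPE disposition, the subshell running gen is
        # terminated right here once head has exited. The "|| break" is a
        # fallback for when SIGPIPE is ignored and echo fails with EPIPE.
        echo "$i" || break
    done
}

# Terminates even though gen never finishes on its own.
gen | head -n10 | sort -n
```

Nothing in gen knows how many lines are wanted; the consumer closing the pipe is the only stop condition.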
Incorrect handling of SIGPIPE
If programs catch SIGPIPE themselves because the implicit handling isn't appropriate, then they must be sure to handle this informational signal appropriately. A few examples where this isn't done in commonly used programs are:
python doesn't reset the SIGPIPE handler for subprocesses.
I.e. any pipelines spawned from python will have SIGPIPE ignored, and thus
may behave incorrectly. This was fixed in python 3, and backported fixes for python 2 are under consideration.
Note the python interpreter itself still ignores SIGPIPE, but doesn't
propagate EPIPE errors to the exit status. It does still display
the errors though, which is strange.
    $ python -c 'import this, sys; sys.stdout.flush()' | :
    IOError: [Errno 32] Broken pipe
    $ echo $?
    0
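The inherited disposition problem isn't specific to python and can be reproduced from any shell. This is a rough sketch, assuming a POSIX shell that was itself started with the default SIGPIPE disposition. The variable names are only illustrative, and the exact error text depends on the yes implementation.

```shell
#!/bin/sh
# Default disposition: yes is silently terminated by SIGPIPE when head exits.
default_err=$( (yes | head -n1 >/dev/null) 2>&1 )

# Ignore SIGPIPE first, as a parent process that ignores it would.
# The ignored disposition is inherited across fork and exec, so yes now
# sees EPIPE from write() and diagnoses it as an error.
ignored_err=$( (trap '' PIPE; yes | head -n1 >/dev/null) 2>&1 )

echo "default: ${default_err:-<no diagnostic>}"
echo "ignored: ${ignored_err:-<no diagnostic>}"
```

The second pipeline still stops, but only because yes checks for write errors. A writer that ignores them would keep running.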
openssl has inappropriate handling of pipe errors,
which can result in either redundant errors, like with
    # Generate a certain amount of seeded pseudo random data
    openssl enc -aes-256-ctr -pass pass:seed -nosalt </dev/zero | head -c1 >/dev/null
or redundant writes, like with
    openssl rand -base64 10000000 | head -n1
xargs, even though a traditional UNIX tool,
handles SIGPIPE inappropriately.
    $ yes 1234 | xargs -n1 | head -n1
    1234
    xargs: /bin/echo: terminated by signal 13
That's not normally the behavior of the shell, as can be seen with:
    $ yes 1234 | head -n1
    1234
bash or zsh with pipefail have questionable SIGPIPE handling.
Given the discussion above that SIGPIPE is informational
and normally not diagnosed by the shell, it's
very surprising that it generates a failure indication with pipefail enabled.
    $ set -o pipefail
    $ yes | head -n1 || echo error
    y
    error
Note pipefail is a good idea but is non-standard. You can hack things to get equivalent behavior in simple pipelines though. For example, I recently used the following hack to avoid hiding errors and fix an rpmbuild issue. The parentheses are needed because | binds more tightly than ||.
    (bzip2 -dc corrupt.bz2 || echo cause_patch_to_fail) | patch
Or you can use a more general technique, like I did in my ls wrapper script, to obtain individual exit statuses.
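One portable way to obtain individual exit statuses, sketched here for a two-stage pipeline, is to pass the producer's status out on a spare file descriptor. The run_pipe helper and its variable names are hypothetical and are not taken from the ls wrapper script. The sketch assumes a POSIX shell that reports death by signal N as 128+N, with SIGPIPE being signal 13, and a producer that is an external command rather than a shell builtin.

```shell
#!/bin/sh
# run_pipe PRODUCER CONSUMER
# Hypothetical helper: runs "PRODUCER | CONSUMER" and returns non-zero if
# either stage failed, except that a producer terminated by SIGPIPE
# (128 + 13 = 141) is not treated as a failure.
run_pipe() {
    {
        # fd 3 carries the producer's exit status into the substitution,
        # fd 4 carries the consumer's real output around it.
        producer_status=$(
            { ( eval "$1"; echo "$?" >&3 ) | eval "$2" >&4; } 3>&1
        )
        consumer_status=$?
    } 4>&1

    case $producer_status in
        0|141) return "$consumer_status" ;;
        *)     return "$producer_status" ;;
    esac
}

run_pipe 'yes' 'head -n1' || echo error    # prints only "y"
run_pipe 'false' 'cat'    || echo error    # prints "error"
```

Unlike pipefail, this reports a real failure in the producer while staying quiet when the producer was merely told to stop.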
Cases where SIGPIPE is not sufficient
Intermittent sources
Relying on SIGPIPE is not ideal for intermittent sources. For example, cat | grep -m1 exit will only exit once you type another line after typing "exit". This manifests itself in practice with, for example, tail -f log | grep -m1 'major error' && action_major_error. Since tail can hang around forever and -f is often processing intermittent input, tail should really have extra support for detecting the pipe going away in a timely manner, using poll(POLLHUP) or a similar mechanism.
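The delay is easy to reproduce without a terminal. In this rough sketch a subshell stands in for an intermittent source such as tail -f, with sleep 2 as a hypothetical quiet period. It assumes a shell such as bash or dash that waits for every member of a pipeline.

```shell
#!/bin/sh
start=$(date +%s)

# grep -m1 exits as soon as it sees "exit", but the writer only finds out
# that the pipe is closed when it next writes, two seconds later.
( echo exit; sleep 2; echo 'one more line' ) | grep -m1 exit

end=$(date +%s)
echo "pipeline took about $((end - start))s"
```

The match is printed immediately, yet the pipeline as a whole only completes after the quiet period, which is why any action chained with && is delayed.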
Multiple outputs
The tee command writes to multiple outputs, which doesn't map well to the implicit handling of SIGPIPE on any particular output. To improve the situation, a -p option was added to tee, which will continue with the other outputs in the presence of SIGPIPE. Other possible actions can be selected with the more fine-grained --output-error option.
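A small sketch of the -p behavior, using a FIFO as the output that goes away early. It assumes a GNU coreutils tee recent enough to have -p; the temporary directory and variable names are only illustrative.

```shell
#!/bin/sh
# Assumes GNU coreutils tee with the -p option.
dir=$(mktemp -d)
mkfifo "$dir/fifo"

# This reader stops after one line, closing its end of the FIFO early.
head -n1 <"$dir/fifo" >"$dir/first" &

# With -p, tee keeps feeding standard output after the FIFO reader is gone.
count=$(seq 100000 | tee -p "$dir/fifo" | wc -l)
wait

first=$(cat "$dir/first")
echo "first line: $first, lines passed through: $count"
rm -r "$dir"
```

Without -p, tee would typically be terminated by SIGPIPE on its next write to the FIFO, and wc would see only part of the input.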
While on the subject of tee, it's worth mentioning the >(...) shell construct that is often used with it, for example gen | tee >(process_1) | process_2. Since tee is just writing to "files" and not forking and managing children itself, it can't distinguish whether process_1 exited due to an error or simply finished processing. I.e. tee will just get an EPIPE error in both cases. This is a limitation of the >(...) shell construct rather than of tee, with the consequence that for robust handling of errors, the commands within the >(...) construct need to consume all the data presented.