Avoiding large memory buffers

Allocating large amounts of memory that may never be used is a common technique for simplifying code. However, it's not without cost, even in the presence of overcommit, nor is it necessarily the most CPU efficient technique, given competing processes and the changing memory hierarchy.

The issue with overcommit is that it makes program operation dependent on the system's overcommit configuration, and possibly on current memory conditions. For example, the kernel may be configured to disable overcommit so that processes fail early rather than at some unspecified point in the future, or overcommit may not be supported on the system at all. Also, any implicit writes to the memory area will defeat the overcommit strategy. For example, calloc() will zero the memory, making overcommit ineffective, on my system at least. Even memory allocated with malloc() is not immune to this issue: if MALLOC_PERTURB_ is set in the process environment, the memory will be initialized and thus fully allocated too.
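To make the dependence on system configuration concrete, here is a minimal sketch that reads the overcommit policy on Linux. It assumes the policy is exposed at /proc/sys/vm/overcommit_memory, which is Linux-specific; the function name read_overcommit_mode and the OVERCOMMIT_PATH macro are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Linux-specific location of the overcommit policy.  This is an
   assumption of the sketch; other systems expose the setting
   differently, or not at all.  */
#define OVERCOMMIT_PATH "/proc/sys/vm/overcommit_memory"

/* Return the overcommit mode stored in PATH:
     0 = heuristic overcommit
     1 = always overcommit
     2 = don't overcommit (strict accounting)
   Return -1 if the file can't be read or doesn't hold one of those.  */
int
read_overcommit_mode (const char *path)
{
  assert (path != NULL);

  FILE *fp = fopen (path, "r");
  if (!fp)
    return -1;

  int mode;
  int ok = fscanf (fp, "%d", &mode) == 1 && 0 <= mode && mode <= 2;
  fclose (fp);

  return ok ? mode : -1;
}
```

Calling read_overcommit_mode (OVERCOMMIT_PATH) and getting 2 or -1 back means a large speculative allocation can't be assumed to be free, which is really the point: code that behaves well only for some values of this setting is fragile.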

Generally it's better to allocate less memory, at the cost of some extra processing and more complicated code, as that allows for better horizontal scalability of your processes. Also, biasing towards smaller chunks of memory can make better use of increasingly performant CPU caches. If avoiding a fixed size buffer significantly increases CPU overhead, then it's useful to keep the simple buffer but also determine a crossover point, so that once that amount of memory is allocated, the process switches to an alternative approach.
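One way to picture a crossover point is a buffer that stays in memory while small and spills to a temporary file once a limit is reached. The following is only a sketch of that idea, not code from any coreutils program: struct spool, spool_append and the limit field are hypothetical names, and spilling to a file is just one possible alternative approach.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Accumulates data in memory until LIMIT bytes (the crossover point)
   would be exceeded, then moves everything to a temporary file.  */
struct spool
{
  char *mem;     /* in-memory data, or NULL when empty or spilled */
  size_t used;   /* bytes held in MEM */
  size_t limit;  /* crossover point in bytes (a tuning assumption) */
  FILE *spill;   /* temporary file, or NULL while still in memory */
};

/* Append N bytes from DATA.  Return 0 on success, -1 on failure.  */
int
spool_append (struct spool *s, const void *data, size_t n)
{
  assert (s != NULL && s->used <= s->limit);

  if (n == 0)
    return 0;

  if (!s->spill && n <= s->limit - s->used)
    {
      /* Still below the crossover point: grow the simple buffer.
         Growing by exactly N keeps the sketch short.  */
      char *tmp = realloc (s->mem, s->used + n);
      if (!tmp)
        return -1;
      memcpy (tmp + s->used, data, n);
      s->mem = tmp;
      s->used += n;
      return 0;
    }

  if (!s->spill)
    {
      /* Crossover reached: move what we hold to a temporary file.  */
      s->spill = tmpfile ();
      if (!s->spill)
        return -1;
      if (s->used && fwrite (s->mem, 1, s->used, s->spill) != s->used)
        return -1;
      free (s->mem);
      s->mem = NULL;
      s->used = 0;
    }

  return fwrite (data, 1, n, s->spill) == n ? 0 : -1;
}
```

A caller would start with struct spool s = { NULL, 0, limit, NULL } and pick the limit by measurement, since the right crossover point depends on the workload and the machine.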

If your algorithm does require large amounts of memory, then allocating it only as needed is a better strategy, since actually needing all of it is often the edge case. At the least, allocating the memory as needed may allow the system to deal with the process more intelligently.
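As an illustration of allocating only as needed, here is a sketch that reads a whole stream into a buffer that starts small and doubles when full, so small inputs never pay for the worst case. The names read_growing and INITIAL_SIZE are invented for this example, and the starting size is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { INITIAL_SIZE = 4096 };  /* hypothetical starting size */

/* Read all of FP into a malloc'd buffer that grows only as data
   arrives.  Store the byte count in *LEN.  Return NULL on read or
   allocation failure.  The caller frees the result.  */
char *
read_growing (FILE *fp, size_t *len)
{
  assert (fp != NULL && len != NULL);

  size_t cap = INITIAL_SIZE, used = 0;
  char *buf = malloc (cap);
  if (!buf)
    return NULL;

  for (;;)
    {
      used += fread (buf + used, 1, cap - used, fp);
      if (used < cap)
        break;  /* short read: end of input, or an error */

      /* The buffer is full, so double it, guarding against overflow.  */
      if (cap > (size_t) -1 / 2)
        {
          free (buf);
          return NULL;
        }
      char *tmp = realloc (buf, cap * 2);
      if (!tmp)
        {
          free (buf);
          return NULL;
        }
      buf = tmp;
      cap *= 2;
    }

  if (ferror (fp))
    {
      free (buf);
      return NULL;
    }

  *len = used;
  return buf;
}
```

Doubling keeps the number of reallocations logarithmic in the input size, at the cost of up to 2x slack for large inputs, which is still far better than reserving the upper bound for every run.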

Memory improvements in coreutils

There have been recent coreutils changes in this vein, which will be available soon in coreutils >= 8.22:

cut no longer uses a bit array which could grow very large when large numbered fields were specified. The current implementation is also much more CPU efficient, with the performance summarised in each of these changes:
dd now avoids allocating buffers when they're not needed,
i.e. when no read()s are performed. This is the case when skipping over seekable input, or when count=0.
head --bytes=-N only allocates memory as needed,
i.e. when reading small inputs there is no need to allocate the upper bound N value up front.
shuf uses reservoir sampling for large or unknown-sized inputs.
Reservoir sampling selects K random lines in a single pass while holding only K lines in memory, however large the input turns out to be.
sort currently allocates large memory buffers up front. This may be changed in the future to be more cache aware.
split --line-bytes only allocates memory as needed.
Previously it would allocate the worst case memory requirement of the split size up front. Now it only allocates this memory when needed, which is only when we encounter a large line and have already output lines in this split.
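For reference, the reservoir sampling idea mentioned for shuf above can be sketched in a few lines. This is not shuf's actual code: it samples long values rather than lines to stay short, struct reservoir and its functions are hypothetical names, and rand() is used only for brevity (rand () % n is biased, and RAND_MAX may be as small as 32767, so a real implementation would want a better generator).

```c
#include <assert.h>
#include <stdlib.h>

/* A fixed-size sample of a stream whose length is not known up front.  */
struct reservoir
{
  long *items;   /* storage for the K sampled items */
  size_t k;      /* number of items to sample */
  size_t seen;   /* number of items offered so far */
};

/* Offer ITEM to the sample.  The first K items are always kept.
   After that, item number SEEN replaces a random slot with
   probability K/SEEN, which keeps the sample uniform.  */
void
reservoir_offer (struct reservoir *r, long item)
{
  assert (r != NULL && (r->k == 0 || r->items != NULL));

  r->seen++;
  if (r->seen <= r->k)
    r->items[r->seen - 1] = item;
  else
    {
      size_t j = (size_t) rand () % r->seen;
      if (j < r->k)
        r->items[j] = item;
    }
}

/* Number of valid entries in R->items.  */
size_t
reservoir_count (const struct reservoir *r)
{
  return r->seen < r->k ? r->seen : r->k;
}
```

Memory use is bounded by K no matter how many items are offered, which is exactly what makes the approach attractive for large or unknown-sized input.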