Re: RFC: dd oflag=trunc to support in place filtering of files

Pádraig Brady Fri, 06 Jun 2014 04:23:31 -0700

On 06/06/2014 07:34 AM, Bernhard Voelker wrote:
> On 06/05/2014 03:27 PM, Pádraig Brady wrote:
>> The thought just occurred to me that this could be useful
>> to filter large files in place? For example:
>>
>>   grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc
> 
> I guess you meant this:
> 
>    grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc \
>               of=file.big
>


right
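
As a small illustration of why the proposal needs a final truncate at all, the sketch below runs the corrected pipeline on a throwaway file. The file name and contents are invented for the example. With conv=notrunc alone, dd overwrites the start of the file and leaves the old tail behind.

```shell
#!/bin/sh
# Sketch: in-place overwrite with conv=notrunc and no final truncate.
# The scratch file and its contents are made up for illustration.
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT
printf 'alpha\nbeta\ngamma\n' > "$f"      # 17 bytes

# grep emits "beta\n" (5 bytes); dd writes it over the start of the file
# and leaves the remaining 12 original bytes untouched.
grep beta "$f" | dd bs=1M conv=notrunc of="$f" 2>/dev/null

wc -c < "$f"     # still 17 bytes
od -c "$f"       # "beta\n" followed by the stale tail "\nbeta\ngamma\n"
```

The stale tail is what the proposed post-copy truncate would cut off.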

>> That would assume that grep never outputs more than it reads,
>> and would issue a final truncate along the lines of:
>>
>>   ftruncate(STDOUT_FILENO, lseek(STDOUT_FILENO, 0, SEEK_CUR));
>>
>> Useful enough to add?
> 
> While it sounds very useful, it looks like a powerful
> way to shoot oneself in the foot, e.g. when the producer
> command aborts
> 
>   grep --unknown PAT file | dd ...
>   grep: unrecognized option '--unknown'
> 
> ... then dd probably wouldn't be able to detect
> the failure and truncate the file - so the original data would
> be lost.

Good point. Also if there was an I/O error reading the file,
dd would nuke any data after that.
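
A rough sketch of how a wrapper could limit this kind of damage with existing tools follows. It is not the proposed dd feature: filter_in_place is a hypothetical helper, the byte count is parsed from dd's statistics line on the assumption that it starts with the number of bytes (as with GNU dd), and the final step uses the coreutils truncate utility. The truncate is skipped when the producer exits non-zero.

```shell
#!/bin/sh
# Sketch only. filter_in_place is a hypothetical helper, not a coreutils
# feature. It assumes the producer never writes more than it has read.
# Usage: filter_in_place FILE COMMAND [ARG]...
filter_in_place() {
    file=$1
    shift
    status_file=$(mktemp) || return 1
    echo 0 > "$status_file"

    # dd only sees EOF on stdin, so a failing producer records its exit
    # status in a side file. The last line of dd's statistics is assumed
    # to start with the byte count, as it does with GNU dd.
    bytes=$(
        { "$@" < "$file" || echo "$?" > "$status_file"; } |
            LC_ALL=C dd bs=1M conv=notrunc of="$file" 2>&1 |
            tail -n 1 | cut -d ' ' -f 1
    )
    status=$(cat "$status_file")
    rm -f "$status_file"

    # Producer failed: leave the length alone. Bytes already overwritten
    # are not restored, so this only limits the damage.
    if [ "$status" -ne 0 ]; then
        echo "filter_in_place: producer exited $status, not truncating" >&2
        return 1
    fi

    # Refuse to truncate unless the byte count parsed cleanly.
    case $bytes in
        '' | *[!0-9]*) return 1 ;;
    esac
    truncate -s "$bytes" "$file"
}
```

For example, `filter_in_place file.big grep whatever`. Note that grep exits with status 1 when nothing matches, which this sketch treats as a failure and so leaves the file length unchanged.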

> Second, regarding the already mentioned restriction that the
> producer doesn't output more data than the original size of
> the input file, e.g.
> 
>   cat -n file | dd conv=notrunc of=file ...
> 
> Is this really an issue?  It (surprisingly!) already seems to
> work, even with "obs=1".  And if it is, how could we detect this?

This could be working due to readahead buffering in the kernel,
but it would not work in general and would fail eventually.

> As a side note, "oflag=trunc" may not be enough to describe
> what it does ... it truncates the output file *after* the
> data copying.  So what about something like "oflag=truncpost"?

Yes better.

Given the I/O error handling issue above, I'm not sure this is a feasible option.

thanks,
Pádraig.
