On 06/06/2014 07:34 AM, Bernhard Voelker wrote:
> On 06/05/2014 03:27 PM, Pádraig Brady wrote:
>> The thought just occurred to me that this could be useful
>> to filter large files in place? For example:
>>
>> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc
>
> I guess you meant this:
>
> grep whatever file.big | dd bs=1M conv=notrunc oflag=trunc \
> of=file.big
>
right

>> That would assume that grep never outputs more than it reads,
>> and would issue a final truncate along the lines of:
>>
>> ftruncate(STDOUT_FILENO, lseek(STDOUT_FILENO, 0, SEEK_CUR));
>>
>> Useful enough to add?
>
> While it sounds very useful, it looks like a powerful
> way to shoot oneself in the foot, e.g. when the producer
> command aborts
>
> grep --unknown PAT file | dd ...
> grep: unrecognized option '--unknown'
>
> ... then dd probably wouldn't be able to detect
> the failure and truncate the file - so the original data would
> be lost.

Good point.
Also if there was an I/O error reading the file,
dd would nuke any data after that point.

> Second, regarding the already mentioned restriction that the
> producer doesn't output more data than the original size of
> the input file, e.g.
>
> cat -n file | dd conv=notrunc of=file ...
>
> Is this really an issue? It (surprisingly!) already seems to
> work, even with "obs=1". And if it is, how could we detect this?

That could be working due to readahead buffering in the kernel,
but it wouldn't hold in general and would eventually fail.

> As a side note, "oflag=trunc" may not be enough to describe
> what it does ... it truncates the output file *after* the
> data copying. So what about something like "oflag=truncpost"?

Yes, better.

Given the I/O error handling above, I'm not sure this is a feasible option.

thanks,
Pádraig.
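For illustration only, here is a minimal C sketch of the copy-then-truncate behaviour discussed in this thread (the proposed oflag=trunc / oflag=truncpost, which is not an existing dd flag). The function name copy_then_truncate and its fd parameters are hypothetical, and this is not how dd is implemented. It assumes the output fd is a regular file opened without O_TRUNC. It shows the final ftruncate(fd, lseek(fd, 0, SEEK_CUR)) step, and that skipping the truncate on a read error protects only against errors the copier itself sees.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy in_fd to out_fd starting at out_fd's current
   offset (no truncation up front), then truncate out_fd at the final write
   offset, as the proposed "truncpost" would. Returns 0 on success, 1 on
   error. On a read or write error it returns WITHOUT truncating, so the
   unread tail of the file is left in place. */
static int copy_then_truncate(int in_fd, int out_fd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break;                      /* EOF: producer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            perror("read");
            return 1;                   /* skip the truncate */
        }
        /* Handle short writes by looping until the chunk is written. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                perror("write");
                return 1;               /* skip the truncate */
            }
            off += w;
        }
    }

    /* Final step from the thread: drop whatever lies past what we wrote. */
    off_t end = lseek(out_fd, 0, SEEK_CUR);
    if (end < 0 || ftruncate(out_fd, end) != 0) {
        perror("truncate");
        return 1;
    }
    return 0;
}
```

Wired up as `int main(void) { return copy_then_truncate(STDIN_FILENO, STDOUT_FILENO); }` and run as `grep whatever file.big | ./prog 1<>file.big` (the shell's `<>` opens the file read-write without truncating it), the sketch still has both problems raised above. A failing producer simply closes the pipe, which looks like a normal EOF here, so the file is truncated anyway. Nothing stops the writer from overtaking the reader if the producer emits more than it has read.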