On 4/14/21 1:26 PM, enh wrote:
>> Could you read the linux doc thing and confirm that the behavior you want
>> is still to stop at TRAILER instead of flushing hardlink context but
>> otherwise continuing to extract like the kernel guys documented for
>> initramfs? (Or am I misremembering? It's been a while...)
>
> in the thread you linked to, they say "I wonder how existing GNU or BSD cpio
> ... would deal with reading such a file". all i'm saying is "GNU cpio exits
> on the next record boundary, and people have scripts that rely on this".

My question was really "would continuing until you run out of cpio records,
and exiting with an error if there were no cpio records, also satisfy those
scripts"?

> the Linux docs say things like
>
>   The cpio "TRAILER!!!" entry (cpio end-of-archive) is optional, but is
>   not ignored; see "handling of hard links" below.
>
> but that doesn't match what actual implementations of cpio do. (assuming you
> don't interpret optional as meaning "you don't have to have one, but if you
> don't, the tool will exit with an error complaining that you don't have one"
> :-) )

A year or three back I had it not adding the TRAILER!!! entry, then added a
--trailer option, and you submitted a commit removing that option so it always
adds TRAILER now. But not having one isn't unprecedented...

> i think the most interesting thing for me in the docs was:
>
>   When a "TRAILER!!!" end-of-archive marker is seen, the tuple buffer is
>   reset. This permits archives which are generated independently to be
>   concatenated.
>
> because -- even if i haven't really understood _why_ people are concatenating
> cpio files -- at least this shows that the main consumers/producers agree that
> this is an expected use case.

They're incrementally generating filesystems, using a base cpio and then
adding more entries. If your base has /dev/console style nodes in it with
special ownership and permissions which you can't create locally as a normal
user, you have to use an awkward tool like gen_init_cpio from the kernel
source to generate synthetic cpio entries. But you then often want to append a
directory full of files that live in your local filesystem using normal
"find | cpio".

It's also a poor man's form of initramfs package management: select this and
this and this without extracting them all into a temporary directory and then
packaging up the directory (and potentially having
permissions/ownership/timestamps change).
This trick can even drop start files next to each other in etc/rc for sysvinit
to pick up and run on boot.

> i'm assuming the "exit when you see TRAILER!!! and let the next cpio instance
> worry about the rest" behavior is just the least-effort implementation of the
> hard-link flush stuff:
>
>   To combine file data from different sources (without having to
>   regenerate the (c_maj,c_min,c_ino) fields), therefore, either one of
>   the following techniques can be used:
>
>   a) Separate the different file data sources with a "TRAILER!!!"
>      end-of-archive marker, or
>
> exiting when you see TRAILER!!! implicitly loses any cpio state, and reporting
> an error if you hit EOF without seeing TRAILER!!! lets you know when to stop
> running a new cpio?

Least-effort implementation of flush is what I'm assuming too. I prefer to put
in more effort and do it right. Extracting the whole archive seems like the
correct behavior because it's what the kernel initramfs plumbing does, and
given that posix yanked cpio back in susv2 and Jorg "Solaris Solaris Uber
Alles" Schilling got outright indignant when I suggested putting it back
because it's actually _used_, that means the only modern spec we have is the
kernel spec (that I am aware of).

> (i think the doc is trying to distinguish between a cpio file [where
> TRAILER!!! marks the end] and an "initramfs buffer" which can contain
> multiple concatenated cpio files [and hence more than one TRAILER!!!]. so
> things processing initramfs buffers need to be cleverer than cpio when it
> comes to TRAILER!!!, but cpio doesn't. [and in practice, isn't.])

I agree gnu cpio doesn't, but that's because gnu.

> i think that answers your question, but perhaps in excessive detail, so i'll
> re-quote you and try again:

I don't mind excessive detail when I'm trying to figure out the correct course
of action for a design issue.

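(To make the concatenation case above concrete, here is a rough Python sketch,
not toybox code. It writes two tiny "newc" archives independently and glues
them together with plain byte concatenation. The helper names newc_entry and
newc_archive are made up for illustration, and ownership, timestamps and
device numbers are simply zeroed, which a real tool would not do.)

```python
def newc_entry(name: str, data: bytes = b"", mode: int = 0o100644,
               ino: int = 0, nlink: int = 1) -> bytes:
    """Build one newc record: 110-byte ASCII header, name, then file data."""
    name_bytes = name.encode() + b"\0"
    # Thirteen 8-hex-digit fields follow the "070701" magic, in this order:
    # ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
    # rdevmajor, rdevminor, namesize, check.
    fields = (ino, mode, 0, 0, nlink, 0, len(data), 0, 0, 0, 0,
              len(name_bytes), 0)
    out = b"070701" + b"".join(b"%08x" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)   # header+name is padded to 4 bytes
    out += data
    out += b"\0" * (-len(out) % 4)   # file data is padded to 4 bytes
    return out


def newc_archive(files: dict) -> bytes:
    """Hypothetical helper: one archive, inodes numbered from 1, plus trailer."""
    out = b""
    for ino, (name, data) in enumerate(files.items(), start=1):
        out += newc_entry(name, data, ino=ino)
    return out + newc_entry("TRAILER!!!", mode=0)


# Two archives generated independently, then concatenated as-is.
base = newc_archive({"etc/rc/10-base": b"echo base\n"})
extra = newc_archive({"etc/rc/20-extra": b"echo extra\n"})
combined = base + extra
```

(Both halves number their inodes from 1 here, so the same inode number shows
up on each side of the first TRAILER!!!. That collision is the reason the
kernel doc resets its hardlink tuple buffer at the marker.)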
>> confirm that the behavior you want is
>> still to stop at TRAILER instead of flushing hardlink context but otherwise
>> continuing to extract
>
> i agree that based on the Linux docs it would be more sensible to flush but
> continue, but that's demonstrably not what GNU cpio does, so it doesn't seem
> particularly helpful for us to do it. callers already have to have the bash
> while loop nonsense,

Some callers do, and I agree we can't _break_ them. But rendering the loop a
NOP doesn't break it.

> and implementing the better behavior in toybox would still
> be "broken" from that perspective because they'd loop forever --- toybox would
> at least have to consider the empty input as an error,

Empty input should be an error, yes. That's consistent with tar:

  $ toybox cpio -i < /dev/null
  $ echo $?
  0
  $ toybox tar x < /dev/null
  tar: Not tar
  $ echo $?
  1

Whatever else we decide to do here, making empty input be an error sounds
correct to me.

Of course the gnu/dammit version goes:

  $ cpio -i < /dev/null
  Found end of tape. To continue, type device/file name when ready.

Which... no? Just no.

> at which point we haven't
> really reduced the ugliness much? (i'm also scared to suggest anything beyond
> "do what GNU does" because i don't personally know anything about cpio, and
> have never used it except to generate minimal repro cases for stuff that
> kernel folks bring up.

I have, sadly, had to learn rather a lot about it although I don't claim to be
an expert. Still, if we're changing the behavior, eating all the input seems
more correct, and erroring on empty input seems like it would satisfy the
loops people are using to work around the limitations in the gnu/dammit
implementation (a limitation which is already not present in the kernel's
implementation used by initramfs).

> i haven't looked at BSD, but they seem to interpret TRAILER!!! as end
> of archive too: https://www.freebsd.org/cgi/man.cgi?query=cpio&sektion=5 ...
> and

Have they changed the behavior of their tool in the past 25 years? (It's not
like 64 bit processors or large file support means much when your file format
is 8 hex digits for all the metadata fields...)

> eighthly, carrying on past TRAILER!!! when no-one else does sounds like one of
> those security issues Android had back in the "zip master key" days; even if
> the

I didn't hear this story, but it sounds unpleasant.

> format is stupid, it's safer when everyone interprets the format the same
> way...
> who knows what crap people are accidentally/deliberately ignoring past a
> TRAILER!!! that isn't actually at the end [because they _don't_ have the bash
> while loop]? i'd prefer not to find out :-) )

Ok, valid point. But if they feed such an initramfs into the kernel it will
process all those records now, so the behavior _isn't_ currently consistent.
Who are the users of this you're seeing?

(And the other major user of this (that I'm aware of) is RPM package format. I
don't know what they do, because I don't know where the source to the rpm
tools lives. I lost track circa https://lwn.net/Articles/196523/ and moved to
.deb based systems anyway...)

Hmmm. If you're really concerned about more capable default behavior being
nebulously unsafe in a way that I can't prove a negative (grumble grumble),
maybe it needs an --all option? The man page doesn't mention -A but of course:

  $ cpio -iA < /dev/null
  cpio: --append is meaningless with --extract
  $ cpio -ia < /dev/null
  cpio: --reset is meaningless with --extract

This is gnu we're talking about: they only actually document stuff in "info"
pages. Sigh. (The man page mentions --append but has no short option for it.
The only reset it mentions is --reset-access-time and it doesn't say what that
DOES...)

Needing a for loop around the tool seems broken to me. Not breaking people's
workarounds is important, but implementing behavior that WON'T break them
while rendering the workaround unnecessary seems easy enough?

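(Roughly what I mean, as a Python sketch rather than the actual toybox code:
keep reading records until the input runs out, forget hardlink state at each
TRAILER!!!, and treat "no records at all" as an error so a wrapper loop
terminates. The names read_all_records and CpioError are invented for this
example, skipping NUL padding between archives is an assumption of the sketch,
and hardlink handling is reduced to reporting the first name seen for a
(devmajor, devminor, ino) tuple.)

```python
import io

FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize",
          "check")


class CpioError(Exception):
    pass


def _read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise CpioError("truncated archive")
    return data


def read_all_records(stream):
    """Yield (name, data, link_to) for every record, continuing past TRAILER!!!.

    Assumes a buffered binary stream whose read(n) is only short at EOF.
    """
    count = 0
    hardlinks = {}  # (devmajor, devminor, ino) -> first name seen
    while True:
        word = stream.read(4)
        if not word:
            break                     # clean end of input
        if word == b"\0\0\0\0":
            continue                  # padding between concatenated archives
        header = word + _read_exact(stream, 106)
        if header[:6] != b"070701":
            raise CpioError("bad magic")
        h = dict(zip(FIELDS, (int(header[6 + 8 * i:14 + 8 * i], 16)
                              for i in range(13))))
        # The name (with its NUL) is padded so header+name ends on 4 bytes.
        name_len = h["namesize"] + (-(110 + h["namesize"]) % 4)
        name = _read_exact(stream, name_len)[:h["namesize"] - 1].decode()
        data_len = h["filesize"] + (-h["filesize"] % 4)
        data = _read_exact(stream, data_len)[:h["filesize"]]
        count += 1
        if name == "TRAILER!!!":
            hardlinks.clear()         # flush hardlink context, keep going
            continue
        link_to = None
        if h["nlink"] > 1:
            key = (h["devmajor"], h["devminor"], h["ino"])
            link_to = hardlinks.get(key)
            hardlinks.setdefault(key, name)
        yield name, data, link_to
    if count == 0:
        raise CpioError("no cpio records in input")
```

(A caller still wrapped in the usual loop gets everything on the first pass,
and the second pass sees empty input and fails, which ends the loop instead of
spinning forever.)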
People depending on a limitation of the tool for "security" is hard for me to
say anything coherent about. I _want_ it to be the default behavior, but if it
needs to be an option...

> hmm. my second attempt seems to have more words than my first. i'll stop here.

Another reason the for loop creeps me out is programs read more data than they
actually need from input ALL THE TIME. It's how ansi FILE * buffers work, and
an input pipe isn't seekable so you can't put the data _back_ if you find
yourself with extra and are about to exit. This is an implicit dependence on
an implementation detail, that you can continue from where the previous
program left off reading the same pipe without having lost anything to buffers
reading ahead. (Yes, I wrote my cpio with fd rather than FILE for that reason,
but DEPENDING on it? Ew.)

> (i noticed as well that everyone seems to actually deal in _compressed_ cpio
> files, so in an ideal world i suspect cpio should be as intelligent as tar
> when it comes to such things --- but i think cpio'ing is too niche to warrant
> doing anything better than GNU.)

The linux kernel already does better than gnu. That's why they wrote their own
cpio create and extract plumbing. Create is in:

  https://github.com/torvalds/linux/blob/master/usr/gen_init_cpio.c
  https://github.com/torvalds/linux/blob/master/usr/gen_initramfs.sh

And extract is:

  https://github.com/torvalds/linux/blob/master/init/initramfs.c#L256

Dunno what rpm is doing behind the scenes, but the kernel guys have talked
about xattr support (and sparse files, and 64 bit timestamps, and...) on more
than one occasion. Hence my todo list section on that, albeit in the probably
post-1.0 "teach patch.c about the git file rename syntax" sort of way...

Linux cpio outgrowing gnu is probably inevitable. Richard Stallman is not
steering anything, hasn't been for decades. (He's sitting in a big chair
making vroom-vroom noises with his mouth, but the wheel and pedals aren't
connected to anything.)

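(Back on the read-ahead point for a second: a small Python demonstration of
why handing a half-read pipe to the next process is fragile. It reads a
10-byte prefix through either a buffered or an unbuffered reader, then checks
what is still sitting in the pipe for whoever reads next. The function name
leftover_after_reading is made up for the example, and Python's buffered file
object stands in for a C FILE * here.)

```python
import os


def leftover_after_reading(prefix_len: int, payload: bytes,
                           buffered: bool) -> bytes:
    """Read prefix_len bytes via a file object, return what the pipe still holds."""
    r, w = os.pipe()
    os.write(w, payload)      # payload is small enough to fit the pipe buffer
    os.close(w)               # so a later read sees EOF instead of blocking
    # buffering=0 gives a raw reader that asks the kernel for exactly n bytes;
    # the default (-1) gives a buffered reader that grabs whatever is there.
    reader = open(r, "rb", buffering=-1 if buffered else 0, closefd=False)
    reader.read(prefix_len)
    rest = os.read(r, len(payload))   # what the "next program" would get
    os.close(r)
    return rest


payload = b"A" * 10 + b"B" * 10
unbuffered_rest = leftover_after_reading(10, payload, buffered=False)
buffered_rest = leftover_after_reading(10, payload, buffered=True)
```

(With the unbuffered reader the ten "B" bytes are still in the pipe. With the
buffered one they have already been pulled into a userspace buffer and the
next reader gets nothing.)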
Rob

P.S. sparse files are also a potential way to handle files > 4 gigs, by
breaking them into segments, but this is initramfs we're talking about so
people generally make pained noises when it comes up.

P.P.S. RPM also addressed large file support, but as usual is profoundly
unhelpful in saying exactly HOW ala https://rpm.org/devel_doc/large_files.html
because their business model is to obfuscate stuff until you pay them
thousands of dollars to be experts and not ask questions. I'm under the
impression they named this business model "enterprise" after the way the
holodeck keeps malfunctioning and trying to kill people. See also systemd.
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net