On Tue, Sep 12, 2023 at 12:36 PM Rob Landley <r...@landley.net> wrote:
>
> On 9/11/23 23:56, Oliver Webb via Toybox wrote:
> > I have made an implementation of the 'csplit' command in about 160
> > lines of code.
>
> You have TOYFLAG_MAYFORK on this command. Sigh, explaining the
> lib/toyflags.h values is one of the tutorial videos I need to make.
>
> Forking is the default behavior for launching new commands in toybox.
> TOYFLAG_NOFORK and TOYFLAG_MAYFORK are for the toybox shell (sh.c). The
> first indicates a shell builtin that can only run within the shell's
> process (like "cd", since forking a child process, calling chdir() in
> the child, and having the child exit doesn't actually change the
> parent's getcwd() value). NOFORK commands don't show up in the command
> list output by running "toybox", but they do show up in the command
> list you get by running "help" with no arguments in the shell.
>
> The second (MAYFORK) indicates a command that _can_ run standalone, and
> thus shows up in the "toybox" list so the installer creates a symlink
> for it in the search $PATH, but when it runs from toysh it acts like
> NOFORK and is a function call made by the current process (and
> eventually returns back to the shell so the shell's PID can go on and
> do more shell things afterwards). This allows the command to access the
> shell's data structures, and thus perform additional functions such as
> setting environment variables in the shell (printf %n), or accessing
> the job control list (kill %1).
>
> Since both NOFORK and MAYFORK commands can be run from within the
> shell, they have to scrupulously clean up after themselves.
> When they call xexit() and friends (which includes things like
> perror_exit() and stuff like xmalloc() that can call it) they longjmp()
> back to toysh instead of exiting, which means resources like
> filehandles and heap allocations and any mmap() it does may have to
> live in the GLOBALS() block, and it may need a sigatexit() handler to
> free that stuff out of GLOBALS so long-running shells (or shell
> scripts) don't accumulate leaked debris from builtins that exited
> abnormally.
>
> (Note: lib/lib.c has sigatexit() instead of libc's atexit() because
> WHEN we longjmp() back to the shell, we need to first call our own
> atexit() handlers and then remove them from the list. The libc ones
> don't let you call them and remove them from the list libc maintains
> without exiting. Auditing everything for leaks, including all the
> NOFORK and MAYFORK commands, is a big todo item in the shell work I
> need to dive into at some point...)
>
> I dunno why csplit would want MAYFORK here. A normal command can just
> xexit() and let the kernel close filehandles and free memory when the
> process exits. I note that 95% of the overhead of fork/exec is the exec
> part, not the fork part, so "fork and call
> toy_find("blah")->toy_main()" is still pretty cheap. (On systems with
> an MMU, anyway. It's all copy on write. I'm aware Rich Felker
> disagrees, but he's always using threads for everything, and threads
> have _always_ combined badly with fork(). I suspect he's setting up
> some gratuitous thread plumbing by default that he thought was free,
> and suddenly he noticed he's penalized fork(), and now he's blaming
> fork(). But I haven't looked deeply into the details of what he's mad
> about, because I dowanna. But, you know, the linux-kernel guys would
> have NOTICED if fork() was slow. As would everybody else everywhere.)
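(to make the longjmp()-instead-of-exit point above concrete, here's a minimal standalone sketch. it is not toybox's actual code: every name in it (sketch_exit, add_cleanup, run_cleanups, run_builtin, failing_builtin) is a hypothetical stand-in. sketch_exit() plays the role described above for xexit(), add_cleanup() plays the role described for sigatexit(), and the global "scratch" pointer stands in for something kept in GLOBALS. it assumes only one builtin runs at a time, with no nesting.)

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch, not toybox's implementation. */

#define MAX_HANDLERS 8

static jmp_buf *rebound;                      /* non-NULL while a builtin runs in-process */
static void (*handlers[MAX_HANDLERS])(void);  /* our own cleanup list */
static int handler_count;
static int last_status;                       /* exit code handed back to the caller */

/* Register a cleanup callback; returns -1 if the list is full. */
static int add_cleanup(void (*fn)(void))
{
  if (handler_count == MAX_HANDLERS) return -1;
  handlers[handler_count++] = fn;
  return 0;
}

/* Call handlers newest-first and empty the list, without exiting. */
static void run_cleanups(void)
{
  while (handler_count) handlers[--handler_count]();
}

/* Stand-in for xexit(): jump back to the caller if there is one, else exit. */
static void sketch_exit(int code)
{
  run_cleanups();
  last_status = code;
  if (rebound) longjmp(*rebound, 1);
  exit(code);
}

/* Run a builtin as a plain function call and return its exit code. */
static int run_builtin(void (*builtin_main)(void))
{
  jmp_buf env;

  rebound = &env;
  if (!setjmp(env)) {
    builtin_main();
    run_cleanups();     /* normal return still has to clean up */
    last_status = 0;
  }
  rebound = 0;
  return last_status;
}

/* Example builtin: its allocation lives in a global so the handler can find it. */
static char *scratch;
static int freed;

static void free_scratch(void)
{
  free(scratch);
  scratch = 0;
  freed++;
}

static void failing_builtin(void)
{
  scratch = malloc(64);
  add_cleanup(free_scratch);
  sketch_exit(3);       /* simulate an error path deep inside the command */
}
```

(calling run_builtin(failing_builtin) hands 3 back to the caller with the buffer already freed and the handler list empty, so the calling process can carry on. the reason for a private list rather than atexit() is the one given in the quoted note: the handlers have to be run and then forgotten without the process exiting.)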
(i doubt it's him so much as people using musl in large programs. but
the issues with fork() on large modern hardware running large modern
programs are well known.
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf
is a good recent summary, but USENIX has been talking about this stuff
for at least 20 years. macOS implements posix_spawn() as a syscall.
linux still seems to be on the clone() and close_range() path of hacks.)

> > The implementation is mostly POSIX compliant, but it is missing a few
> > things
>
> Missing stuff out of posix is pretty normal, they specify a lot of
> nonsense. My patch implementation is missing various posix options
> like -b and -e, and not only has nobody complained, but I submitted my
> patch implementation to busybox in 2010 and _they_ haven't bothered to
> implement those options since either.
>
> > It works as a Read-Eval-Print loop, where it prints to a file that
> > changes based on context. So doing negative offsets would require it
> > to print lines it doesn't accumulate yet.
>
> Yeah, grep -A -B -C does that sort of ring buffer nonsense with lines
> it _may_ need depending on later stuff. It's a fiddly pain.
>
> > The other main one is the fact it doesn't do "[LINE] {[NUMBER]}"
> > cleanly yet.
>
> I applied what you sent verbatim and haven't started cleaning anything
> up yet, if you have more work to do I'm not actually familiar with
> csplit. (Never used it, still need to come up to speed...)
>
> > It also includes the GNU extension "{*}" argument
> >
> > The other breaks from POSIX are mostly insignificant, like the fact it
> > doesn't check locale environment variables or uses "%lu" for file size
> > instead of "%d".
>
> Nothing in toybox checks the locale environment variables (outside of
> UTF-8 enablement for the fontmetrics stuff in main.c, and we usually
> _set_ the variables when we do that).
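(on the negative-offset problem quoted above: one way to get there, sketched under assumptions, is to hold back the last few lines instead of writing each one as soon as it's read. a line only gets committed to the current output file when it falls out of the ring, so when a pattern with a negative offset matches, the held lines can still go to the next file. this is not taken from csplit or from toybox's grep; RING, struct ring, copy_line, ring_push and ring_peek are all made-up names, and the fixed depth of 4 is arbitrary.)

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: keep the newest RING lines uncommitted. */
#define RING 4

struct ring {
  char *line[RING];      /* held lines, NULL until a slot is first used */
  unsigned long count;   /* total lines pushed so far */
};

/* Heap copy of a string (NULL if allocation fails). */
static char *copy_line(const char *s)
{
  char *p = malloc(strlen(s) + 1);

  if (p) strcpy(p, s);
  return p;
}

/* Store a copy of s. Returns the line it displaced, which the caller may
 * now write to the current output file and free, or NULL if no line was
 * displaced yet. */
static char *ring_push(struct ring *r, const char *s)
{
  unsigned slot = r->count % RING;
  char *old = r->line[slot];

  r->line[slot] = copy_line(s);
  r->count++;
  return old;
}

/* Look at the line pushed `back` pushes ago (0 is the newest).
 * Returns NULL if that line is no longer (or not yet) held. */
static char *ring_peek(struct ring *r, unsigned back)
{
  if (back >= RING || back >= r->count) return 0;
  return r->line[(r->count - 1 - back) % RING];
}
```

(start from a zeroed struct ring. the cost is that the largest negative offset has to be known before reading starts, so the ring can be sized for it, which is part of why this is fiddly.)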
>
> And posix has been just plain wrong about int-vs-long printf variables
> since the general switch to 64 bits in 2005. It's coming up on 20 years
> since then, so possibly Issue 8 will finally fix that? Or maybe that's
> just when they finally noticed they're obsolete and the NEXT release
> would fix it? Wake me when they restore "tar" and deprecate "pax"...
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net