On Sun, May 22, 2016 at 11:23 AM, Rob Landley <r...@landley.net> wrote:
> On 05/11/2016 01:41 AM, Andy Chu wrote:
>>> Oh I was quite impressed with Lua, but all programming languages operate
>>> within a framework and Lua intentionally doesn't provide a usable
>>> standard framework.
>>
>> The way I think of it is that Lua doesn't provide the program with any
>> "capabilities" by default (in the security sense). You have to
>> explicitly grant capabilities by providing hooks to your application.
>
> Providing write() but not printf(), or + - * / but no math library with
> trig functions, has nothing to do with security.
>
> The X11 problem was always "Here's a window and line drawing primitives.
> Creating a toolkit for buttons and sliders and pulldown menus and such
> is left as an exercise to the user, there's no standard one provided and
> 12 non-standard ones which all suck".
>
> Hence qt vs gtk. They don't let you do anything you couldn't without
> them, they just save you writing giant piles of code yourself.
>
>> This is actually one of the things that attracted me to it, since
>> having a secure environment opens up some interesting possibilities
>> with executing remote code (like JavaScript).
>
> The most secure system is powered off, ground into a fine powder, mixed
> with acid, encased in concrete, and dropped into a deep sea trench.
> Ideally in a way that the acid will eat through the concrete and
> dissolve the whole mess into the ocean near the bottom. (And that's
> assuming you haven't got the budget to fire it into the sun and closely
> monitor its entire trip there.)
>
>> Tcl has a similar embedded language design philosophy, but it happened
>> to come with GUI libraries and such which made it popular for awhile.
>>
>> I don't think Lua "refused" to provide a standard library... people
>> were mostly using it for games and embedded applications, and there
>> just wasn't a strong enough community running it on POSIX or whatever.
>>
>> It was just 1 or 2 academics who wrote all the code -- they never had
>> a public source repo or accepted patches.
>
> I was under the impression it had a vigorous community doing stuff for a
> decade before anybody who spoke English noticed, because they were doing
> it in Portuguese.
>
> Practical result's the same either way.
>
>>>> busybox awk looks like a pretty straightforward interpreter
>>>> architecture from what I can tell -- lex, parse, walk a tree to
>>>> execute, and runtime support with hash tables and so forth.
>>>
>>> Possibly awk and sh can share parser infrastructure. Not sure yet.
>>
>> One thing to note is that they use opposite parsing algorithms:
>>
>> * sh: All implementations except bash use a hand-written recursive
>> descent parser, i.e. top down parsing; whereas bash uses yacc, i.e.
>> bottom up parsing. And bash regrets the choice.
>
> I wasn't planning to use yacc.
>
>> * awk: All implementations except busybox awk use yacc (bottom up).
>
> I wasn't planning to use yacc here either.
>
>> It's not entirely clear to me what algorithm busybox awk is using; I
>> think it is a hand-written bottom up parser. Doesn't look like
>> recursive descent for sure.
>
> My limiting factor with awk is I need to collect a large corpus of awk
> test scripts so I know what success looks like.
(coincidentally i was joking to someone last week that you could
probably replace awk with a binary that only understood "{ print $2; }"
and it would be years before anyone would notice. it's been a long time
since i saw anything else in real life...)

>> The difference arises from the language itself. The main sh language
>> has no expressions and hence no left recursion; it's essentially LL(1)
>> (except for looking ahead to find the ( in a function def).
>
> You can recurse, you can throw stuff on a stack. Not a big deal either way.
>
> No man page for ll or LL. When I type "ll" Ubuntu has it as an alias for
> ls -l (so no prompt for a package to install). And LL says command not
> found (again, no prompt for a package to install).

(https://en.wikipedia.org/wiki/LL_grammar)

>> Awk has TWO expression languages -- the conditions can be combined
>> with boolean logic (e.g. $1 == "foo" && $2 == "bar"), and the
>> procedural action language has arithmetic. So bottom up parsing works
>> better here.
>
> Don't care.
>
>>> What is and isn't a bug is... It took me a while to figure out why this
>>> works:
>>>
>>> for i in a b c; do echo $i; done
>>>
>>> But this is a syntax error even though I can put a newline after the do:
>>>
>>> for i in a b c; do; echo $i; done
>>
>> The shell syntax is definitely weird at first, but this distinction
>> follows directly from the POSIX grammar -- which I mentioned is
>> accurate in the sense that all the implementations I tested are very
>> conformant. (The exception is bash which doesn't allow unbraced
>> single command function definitions. Try "func() ls /; func" in bash
>> and dash; according to the grammar, dash is correct.)
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_10
>
> No.
>
> Any time "bash is wrong but dash is correct", posix is wrong.
> Posix is saying that the de-facto Linux shell got this wrong for almost
> 20 years and nobody noticed, then a shell that I could trivially segfault
> when Ubuntu first swapped /bin/sh for it, and which "sleep 100 &" and then
> ctrl-c at the prompt would kill the backgrounded sleep... That was doing
> it "right".
>
> No. No it wasn't. Posix was at _best_ irrelevant.
>
>> The relevant productions are:
>>
>>   ... For name linebreak in wordlist sequential_sep do_group
>>
>>   do_group      : Do compound_list Done   /* Apply rule 6 */
>>
>>   compound_list : term
>>                 | newline_list term
>>                 | term separator
>>                 | newline_list term separator
>>
>>
>> compound_list can start with a list of newlines, but it can't start
>> with semicolons. That's why you can have newlines after "do" but not
>> a semicolon.
>
> I think I'm allowing the semicolon in mine, because there's no obvious
> reason not to.
>
>>> 1) is basically ( as a command (it's a context shift command like if or
>>> while, but it's a command, same block definition as above; see also {
>>> and } blocks).
>>>
>>> 2) happens during environment variable parsing (the _fun_ bit is the
>>> quoting in "$(echo "$(ls -l)")")
>>
>> In my parser, there's nothing special about a command sub surrounded
>> by double quotes surrounded by a command sub surrounded by double
>> quotes. That's all handled straightforwardly by the recursion (ditto
>> for evaluating the expression). However, detecting the ) that matches
>> a command sub is not so straightforward, since there are 4 uses of ).
>> It does involve a stack in the lexer; it's debatable whether "context
>> stack" describes it.
>>
>>> Oh, speaking of { } blocks, you can do this on the command line:
>>>
>>> { echo -e "one\ntwo\nthree"
>>> } | tac
>>>
>>> But if you don't have the line break in there the } is considered an
>>> argument to echo and you get a prompt for continuation until you feed it
>>> } on the start of a line.
>>> You can use a ; instead of a newline though,
>>> that's "start of a line" enough.
>>
>> Right, this is because { and } are "reserved words", while ( and ) are
>> operators. A reserved word has to be delimited by space, whereas an
>> operator delimits itself. Reserved words are only special if they are
>> the FIRST word, so echo } doesn't need to be quoted, but echo ) does.
>
> I know.
>
>> (echo hi) # valid without spaces
>> {echo hi} # not what you think
>> { echo hi } # not what you think either
>> { echo hi; } # correct because ; is an operator, and } is the first
>> word in the next command
>
> You're explaining back at me what I said.
>
>>>> There is a similar problem with ${} overloading --
>>>> it's used for anonymous blocks and brace expansion, in addition to var
>>>> expansion. I found bash bugs here too.
>>>
>>> Such as...?
>>
>> The test case I came up with is:
>>
>> $ echo ${foo:-$({ which ls; })}
>> -bash: syntax error near unexpected token `)'
>>
>> $ dash
>> $ echo ${foo:-$({ which ls; })}
>> /bin/ls
>
> You said they said they regret using yacc as their parser. :)
>
>> This is a command sub with a braced block inside it, as the default
>> value inside ${}. Bash gets confused about the matching }. Something
>> like ${foo:-${bar}} should work fine though.
>
> I just checked that echo ${blah:-"$({ ls; })"} works, which isn't hugely
> surprising.
>
>>> Context stack? That was my way. Lots of this parsing needs to nest
>>> arbitrarily deep, and it can cross lines:
>>>
>>> $ echo ${hello:-
>>> > there}
>>> there
>>
>> Right, this is the PS2 problem. When you hit enter, do you execute
>> the command, or print > and continue parsing?
>
> Eh, not that big a deal. My question was more whether
>
> $ ls ; echo ${hello:-
>
> Should run the ls before prompting for the rest of the echo.
>
>> Actually this case is broken in dash -- try "echo ${ <newline>" in
>> bash and dash. (Although I'm sure nobody really cares.)
>
> I don't really care what dash does.
> It is defective and annoying, says
> so right in the acronym.
>
>>> And if you put a double quote before the $ and after the } you get a
>>> newline before there. If you don't, command line argument parsing and
>>> reblocking strips it.
>>>
>>> What do I mean by reblocking? I mean this:
>>>
>>> $ printf "one %s three %s\n" ${hello:-two four}
>>> one two three four
>>
>> I don't see anything special about this; it's a straightforward
>> consequence of word splitting.
>
> Is that what the standard calls it? It's been years since I read through
> the thing from start to finish, terminology gets a bit fuzzy.
>
>> Because there are no quotes around
>> ${hello...}, its value is subject to word splitting, so there are two
>> arguments to printf.
>
> Yes, I know why it does it.
>
>> Quotes change the behavior as you would expect;
>
> You keep thinking I would expect things, but "$@"
>
>> now there is one argument to printf:
>>
>> printf "one %s three %s\n" "${hello:-two four}"
>> one two four three
>>
>> (with the last %s expanding to empty)
>>
>>>> The bash aosabook chapter which I've referred to several times talks
>>>> about how they had to duplicate a lot of the parser to handle this,
>>>> and it still isn't right:
>>>
>>> I'm not looking at bash's implementation, I'm looking at the spec and
>>> what it does when I feed it lots of test cases (what inputs produce what
>>> outputs).
>>
>> You apparently have a love-hate relationship with bash.
>
> It's GNU code widely used by Linux. So yeah.
>
>> You explicitly said you want to write bash and not just sh, yet you don't
>> want to look at how it implements anything :)
>
> I never look at FSF code. On general principles. But the behavior of the
> standard Linux command line is what Linux developers (and the build
> systems they write) expect.
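
(A minimal sketch of the word-splitting case quoted above, for anyone who wants to try it. It assumes a POSIX sh with the default IFS and with `hello` unset; the function names `split_demo` and `quoted_demo` are made up for illustration and appear nowhere in the thread.)

```shell
#!/bin/sh
# Assumption: default IFS (space, tab, newline) and no variable named hello.
unset hello

# Unquoted expansion: the default value "two four" is field-split into two
# words, so printf gets two arguments and fills both %s slots.
split_demo() {
    printf 'one %s three %s\n' ${hello:-two four}
}

# Quoted expansion: the default value stays a single word, so printf gets one
# argument and the second %s expands to the empty string.
quoted_demo() {
    printf 'one %s three %s\n' "${hello:-two four}"
}

split_demo    # prints: one two three four
quoted_demo   # prints: one two four three   (followed by a trailing space)
```

The only difference between the two calls is the pair of double quotes, which is what decides whether printf sees one argument or two.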
>
>>> Years ago I was trying to get it to preserve NUL bytes in the output of
>>
>>> Toybox doesn't use libc getopt(), we use lib/args.c (which does not use
>>> libc getopt), so what you decide to do in your shell and what it makes
>>> sense for toysh to do may not be related to each other here.
>>
>> Sure, I'm just describing what it does. I agree getopts is an awkward
>> interface in sh, but if you want a POSIX shell, much less a bash
>> clone, you need it.
>
> Yeah but I might be able to use lib/args.c syntax instead of getopt
> syntax, since my stuff is mostly a superset of their stuff. Haven't dug
> into that todo item yet. Not hugely worried about it either way.
>
>>> Keep in mind, over the years people have written a dozen different
>>> shells. It's really not that big a deal, I just want to do it _right_ so
>>> I'm trying to reserve a large block of time so that once I start I can
>>> finish rather than getting repeatedly interrupted. And that means
>>> knocking down a bunch of smaller todo items first.
>>
>> I definitely agree that you want a big block of uninterrupted time.
>> (I've been off work since March so I've got that going for me.)
>>
>> It's not clear to me that any reasonably popular shell was started
>> later than 1990 or so (is zsh the latest?). I think the BSDs are
>> using code started 40+ years ago. I don't know when mksh is from, but
>> I think it must be that old too.
>
> This is why I want a bash replacement. Large existing userbase should be
> able to move over as painlessly as possible. I'm not trying to invent
> significant new syntax here.
>
> A shell is fairly central to the idea of unix, and the default shell of
> Linux has always been bash. (Ubuntu's insanity notwithstanding: the way
> ubuntu admitted its mistake was to make /bin/bash the default _login_
> shell, so it was in all the /etc/passwd entries despite #!/bin/sh
> pointing to something political and useless.)
>
>> As I mentioned, my goal isn't to simply implement sh, because that's
>> been done. It seems to me that 25 years is a good interval to have
>> some innovation in the shell. I'm just starting with sh so it's a
>> superset of what is known to work, and so people actually have a
>> migration path.
>
> The same way C is decades old therefore Objective C and C++ and so on
> _must_ be an improvement?
>
> I have seen lots, and lots, and LOTS of new languages fork off of
> existing stuff over the years. Back when I was on fidonet in the 90's
> somebody had collected a list of TWO THOUSAND programming languages,
> which seemed kind of excessive. (I don't still have this list and it
> would be 20+ years out of date anyway, but I remember there was more
> than one language named "oberon".)
>
> At $DAYJOB one of the programmers wrote an openoffice spreadsheet to
> VHDL translation layer in something called leingen, which is a dialect
> of scheme (which is a dialect of lithp) using java virtual machine
> features. This did not seem advisable to me, and yet it exists and
> nobody's had time to rewrite it yet.
>
> Good luck with your project, it is a can of worms I have _zero_ interest in.
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net

--
Elliott Hughes - http://who/enh - http://jessies.org/~enh/
Android native code/tools questions? Mail me/drop by/add me as a reviewer.