On Wed, Sep 9, 2020 at 11:15 PM Rob Landley <r...@landley.net> wrote:
> On 9/9/20 7:19 PM, enh via Toybox wrote:
> > don't apps need libc localization? not really. the POSIX localization
> > functionality is so anaemic that it's really not useful even for "major
> > minority" languages.
>
> I try to have strerror() display the error codes (but still think it's a
> missed opportunity that the "C" locale doesn't output EPERM and friends as
> the actual strings), and keep my error message vocabulary small and simple.
> I also try to preserve and display utf8 input for usernames and filenames
> and such.

glibc 2.32 actually added new functions for 9 -> "KILL" and 1 -> "EPERM". i
plan on adding those to bionic too, if only so i can add the moreutils
"errno" to toybox, which i've found useful at times. (one of these days i'll
write a static analyzer to catch people adding new %d errno printfs to the
code base. do they not know about strerror()? do they not know that errno
values aren't constant across architectures?)

> Beyond that, I've stayed away from internationalization up until now, and
> if your response is "kill it with fire" I can revert it.

that would be my choice, certainly. if you weren't already convinced by my
examples, a couple more common ones: when you add i18n to _output_ people
(not unreasonably) expect it for _input_ too, which is a nightmare you don't
want to get into. also the answer to "will the kernel ever localize the
content of files in /proc?" means we're not doing our actual intended users
(human or machine) any favors here. it's easier to learn when things are
consistently wrong (like me dealing with the victorian^WUS use of Fahrenheit
and the twelve-hour clock --- since they're _consistently_ wrong i know to
be on the lookout, and i can cope).

[for an example of the confusion that comes from inconsistency that you're
already familiar with: which way round do you write a Korean or Japanese
name in English?]
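To make the strerror() point above concrete, here is a minimal sketch of reporting an error by symbolic name and description rather than with a bare %d. Both helpers (errno_name and describe_errno) are hypothetical names invented for this illustration, not toybox or bionic functions, and the name table is deliberately tiny. It keys on the errno macros rather than raw numbers, since the numeric values can differ between architectures.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a few errno values to their symbolic names.
 * The switch uses the macros, never literal numbers, because the numeric
 * values are not guaranteed to match across architectures. */
static const char *errno_name(int err)
{
  switch (err) {
    case EPERM: return "EPERM";
    case ENOENT: return "ENOENT";
    case EINVAL: return "EINVAL";
    default: return "E?";  /* not in this illustrative table */
  }
}

/* Hypothetical helper: write "what: NAME (description)" into buf.
 * Returns the snprintf() result (length that would have been written). */
static int describe_errno(char *buf, size_t len, const char *what, int err)
{
  assert(buf && what);
  return snprintf(buf, len, "%s: %s (%s)", what, errno_name(err),
                  strerror(err));
}
```

A caller would do something like describe_errno(buf, sizeof(buf), "open", errno) right after the failing call. The description part comes from strerror(), so its exact wording depends on the libc and locale in use.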
> > if you're serious about localization, you're going to need
> > icu4c anyway, which isn't scared to embrace all the diversity that's
> > actually out there (rather than the tiny subset that the POSIX folks
> > could imagine, which doesn't even stretch to the need for the genitive
> > case in dates, to pick one random fairly mainstream example).
>
> Nope. Not going there.

nor should you. "you are not an app", so real people never need to see you.
developers and sysadmins do, often from machines in random
locations/locales/timezones, and they (and their scripts) are better served
by consistency.

> I vaguely intend to have toysh command line editing handle right-to-left
> mode due to a completionist streak,

to me that's different. that's more in the bucket of "full UTF-8 support",
which is clearly a good thing. _someone_ is going to have to deal with
Arabic filenames at some point (and they won't necessarily be able to read
them). thanks to confusion about uppercasing/lowercasing Turkish
dotted/dotless 'i's i see rather more Turkish input than you'd expect from
someone who doesn't speak Turkish and has never been there.

> and back when I was planning on implementing vi by vertically stacking the
> line editing plumbing (hence "linestack.c") I was gonna make sure that did
> it properly too. But now there's a vi there that I have nothing to do with
> which shares no infrastructure with anything else, so I guess that part's
> not my problem anymore.
>
> But that's all utf8 and unicode stuff. I haven't got a clue what the
> strings it includes MEAN.

yeah, exactly.

> > luckily, i've also been able to neuter Android's libc so none of this
> > will affect Android whichever way toybox goes[1]. but i still think it's
> > a bad idea.
>
> I wouldn't have volunteered to do it myself, I'm being presented with
> complaints and attempting to find the least bad way to resolve them.
:)

> "This is too many digits for humans to handle" is why adding commas to
> numbers was invented. It was the obvious solution. And then somebody
> complained that using commas is parochial, so I added the periods which
> should cover just well over 90% of the planet's population. (China uses
> 1,000.0 about everybody.

the question for a lot of those people that you need to ask yourself is: do
they group in 2s or 3s or 4s or a mix or a mix at the same time?
chinese-numbering-influenced countries sometimes count in ten-thousands
rather than thousands, and indian numbering can let you have something like
12,34,56,789 (2s and 3s in the same number).

#include <if you really care, you need icu4c because humans are really really weird>

> If "consistently show megabytes for systems > X gigabytes" vs
> "consistently show kilobytes for systems < X gigabytes" is good enough,
> even when the resulting numbers are long, I'm happy to rip the comma
> support back out.

personally, that's my preferred solution. (and what i _think_ current procps
top is doing, though i don't have enough systems to be sure, and i'm not
sure what to think about your results.)

> > no "real people" should ever need to look at this, but machines and
> > developers will, and every bit of localization hurts the real audience.
>
> Yes and no. There's a lot of developers out there who don't speak english,
> certainly not as their first language. I don't want to unnecessarily
> exclude them.

they're going to have more trouble with --help output than they are here.
and like i said, i'm pretty sure that "C/POSIX number formats" is something
you need to learn pretty early on. no-one likes a for loop condition like
x <= 70,2 after all, and the kernel's never going to localize :-)

> > at least 15'936.2 would be a valid C++14 literal (and i'm assuming will
> > make it into C2x) :-)
>
> That's the opposite of helping.

sorry, just winding you up. probably not the best time for it!

> > ___
> > 1. strictly, the fact that you're doing your own insertion of ','
> > separators might hurt me (in the `top -b` case), but i'll worry about
> > that if i notice it actually break any parsing. i know that's included
> > in Android's standard bugreports, but i _don't_ know that anyone's
> > parsing it.
>
> If the units weren't constant before then their parsing was iffy at best.
> Now at least the units should be constant on a given system.
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net
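
As an illustration of the grouping question raised above (2s, 3s, 4s, or a mix in the same number), here is a rough sketch of a separator-insertion routine parameterized by group size. The function name group_digits and its parameters are hypothetical, made up for this example. It is not the code toybox uses, and it covers only a fixed rightmost group plus one repeating group size, which is far short of what icu4c handles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the decimal digit string `digits` into `out`,
 * inserting `sep` so that the rightmost group has `first` digits and every
 * group to its left has `rest` digits.
 * Returns 0 on success, -1 if `out` is too small. */
static int group_digits(const char *digits, int first, int rest, char sep,
                        char *out, size_t outlen)
{
  size_t n = strlen(digits), o = 0;

  assert(first > 0 && rest > 0);
  for (size_t i = 0; i < n; i++) {
    size_t left = n - i;  /* digits remaining, including this one */

    /* A separator goes before this digit when what remains is exactly the
     * rightmost group plus a whole number of `rest`-sized groups. */
    if (i && left >= (size_t)first
        && (left - (size_t)first) % (size_t)rest == 0) {
      if (o + 1 >= outlen) return -1;
      out[o++] = sep;
    }
    if (o + 1 >= outlen) return -1;  /* keep room for the terminator */
    out[o++] = digits[i];
  }
  if (!outlen) return -1;
  out[o] = 0;
  return 0;
}
```

With first=3 and rest=3 this gives the familiar thousands grouping, first=3 and rest=2 gives the Indian-style 12,34,56,789 from the message, and first=4 with rest=4 gives ten-thousands grouping. That three plausible parameter sets already disagree is much of the argument for emitting one consistent format.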