    Date:        Sat, 26 Jan 2019 21:49:51 +0100
    From:        Joerg Sonnenberger <jo...@bec.de>
    Message-ID:  <20190126204951.ga7...@britannica.bec.de>

  | No, the fragile refers to the problem that many locales use both "." and
  | "," in numbers.

Yes, like English...   I wasn't previously aware that '.' was ever used as
the grouping char, though I did believe that some locales use a space for
that purpose.

  | While the standards decided in their infinite wisdom
  | that grouping characters shouldn't be parsed in floating point context,
  | it is very confusing at least for casual users. "sleep 1.000" would be
  | perfectly sensible for a German user, but certainly not do what is
  | expected.

Do you have a solution for that which can actually be implemented?

While the original code I wrote to deal with the (not-PR'd) bug report -
the one you commented on originally from a week ago - was fragile in this
area, the current one is slightly less so, I think.

If the arg can be parsed by strtod() in the user's locale, it will be.
If it cannot, but it can be parsed in the C locale, it will be handled
that way instead.   If neither works, it is an error.

This keeps the traditional behaviour of NetBSD sleep (with some error
checking added, anticipating dholland's PR 53910 before he submitted it)
while also allowing scripts to sanely (well, as sanely as them ever using
non-integral inputs is) use standard C floats as the input (as, as you
point out, a floating number is, rightly or wrongly, currently only parsed
as a string of digits and an optional single radix character, no grouping
chars allowed.)   [Aside: hex counts as "digits".]

The question of whether sleep (and perhaps other commands) ought to parse
their args in a locale specific manner is a different issue, and one worthy
of considerable discussion.   I have no strong opinion on this as (as has
been pointed out) it does not really affect me much.

  | Arguing about locale behavior based on OpenBSD doesn't work,
  | since they intentionally doesn't implement most of it anyway.

I do not think that was even the intent.
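
Returning to the parsing order described earlier (user's locale first,
then the C locale, otherwise an error): the following is only a rough
sketch of that idea, not the code committed to src/bin/sleep.  The names
parse_delay() and try_strtod() are hypothetical, and rejecting negative
values is an assumption made for the illustration.  It assumes the caller
has already called setlocale(LC_ALL, "") so that LC_NUMERIC reflects the
user's environment.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse all of `arg` with strtod() under whatever
 * LC_NUMERIC is currently installed.  Returns 0 on success, -1 otherwise.
 */
static int
try_strtod(const char *arg, double *out)
{
	char *end;
	double v;

	errno = 0;
	v = strtod(arg, &end);
	/* Reject empty input, trailing junk, range errors and negatives. */
	if (end == arg || *end != '\0' || errno == ERANGE || v < 0)
		return -1;
	*out = v;
	return 0;
}

/*
 * Hypothetical helper: try the current locale first, then the C locale.
 * Returns 0 and stores the value in *out, or -1 if neither parse works.
 */
int
parse_delay(const char *arg, double *out)
{
	char saved[256];
	const char *cur;
	int rv;

	assert(arg != NULL && out != NULL);

	/* First attempt: the locale the caller has installed. */
	if (try_strtod(arg, out) == 0)
		return 0;

	/* Copy the current setting; setlocale() may reuse its buffer. */
	cur = setlocale(LC_NUMERIC, NULL);
	snprintf(saved, sizeof(saved), "%s", cur != NULL ? cur : "C");

	/* Second attempt: the C locale, where the radix is always '.'. */
	setlocale(LC_NUMERIC, "C");
	rv = try_strtod(arg, out);
	setlocale(LC_NUMERIC, saved);

	return rv;
}
```

Note that neither attempt accepts grouping characters, since strtod()
does not parse them, and that setlocale() changes process-wide state,
so a sketch like this is only reasonable in a small single-threaded
utility.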
In order to determine the "how should we parse the args" question,
Christos suggested looking at how other systems do it - which is valid
data to have.   If we collectively come up with some particularly
compelling reason that we should do it one way or the other, that we
mostly agree on, then what other systems do is largely irrelevant.
If there is no particularly good reason to prefer one way or the other,
or we cannot agree (some developers/users prefer one, and others prefer
the other), then at least acting the same way as (most) other systems
(which allow non-integer args) might be enough to decide one way or
the other.

  | Let's take a step back from the implementation details. I consider the
  | command line interface of a program part of the shell language universe.
  | Programming languages shouldn't change arbitrary based on locale
  | settings. Otherwise you get the VBA madness. That's very different from
  | the data being processed or messages used for interacting with the user.
  | Valery mentioned the Postscript example already.

That's all a valid point, which really should be made in a discussion
on a better list than source-changes-d (in messages with the Subject
header "Re: CVS commit: src/bin/sleep").   This is not where someone
from the future would expect to find a discussion of the philosophical
(or technical) reasons why we should decide one way or the other, should
they be wondering why we made the decision now, whichever way we end up
making it, if 20 years (or more) into the future this all comes up again.

But since this is here, one point I'd make is that there is no particular
feature of the command line interface of a program which distinguishes
it from data used in interacting with the user.   The program does not
know from where its args were obtained.   If they're written in a script,
then I absolutely agree with you, it ought to use the "standard" notation
(one way or another).
But I know that I frequently simply type

	sleep n

into my shell, and then follow that by a bunch of commands I want executed
a little later, and while it would be unusual for me to give fractional
seconds in such a case, if I did, I'd normally expect to enter those in
the same format I use for any other floating point number in my day to
day life, which is what I would have LC_NUMERIC set up to produce and
consume (that is, whatever I believe is best for me, which is not
necessarily the same as the guy at the next desk in the same room ...
obviously in the same country.)

Similarly, if a script requests a delay value from the user, as in the

	printf 'How long should the delay be? '
	read delay || exit 1
	sleep "$delay"

type example, which is that?   Data obtained while interacting with the
user, or the command line interface of a program?

This gets a bit messy.   If we returned "sleep" to the way it was 2 weeks
ago, only parsing using the user's locale, scripts could handle that by
simply doing

	sleep $(printf %g 1.234)

(assuming a correctly working printf, which we now have in /usr/bin but
not in the version built into /bin/sh (or csh I think) which (for sh) is
something I am working on.   I now kind of understand the point of some
locale related code in the FreeBSD sh that I had been ignoring as "outside
my area of expertise").

However, if we decide (which would be the more likely decision I think)
that sleep should only accept C locale floats, and never parse using
LC_NUMERIC from the environment, then what change can we make to the
3 line printf/read/sleep sequence above to make that work as expected?

As best as I can tell, there is no really good way of converting numbers
from one locale to another, except for the special case where the input
locale is C/POSIX.
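
To make concrete what such a conversion would have to do if it were
written as new code, here is a minimal sketch in C.  It is an
illustration under stated assumptions, not an existing utility or
library routine: to_c_float() is a hypothetical name, the locale name
is passed in explicitly ("" meaning "take it from the environment"),
and "%.17g" is chosen only so that a double survives the round trip.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse `in` using the LC_NUMERIC rules of the
 * locale named `locname`, then write the value to `buf` in C locale
 * notation.  Returns 0 on success, -1 on any failure.
 */
int
to_c_float(const char *locname, const char *in, char *buf, size_t len)
{
	char saved[256];
	char *end;
	const char *cur;
	double v;
	int n, ok;

	assert(locname != NULL && in != NULL && buf != NULL);

	/* Copy the current setting; setlocale() may reuse its buffer. */
	cur = setlocale(LC_NUMERIC, NULL);
	snprintf(saved, sizeof(saved), "%s", cur != NULL ? cur : "C");

	/* Parse using the input locale's radix character. */
	if (setlocale(LC_NUMERIC, locname) == NULL)
		return -1;	/* that locale is not available */
	v = strtod(in, &end);
	ok = (end != in && *end == '\0');

	/* Format in the C locale, where the radix is always '.'. */
	setlocale(LC_NUMERIC, "C");
	n = ok ? snprintf(buf, len, "%.17g", v) : -1;
	setlocale(LC_NUMERIC, saved);

	/* Fail on a bad parse or if the result did not fit in buf. */
	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Like strtod() itself, this does not accept grouping characters, so it
does nothing for the "sleep 1.000" case, and because it goes through a
double it is not an exact textual conversion.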
In fact, I am having a very hard time thinking of any way that does not
involve writing new code, but perhaps you, or one of the other people who
deal with locale issues all the time (which is not me), has a solution
to this?

kre