Date: Fri, 25 Jan 2019 14:04:07 +0300
From: Valery Ushakov <u...@stderr.spb.ru>
Message-ID: <20190125110407.ge18...@pony.stderr.spb.ru>

| I don't understand why the locale support in that
| particular place is not ripped out immediately when discovered.

Because it has been there a very long time, and no-one has complained about it, and I have no idea to what extent it is being actively used. Do you?

Also because it was quite clearly done deliberately ... there was a PR, in 1997 (PR#3914), which requested support for non-integral numbers of seconds. The PR supplied code to implement it, which (aside from ugly formatting) looks as if it would have been just fine, and which most certainly handled only '.' as the radix char. The code supplied apparently came from OpenBSD (whether the submitter of the PR wrote it for OpenBSD, or just took it from there is not clear, but I think probably the former.)

The functionality was implemented, but with totally different code, precisely to make locale specific input work (all of this is in the CVS logs, the PR, and the comments in the code).

While I was using NetBSD (to an extent) at that time, I certainly was not paying any attention to changes like that. But the time to complain about it would have been then, or soon after.

| If the problem described in the original report is not a gross and
| cynical violation of POLA I don't know what is.

No, I agree, which is why I made the change that makes it always attempt to use the C locale if there is a parse error converting the string (if you just say "2" it makes no difference...)

The way it is done now is not very nice, and I do plan on changing that, the new version is much nicer ... but the functionality is unaltered.

| I have posted about this to the original thread on netbsd-users@ when
| the issue came up, before any changes were made.

That I did not see ... but netbsd-users wouldn't be the correct forum either. I'd suggest either current-users or tech-userlevel, with a Subject that makes it obvious that relevant concerned people should take a look - not a reply to some other random message.

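
A rough sketch of the fallback described above (try the user's locale first, retry in the C locale on a parse error), for illustration only. This is not the code in sleep(1): the name parse_seconds is made up, the temporary setlocale() switch is just one way to do it, and it assumes the caller has already set up the locale. Range checks (negative values, infinities) are left out.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the sleep(1) source): parse a "seconds"
 * argument with strtod() in the current LC_NUMERIC locale; if the
 * whole string is not consumed, retry once in the "C" locale.
 * Returns 0 on success, -1 if neither attempt parses the string.
 */
static int
parse_seconds(const char *arg, double *out)
{
	const char *cur;
	char *end, *saved;
	double val;
	size_t len;
	int ok;

	assert(arg != NULL && out != NULL);

	/* First attempt: whatever radix character the locale uses. */
	val = strtod(arg, &end);
	if (end != arg && *end == '\0') {
		*out = val;
		return 0;
	}

	/* Copy the current locale name; setlocale() may reuse its buffer. */
	cur = setlocale(LC_NUMERIC, NULL);
	if (cur == NULL)
		return -1;
	len = strlen(cur) + 1;
	saved = malloc(len);
	if (saved == NULL)
		return -1;
	memcpy(saved, cur, len);

	/* Second attempt: the C locale, where the radix is always '.'. */
	setlocale(LC_NUMERIC, "C");
	val = strtod(arg, &end);
	ok = (end != arg && *end == '\0');

	/* Put the caller's locale back before returning. */
	setlocale(LC_NUMERIC, saved);
	free(saved);

	if (!ok)
		return -1;
	*out = val;
	return 0;
}
```

With this shape, "2" parses on the first attempt in any locale, while "0.5" in a locale whose radix char is ',' fails the first attempt and succeeds on the second. Note that setlocale() changes process-wide state, which is tolerable in a small single-threaded utility but not something to copy into threaded code.
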
I don't much care about the outcome; as you have suggested, locales don't have a lot of influence on me, even though I do live in a non-ASCII country (just as non-ASCII as yours, perhaps more so) and I certainly make no claims to knowing anything much about locales - I would never have deliberately added code to make sleep handle non-C locales, but it could have easily happened by accident. I mean, who'd guess that strtod() would parse numbers in a locale specific way? (Sleep did not use that, it used atof(), but atof() is just strtod() with error checking turned off....)

How many other utilities do we have that have the same issue?

| The other mail that made it to the list was about an openwindows
| program in sunos (mail? i don't remember) that accidentally generated
| PostScript with locale specific floating point numbers. As you can
| imagine PS interpreter didn't know how to interpret 0,1 0,1 rmoveto

That message I did see. It does not qualify as anything which would prompt a reasoned discussion on how sleep should work. What's more, parsing input can allow more flexibility than is possible in generating output: for input we can allow either, for output the code needs to select one or the other.

As a semi-irrelevant aside, did you know that sleep can also accept its "seconds" argument in hex ("sleep 0xA" is the same as "sleep 10")? Unlike fractional seconds, that one is not documented -- but there is (and has been for years) an ATF test to make sure that it continues to work. This is all just fallout from the use of atof() (aka strtod()). Those parse hex, so sleep does as well... But someone thought it was important enough to actually validate that it works correctly. (I added a couple more tests, testing fractional hex input, the other day, but the original hex test has been there for years.)

| This sleep fiasco is up there with that story.

Fiasco? What fiasco? The original report was about a user seeing annoying messages when he restarted some rc.d daemon.
Aside from the messages, everything was working - though perhaps the sleep loop went around a few more times than anticipated - depends upon whether or not printing the error message took more than 50ms. This was one of the more innocuous issues imaginable really, no harm was done. Just an (initially unexplained) irritation. Hardly a fiasco!

What's more, the only time it would ever have actually "failed" was with a sleep duration < 1s (that is: sleep 0.x for some value of x). Any other usage (eg: sleep 2.5) would have (seemed to) work just fine (ie: no annoying message, and no immediate exit from sleep) whatever the locale.

And last (and not really related) ...

| I try to avoid locale with the exception of LC_CTYPE.

Then maybe you can help ... in sh, I have implemented (a year or so ago, I forget when) the coming new POSIX quoting format $'...' (which of course came from some other shell originally, and is now implemented by just about all shells I believe) which is supposed to implement something approximating C "" strings (with all the \ escape sequences in those, plus a few more) but which is otherwise identical to sh '...' quoting.

Two of the escape sequences (not sure if these are in C or not, but I suspect they are) are \uXXXX and \Uxxxxxxxx, which are the same except for the number of hex digits that follow the 'u' (up to 4) or 'U' (up to 8). The intent is to allow the script (the user) to enter any Unicode code point, reliably, into the sh input (as an arg for a command, to assign to a variable, or anything else an arbitrary string can be used for in sh.)

All of this so far is easy. The current implementation handles that, and generates a UTF-8 string into the word that is being produced, where the UTF-8 string is the encoding of the code point. That's easy too, and I believe it works.

But it is wrong.
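
For concreteness, the "code point to UTF-8 string" step just described amounts to something like the following. This is a sketch, not the sh source: the name utf8_encode is made up, and rejecting surrogates and values above 0x10FFFF is an assumption here, not necessarily what sh does with such input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the sh source): encode the code point cp
 * as UTF-8 into buf, which must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * or lies beyond U+10FFFF.
 */
static size_t
utf8_encode(uint32_t cp, unsigned char *buf)
{
	assert(buf != NULL);

	if (cp < 0x80) {			/* 1 byte: 0xxxxxxx */
		buf[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* 2 bytes: 110xxxxx 10xxxxxx */
		buf[0] = (unsigned char)(0xC0 | (cp >> 6));
		buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)	/* surrogates are not characters */
		return 0;
	if (cp < 0x10000) {			/* 3 bytes: 1110xxxx + 2 trailers */
		buf[0] = (unsigned char)(0xE0 | (cp >> 12));
		buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* 4 bytes: 11110xxx + 3 trailers */
		buf[0] = (unsigned char)(0xF0 | (cp >> 18));
		buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;				/* out of Unicode range */
}
```

So $'\u00e9' would put the two bytes C3 A9 into the word, regardless of what encoding the user's locale actually uses.
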
What should be generated is the byte sequence that represents the code point identified, in the user's current locale (LC_CTYPE) (or, I think, '?' or something if there is no way to achieve that.) Of course, if the locale uses UTF-8 encoding, as most do, or can, all is fine, or should be (as far as ignorant ivory tower dwelling me can tell).

I have no idea how to make the correct output happen; coding locales (as you quite correctly suggested) is not my thing. I assume some iconv() invocation or something is needed. So I just punted... (the man page says so!)

If you'd like to help and supply the missing code piece, I can point you to exactly where in the sh sources it should be added. Same offer to anyone else who could help...

There are other places where sh could really do with some LC_CTYPE locale type handling done properly, as well (currently it has essentially none.) Pattern matching (glob '*' type patterns for file names, case statements, and substring matching (${var%string} etc)) should all work with non-ASCII chars. They don't currently. There is more (but I doubt that I can even guess at the extent of what really ought to be done.)

And all that is apart from the simpler (I think) issue of generating (appropriate) messages (error messages mostly, sh does not say much else) in the user's language. (Simpler conceptually, I think, but plenty of translation work needed.)

kre