Re: Q to unix filesystem developers

2011-04-14 Thread Branko Čibej
On 15.04.2011 06:20, William A. Rowe Jr. wrote: > On 4/14/2011 9:51 PM, Branko Čibej wrote: >> Woe to poor programmers who don't realize it can /also/ be the second >> byte of a multibyte Shift-JIS-encoded character. > According to what I've read on JIS, so /could/ 2F, except that none of > the JIS

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 9:51 PM, Branko Čibej wrote: > > Woe to poor programmers who don't realize it can /also/ be the second > byte of a multibyte Shift-JIS-encoded character. According to what I've read on JIS, so /could/ 2F, except that none of the JIS implementations use those code points. But their m

Re: Q to unix filesystem developers

2011-04-14 Thread Branko Čibej
On 15.04.2011 03:54, William A. Rowe Jr. wrote: > On 4/14/2011 8:47 PM, Jonathan Leffler wrote: >> The Wikipedia page for Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS) >> shows the Yen >> symbol ¥ as appearing at 0x5C, which is where the backslash appears in >> Unicode and ISO >> 8859-x code

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 8:47 PM, Jonathan Leffler wrote: > > The Wikipedia page for Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS) > shows the Yen > symbol ¥ as appearing at 0x5C, which is where the backslash appears in > Unicode and ISO > 8859-x codesets. > > It (backslash) also falls into the danger

Re: Q to unix filesystem developers

2011-04-14 Thread Jonathan Leffler
On Thu, Apr 14, 2011 at 18:27, William A. Rowe Jr. wrote: > On 4/14/2011 8:04 PM, Branko Čibej wrote: > > On 15.04.2011 01:24, William A. Rowe Jr. wrote: > >> On 4/14/2011 6:00 PM, Jonathan Leffler wrote: > >>> Given that the second byte is in the range 0x40..0x7E (second para), > and / is 0x2F, t

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 8:04 PM, Branko Čibej wrote: > On 15.04.2011 01:24, William A. Rowe Jr. wrote: >> On 4/14/2011 6:00 PM, Jonathan Leffler wrote: >>> Given that the second byte is in the range 0x40..0x7E (second para), and / >>> is 0x2F, there >>> shouldn't be a problem with Shift-JIS. That's not to s

Re: Q to unix filesystem developers

2011-04-14 Thread Branko Čibej
On 15.04.2011 01:24, William A. Rowe Jr. wrote: > On 4/14/2011 6:00 PM, Jonathan Leffler wrote: >> Given that the second byte is in the range 0x40..0x7E (second para), and / >> is 0x2F, there >> shouldn't be a problem with Shift-JIS. That's not to say there isn't >> another codeset >> where ther

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 6:00 PM, Jonathan Leffler wrote: > > Given that the second byte is in the range 0x40..0x7E (second para), and / is > 0x2F, there > shouldn't be a problem with Shift-JIS. That's not to say there isn't another > codeset > where there isn't a problem, but I don't think it is Shift-JIS
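
The trail-byte argument quoted above can be checked empirically. What follows is a minimal sketch, assuming Python's built-in `shift_jis` codec is a fair stand-in for the Shift-JIS variant under discussion (vendor variants such as CP932 differ in detail). The helper name `trail_byte_hits` is hypothetical and does not come from the thread. It encodes every BMP character the codec accepts and collects those whose second byte equals a given value, so the results for 0x2F ('/') and 0x5C (backslash) can be compared.

```python
def trail_byte_hits(target: int, codec: str = "shift_jis") -> list[str]:
    """Return the BMP characters whose two-byte encoding ends in `target`.

    Assumption: the codec name refers to Python's bundled Shift-JIS codec;
    other Shift-JIS variants may give slightly different results.
    """
    hits = []
    for cp in range(0x80, 0x10000):
        ch = chr(cp)
        try:
            enc = ch.encode(codec)
        except UnicodeEncodeError:
            # Not representable in this codec (includes surrogates).
            continue
        # Only double-byte characters have a trail byte to inspect.
        if len(enc) == 2 and enc[1] == target:
            hits.append(ch)
    return hits


slash_hits = trail_byte_hits(0x2F)      # '/' as a trail byte
backslash_hits = trail_byte_hits(0x5C)  # '\' as a trail byte

print("characters with trail byte 0x2F:", len(slash_hits))
print("characters with trail byte 0x5C:", len(backslash_hits))
```

Under that assumption, a byte-wise scan for '/' cannot split a double-byte Shift-JIS character, while a byte-wise scan for backslash can (for example, U+30BD KATAKANA LETTER SO encodes as 0x83 0x5C).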

Re: Q to unix filesystem developers

2011-04-14 Thread Jonathan Leffler
On Thu, Apr 14, 2011 at 13:04, William A. Rowe Jr. wrote: > With some multibyte character sets, it may be possible that '/' is one > byte of a multibyte sequence. From a Unix perspective, I presume that > it is always treated as a path separator and never treated as a multibyte > combination filenam

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 5:02 PM, Wes Garland wrote: >> Correct, utf7/8 are otherwise escaped. > > It's stricter than that... FWIW - I wrote the apr utf8 functions a decade ago. It's really irrelevant to my underlying question ;-) It turns out '/' is the value 63 in utf-7, although we can presume there are

Re: Q to unix filesystem developers

2011-04-14 Thread Wes Garland
> Correct, utf7/8 are otherwise escaped. It's stricter than that, at least for UTF 8,16 and 32 (I haven't checked 7) -- they don't use values < 0x80 at all except when representing characters which are the same in 7-bit ASCII. This means, given any of the encodings { ASCII, ISO-8859-x, UTF-{8/16/
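
For UTF-8 specifically, the property described in this message can be verified by brute force. The sketch below is illustrative only. The function name `utf8_multibyte_is_ascii_free` is hypothetical, and the check covers UTF-8 at the byte level, not the other encodings mentioned. It confirms that no byte of any multibyte UTF-8 sequence falls below 0x80, which means a 0x2F byte in well-formed UTF-8 can only be a literal '/'.

```python
def utf8_multibyte_is_ascii_free() -> bool:
    """Check that every byte of every multibyte UTF-8 sequence is >= 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogate code points are not encodable in UTF-8.
            continue
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


result = utf8_multibyte_is_ascii_free()
print("multibyte UTF-8 never contains ASCII-range bytes:", result)

# A path with non-ASCII components: the only 0x2F bytes are the separators.
sample = "dir/\u65e5\u672c\u8a9e/\u00e9.txt".encode("utf-8")
print(sample.count(b"/"))
```

This is why splitting a UTF-8 pathname on the byte 0x2F is safe without decoding it first.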

Re: Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
On 4/14/2011 3:34 PM, Wes Garland wrote: > On Thu, Apr 14, 2011 at 4:04 PM, William A. Rowe Jr. > wrote: > > With some multibyte character sets, it may be possible that '/' is one > byte of a multibyte sequence. From a Unix perspective, I presume that > it

Re: Q to unix filesystem developers

2011-04-14 Thread Wes Garland
On Thu, Apr 14, 2011 at 4:04 PM, William A. Rowe Jr. wrote: > With some multibyte character sets, it may be possible that '/' is one > byte of a multibyte sequence. From a Unix perspective, I presume that > it is always treated as a path separator and never treated as a multibyte > combination filen

Q to unix filesystem developers

2011-04-14 Thread William A. Rowe Jr.
With some multibyte character sets, it may be possible that '/' is one byte of a multibyte sequence. From a Unix perspective, I presume that it is always treated as a path separator and never treated as a multibyte combination filename character. But I just wanted to ask in case anyone is aware of w