Re: encoding of file names

2011-05-27 Thread Quincey Morris
On May 26, 2011, at 22:56, Andrew Thompson wrote: I believe this stems from a period in history when the Unicode group believed that they'd be able to fit all practical scripts into 65536 code points. Which meant you could get away with all kinds of assumptions like 16-bit types and UCS-2.

Re: encoding of file names

2011-05-26 Thread Andrew Thompson
However, in practical terms, the indexable string elements are components, not codepoints. It seems to me the single hardest thing to come to grips with when newly approaching NSString is understanding that 'unichar's (and characters in the sense of [characterAtIndex:]) *aren't*
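To make the distinction between indexable elements and code points concrete, here is a minimal sketch in Python rather than Cocoa. It assumes, as the thread states, that the elements NSString indexes are 16-bit UTF-16 units; the helper names `utf16_units` and `is_surrogate` are hypothetical and exist only for this illustration.

```python
from typing import List


def utf16_units(text: str) -> List[int]:
    """Return the 16-bit UTF-16 units that encode `text`."""
    data = text.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True if a 16-bit unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the
# first 65536 code points and therefore needs two UTF-16 units.
sample = "a\U0001D11E"
units = utf16_units(sample)

print(len(sample))                     # number of code points
print(len(units))                      # number of 16-bit units
print([hex(u) for u in units])         # the individual units
print([is_surrogate(u) for u in units])
```

A Python `str` is indexed by code point, so `len(sample)` counts code points, while the list returned by `utf16_units` corresponds to what a 16-bit-unit index would walk over.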

Re: encoding of file names

2011-05-25 Thread Quincey Morris
On May 24, 2011, at 22:12, Ken Thomases wrote: Also, I wouldn't say that codepoints may each consist of a variable number of components. They may be _encoded_ to a variable number of components, but don't consist of them. OK. This makes absolutely no sense unless the word character is
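The point that a code point is encoded to, rather than made of, a variable number of components can be sketched as follows. This is an illustration in Python, not anything from the thread; `code_unit_counts` is a hypothetical helper that reports how many units each encoding form needs for one code point.

```python
def code_unit_counts(ch: str) -> dict:
    """Count the units needed to encode a single code point in each form."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", code_unit_counts(ch))
```

The code point itself is a single number in every case; only its encoded length changes with the encoding form.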

Re: encoding of file names

2011-05-24 Thread Ken Thomases
On May 24, 2011, at 8:58 AM, John Joyce wrote: Sorry, this is a bit of a tangent on the topic... When you say canonical form here, is that the same as decomposed form? I meant Apple's file-system-specific canonical form, which is a variant of Normalized Form D, which is decomposed. Also, do

Re: encoding of file names

2011-05-24 Thread Quincey Morris
On May 24, 2011, at 17:33, Ken Thomases wrote: I am sure this becomes more difficult with Arabic, Hebrew and Thai and other writing systems that have highly composed forms. (not sure if that's the right term) Not really. There *is* another level, described briefly here:

Re: encoding of file names

2011-05-24 Thread Ken Thomases
On May 24, 2011, at 11:09 PM, Quincey Morris wrote: On May 24, 2011, at 17:33, Ken Thomases wrote: I am sure this becomes more difficult with Arabic, Hebrew and Thai and other writing systems that have highly composed forms. (not sure if that's the right term) Not really. There

encoding of file names

2011-05-23 Thread Chris Idou
If I take a string from an NSTextField with an accented character: café and I make this into a file name and write a file, then I read that file name back in (using NSFileManager contentsOfDirectoryAtPath), then the string read back in, still looks the same: an accented café, but the strings
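The symptom described in this message can be reproduced outside Cocoa. The sketch below is Python, not the poster's code. It assumes that the text field produced the precomposed form of "café" and that the file system returned a decomposed form, which is consistent with the replies in this thread but is not shown in the original message. The helper `code_points` is hypothetical.

```python
def code_points(text: str) -> list:
    """List the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]


typed = "caf\u00e9"       # precomposed: e-acute as the single code point U+00E9
read_back = "cafe\u0301"  # decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(code_points(typed))
print(code_points(read_back))
print(typed == read_back)          # a plain code-point comparison
print(len(typed), len(read_back))  # the lengths differ as well
```

Both strings render as the same accented word, yet a comparison that works code point by code point treats them as different.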

Re: encoding of file names

2011-05-23 Thread Howard Siegel
Look at NSString's decomposedStringWithCanonicalMapping and decomposedStringWithCompatibilityMapping methods. They'll map Unicode strings to normalized forms that you can then use and compare. - h On Mon, May 23, 2011 at 21:22, Chris Idou idou...@yahoo.com wrote: If I take a string from an
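The normalize-then-compare approach suggested here can be sketched in Python, using `unicodedata.normalize` as a stand-in for the two NSString methods: "NFD" plays the role of decomposedStringWithCanonicalMapping and "NFKD" that of decomposedStringWithCompatibilityMapping. The helper `equal_after_normalization` is hypothetical. Note that elsewhere in this thread the file system's canonical form is described as a variant of Normalization Form D, so plain NFD is only an approximation of what the file system stores.

```python
import unicodedata


def equal_after_normalization(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two strings after mapping both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "caf\u00e9"     # precomposed e-acute
decomposed = "cafe\u0301"  # 'e' plus combining acute accent

print(composed == decomposed)                           # raw comparison
print(equal_after_normalization(composed, decomposed))  # canonical mapping

# Compatibility mapping folds more than canonical mapping does, for example
# the ligature U+FB01 LATIN SMALL LIGATURE FI into the two letters "fi".
ligature = "\ufb01le"
print(equal_after_normalization(ligature, "file", form="NFD"))
print(equal_after_normalization(ligature, "file", form="NFKD"))
```

Canonical mapping is usually the safer choice for file names, because compatibility mapping can make strings equal that a user would consider distinct.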

Re: encoding of file names

2011-05-23 Thread Aki Inoue
-[NSString compare:] and its variants can handle the normalization properly. Aki Inoue On 2011/05/23, at 21:41, Howard Siegel hsie...@gmail.com wrote: Look at NSString's decomposedStringWithCanonicalMapping and decomposedStringWithCompatibilityMapping methods. They'll map Unicode strings

Re: encoding of file names

2011-05-23 Thread Ken Thomases
On May 23, 2011, at 11:22 PM, Chris Idou wrote: If I take a string from an NSTextField with an accented character: café and I make this into a file name and write a file, then I read that file name back in (using NSFileManager contentsOfDirectoryAtPath), then the string read back in,