On 24/06/09 14:00, björn wrote:
>
> Hi Eljay,
>
> 2009/6/23 John (Eljay) Love-Jensen:
>>
>>> As far as I can tell (from searching around) HFS+ always uses
>>> normalization form D (NFD) for filenames.
>>
>> HFS+ uses a variant of NFD for filenames.  (The HFS+ variant predates
>> standardization of NFD.)  This requirement is enforced by the OS.
>>
>> http://developer.apple.com/technotes/tn/tn1150.html
>> http://developer.apple.com/technotes/tn/tn1150table.html
>> http://developer.apple.com/qa/qa2001/qa1235.html
>> http://www.unicode.org/reports/tr15/
>
> Thanks for clarifying that (and for the links!).
>
>> Windows uses NFC for filenames.  I'm not sure if the Linux world settled on
>> NFC or NFK.
>
> I read that Windows uses NFKC.  Have you got a reference for the claim
> that NFC is used?
>
>>> So as a workaround for the issue the OP had I now normalize filenames
>>> to compatibility form C (NFKC) before passing the filename on to Vim
>>> and this takes care of the OP's problem.
>>
>> NFC or NFKC?  Those are different normalizations.
>>
>> Windows NTFS file system uses NFC.  But it isn't enforced by the OS, yet.
>
> I did mean the compatibility form NFKC since I read somewhere that
> NTFS uses NFKC, but I did not research that very carefully.
>
>
>>> However, as I see it this really is a legitimate issue in Vim itself
>>> in that it does not handle NFD properly (the example above should
>>> always render as one glyph, not three as it does now if NFD is used).
>>> Either Vim should ensure that all buffers are normalized to composed
>>> form NFC/NFKC or it needs to be made "NFD aware".
>>
>> I agree with your assessment.
>>
>>> Does anybody on the vim_multibyte list (this mail goes to vim_mac as
>>> well) have any comments on this?
>>
>> The relevant Mac OS X APIs are:
>>
>> CFURLRef url =
>> CFURLCreateWithFileSystemPath(
>>   kCFAllocatorDefault,
>>   cfstringFullPath,
>>   kCFURLPOSIXPathStyle,
>>   false);
>>
>> char bufferUTF8[32768*4]; // Worst case scenario.
>> // As per Apple documentation, paths can be "up to 30,000 UTF-16
>> // encoding units long", with each component being up to 255 UTF-16
>> // encoding units long.  Too bad there isn't an API to specify the
>> // exact buffer size /a priori/.
>>
>> Boolean success =
>> CFURLGetFileSystemRepresentation(
>>   url,
>>   true,
>>   (UInt8 *)&bufferUTF8[0],
>>   sizeof bufferUTF8);
>
> Thanks.  NSString has a method called fileSystemRepresentation which
> I'm guessing does the same thing(?).  I used the NSString method
> precomposedStringWithCompatibilityMapping to convert to NFKC.
>
> Björn

Hm, NFKC and NFKD sometimes fold distinct characters into a single
"normalized" form. For instance, NFKC(²) = 2, though both are
(different) Latin1 characters (0xB2 and 0x32). IIRC, DOS would have kept
them distinct.
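
To make the difference concrete, here is a minimal sketch using Python's
standard unicodedata module. It is only an illustration of the
normalization forms themselves, not of what Vim, HFS+ or NTFS actually
do; the variable names are mine.

```python
import unicodedata

# U+00B2 SUPERSCRIPT TWO, the Latin1 character 0xB2 mentioned above.
superscript_two = "\u00b2"

# 'a' followed by U+0308 COMBINING DIAERESIS: a decomposed (NFD-style) 'ä'.
decomposed = "a\u0308"

# NFC only recombines canonically equivalent sequences, so the
# superscript two is left alone.
nfc = unicodedata.normalize("NFC", superscript_two)

# NFKC also applies compatibility mappings, so it becomes a plain '2'.
nfkc = unicodedata.normalize("NFKC", superscript_two)

# NFC composes the base letter and combining mark into U+00E4.
composed = unicodedata.normalize("NFC", decomposed)

print(repr(nfc), repr(nfkc), repr(composed), len(decomposed), len(composed))
```

So a filename containing ² would change under NFKC but not under NFC,
while a decomposed 'ä' is recomposed into one code point by either form.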

Best regards,
Tony.
-- 
hundred-and-one symptoms of being an internet addict:
56. You leave the modem speaker on after connecting because you think it
     sounds like the ocean wind...the perfect soundtrack for "surfing
     the net".
