Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread David Haslam
Aside: FIO. I have since found out that Notepad++ doesn't support UTF-16 but rather UCS-2, hence no support beyond the Unicode BMP. David
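
To make the UCS-2 limitation concrete: any code point above U+FFFF needs a surrogate pair in UTF-16, and in UTF-8 such a code point always starts with a lead byte of the form 11110xxx. The sketch below counts those code points in a UTF-8 string. The function name count_non_bmp is hypothetical, and the input is assumed to be well-formed UTF-8.

```c
#include <stddef.h>

/* Count code points above U+FFFF (outside the BMP) in a UTF-8 string.
   Hypothetical helper; assumes the input is well-formed UTF-8. */
size_t count_non_bmp(const char *utf8)
{
    size_t count = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)utf8; *p != '\0'; p++) {
        /* A lead byte 11110xxx starts a 4-byte sequence, i.e. a code
           point that UTF-16 stores as a surrogate pair. */
        if ((*p & 0xF8) == 0xF0)
            count++;
    }
    return count;
}
```

A non-zero result means the text cannot be represented in UCS-2 without loss.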

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Matěj Cepl
On 2016-01-12, 16:52 GMT, DM Smith wrote:
> You can take the second column and sort it by each of the
> locales mentioned.

https://mcepl.fedorapeople.org/tmp/sort-complicated.txt is the second column as simple plain text in UTF-8. My colleague working on LibreOffice claims that he doesn’t

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Matěj Cepl
On 2016-01-13, 09:46 GMT, Matěj Cepl wrote:
> My colleague working on LibreOffice claims that he doesn’t know
> about anything better than ICU. Yes, it is a monster. Perhaps
> UTF-8->UTF-16LE->UTF-8 round-trip is not that expensive after
> all?

Besides, don't we have ICU already as a dependency

Re: [sword-devel] better UTF-sensitive sort

2016-01-13 Thread Karl Kleinpaste
On 01/12/2016 11:32 AM, DM Smith wrote:
> Is ICU4C out of the question?

Thanx for the pointer. It took a bit more contemplation than it probably should have, but I used ucol_strcollUTF8() (in icu-i18n) and it seems fine.

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
Thanks DM. Before tackling sorts of any kind, I first used three different programs to convert the UTF-8 to UTF-16LE. Although this is away from where Karl wishes to go, I still thought it would be interesting. BabelPad and TextPipe gave identical results which is a positive. Notepad++ didn't

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
FIO. Screenshot of the BabelPad sort dialog: https://www.dropbox.com/s/hedexkg6wc3fnhi/Screenshot%202016-01-12%2019.38.23.png?dl=0 David

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread David Haslam
Hi Karl, Windows is largely based on UTF-16, even though many applications can handle UTF-8 internally. Have you timed a round trip conversion of the language names from UTF-8 to UTF-16 and back again? Is it really so slow that it would be noticeable to users? Are you looking for a locale
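
For anyone wanting to time the round trip asked about here, a hand-rolled conversion is small enough to sketch. The functions utf8_to_utf16 and utf16_to_utf8 below are hypothetical helpers, not part of SWORD or Xiphos. They assume well-formed input and caller-supplied buffers, and they do no error checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode well-formed UTF-8 into UTF-16 code units.
   out needs room for at least strlen(in) units. Returns units written. */
size_t utf8_to_utf16(const char *in, uint16_t *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s != '\0') {
        uint32_t cp;
        int extra;                      /* continuation bytes to read */

        if (*s < 0x80)      { cp = *s;        extra = 0; }
        else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
        else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
        else                { cp = *s & 0x07; extra = 3; }
        s++;
        while (extra-- > 0)
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);

        if (cp >= 0x10000) {            /* beyond the BMP: surrogate pair */
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

/* Encode n UTF-16 code units as NUL-terminated UTF-8.
   out needs room for at least 3 * n + 1 bytes. Returns bytes written. */
size_t utf16_to_utf8(const uint16_t *in, size_t n, char *out)
{
    size_t i = 0, o = 0;

    while (i < n) {
        uint32_t cp = in[i++];

        /* Recombine a high/low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i < n)
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);

        if (cp < 0x80) {
            out[o++] = (char)cp;
        } else if (cp < 0x800) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

Both passes are linear in the length of the string, so for a list of language names the cost is likely to be small. Wrapping the two calls in a loop with clock() from <time.h> would give an actual figure.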

[sword-devel] better UTF-sensitive sort

2016-01-12 Thread Karl Kleinpaste
To produce Xiphos' module trees (sidebar, mod.mgr, adv.search), I sort by language using qsort+strcmp. This was recently pointed out as being poor for UTF-8 strings, and I replaced strcmp with strcoll. This works fine in Linux. Unfortunately, the Win32 version of strcoll believes in UTF-16, even
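
A minimal sketch of the qsort+strcoll approach described here. The names compare_names and sort_language_names are hypothetical and this is not the actual Xiphos code. It assumes the caller has already selected a collation locale, for example with setlocale(LC_COLLATE, ""), and that the platform's strcoll treats the strings as UTF-8 under that locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a pointer to a C string.
   strcoll() compares according to the current LC_COLLATE locale,
   where strcmp() would compare raw byte values. */
static int compare_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcoll(*sa, *sb);
}

/* Sort an array of language names in place using locale collation.
   Call setlocale(LC_COLLATE, "") once before using this. */
void sort_language_names(const char **names, size_t count)
{
    qsort(names, count, sizeof *names, compare_names);
}
```

Swapping strcoll for strcmp in the comparator gives the original byte-order behaviour, which is why accented names end up misplaced.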

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread DM Smith
For a localized list of language names see our wiki: http://www.crosswire.org/wiki/Localized_Language_Names You can take the second column and sort it by each of the locales mentioned. Example of the complexity: It used to be the

Re: [sword-devel] better UTF-sensitive sort

2016-01-12 Thread DM Smith
Is ICU4C out of the question? It has support for collation. See: http://site.icu-project.org/design/collation/v2

> On Jan 12, 2016, at 11:12 AM, Karl Kleinpaste wrote:
>
> To produce Xiphos' module trees (sidebar, mod.mgr,