Re: Software support costs (was: Nicest UTF)

2004-12-10 Thread Theodore H. Smith
just perfectly. -- Theodore H. Smith - Software Developer - www.elfdata.com/plugin/ Industrial strength string processing code, made easy. (If you believe that's an oxymoron, see for yourself.)

If only MS Word was coded this well (was Re: Nicest UTF)

2004-12-07 Thread Theodore H. Smith
-check 3MB of pure Unicode text (no style junk bloating up the file-size) in one second, for you. (The words would all be spelt correctly though, so as to not require expensive RAM copying when doing the replacements.) Yes, I do know how to code ;o) Too bad so few others do. -- Theodore H. S

Nicest UTF

2004-12-01 Thread Theodore H. Smith
pression scheme, the Unicode text would be even smaller than UTF16. SCSU (or whatever it's called) can be processed as markup (XML for example) with no decompression, so it's quite handy. Anyhow, that's why I think UTF-8 is really the way to go. It's too bad Microsoft and Apple didn't realise the s

Re: Opinions on this Java URL?

2004-11-12 Thread Theodore H. Smith
ithout breaking a lot of existing stuff, something that the Java team strive to avoid. Theodore H. Smith wrote: http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8 If only people could sue for suggesting bad coding practices ;o) -- Theodore H. Smith - Software Developer. http://www.elfdata.com

Opinions on this Java URL?

2004-11-12 Thread Theodore H. Smith
http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8 If only people could sue for suggesting bad coding practices ;o) -- Theodore H. Smith - Software Developer. http://www.elfdata.com

Windows Latin1?

2004-11-06 Thread Theodore H. Smith
Are the Unicode code points from U+0000 to U+00FF equivalent to the Windows Latin 1 code points? Or are they equivalent to ISO-Latin-1? Sorry for asking this, I know it's answered somewhere, but I can't seem to find the answer on your website. -- Theodore H. Smith - Software Developer.
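
One way to check this question empirically is to decode every single byte value and compare the result with the code point of the same number. A minimal Python sketch, assuming Python's built-in `latin-1` and `cp1252` codecs are faithful stand-ins for ISO-Latin-1 and Windows Latin 1 (the helper name `mismatches` is made up for illustration):

```python
def mismatches(codec):
    """Return the byte values that do NOT decode to the same-numbered code point."""
    bad = []
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            bad.append(b)          # byte is unassigned in this codec
            continue
        if ord(ch) != b:
            bad.append(b)          # byte maps to some other code point
    return bad

latin1_diff = mismatches("latin-1")
cp1252_diff = mismatches("cp1252")

print(latin1_diff)                      # no differences for ISO-Latin-1
print([hex(b) for b in cp1252_diff])    # differences sit in the 0x80..0x9F block
```

Under those assumptions, the first 256 code points line up with ISO-Latin-1, and the Windows code page departs from them only in the 0x80 to 0x9F block.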

Re: UTF-8 stress test file?

2004-10-11 Thread Theodore H. Smith
ere impossible bytes that simply aren't in the file? - the file mixes UTF-8 and UTF-16 Does this file mix UTF-8 and UTF-16? I thought it just had surrogates encoded into UTF-8? Of course a surrogate should never exist in UTF-8. -- Theodore H. Smith - Software Developer. http://www.elfdata.com
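
The point about surrogates can be checked with any strict UTF-8 decoder. A small sketch, assuming Python 3's built-in `utf-8` codec as the decoder (the variable names are illustrative only):

```python
# U+D800 (a high surrogate) forced into the 3-byte UTF-8 bit pattern.
high_surrogate_as_utf8 = b"\xed\xa0\x80"

try:
    high_surrogate_as_utf8.decode("utf-8")
    surrogate_accepted = True
except UnicodeDecodeError:
    surrogate_accepted = False

# A character outside the BMP is encoded directly as four bytes,
# never as a pair of 3-byte surrogate sequences.
supplementary = "\U00010000".encode("utf-8")

print(surrogate_accepted)     # False
print(supplementary.hex())    # f0908080
```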

Re: UTF-8 stress test file?

2004-10-10 Thread Theodore H. Smith
editor, delete all the non test lines, and then separate out the good and the bad UTF8 into different files! That way I can use readline type code to do my UTF-8 verification. It would be nice if someone had an "automated test ready" UTF-8 file. If not, I'll modify this one and the
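
The "readline type" approach described above could look roughly like this. It is only a sketch: the function name `split_test_lines` and the sample bytes are invented for illustration, and Python's strict `utf-8` codec stands in for whatever verifier is actually being tested.

```python
def split_test_lines(raw: bytes):
    """Split raw bytes on newlines and sort the lines into valid / invalid UTF-8."""
    good, bad = [], []
    for line in raw.split(b"\n"):
        try:
            line.decode("utf-8")
            good.append(line)
        except UnicodeDecodeError:
            bad.append(line)
    return good, bad

# Hypothetical sample: two well-formed lines, two malformed ones.
sample = (
    b"plain ascii\n"
    + "caf\u00e9".encode("utf-8") + b"\n"
    + b"lone continuation byte \x80\n"
    + b"overlong slash \xc0\xaf"
)

good, bad = split_test_lines(sample)
print(len(good), len(bad))
```

Splitting on the byte 0x0A is safe for well-formed UTF-8, because that byte value never occurs inside a multi-byte sequence.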

Re: UTF-8 stress test

2004-10-10 Thread Theodore H. Smith
ytes. As long as we don't pass any invalid UTF-8 to client apps/code, and we don't process any invalid UTF-8, we are fine, so modifying the bytes of the UTF8 text before doing anything with it, can in some circumstances work. -- Theodore H. Smith - Software Developer. http://www.elfdata.com
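
As an illustration of "modifying the bytes before doing anything with it", here is one possible clean-up pass. It is an assumption about how such a step might be done, not the approach used in the thread: each undecodable sequence is replaced with U+FFFD, so that everything downstream only ever sees valid UTF-8.

```python
def sanitize_utf8(data: bytes) -> bytes:
    """Replace undecodable byte sequences with U+FFFD and re-encode as UTF-8."""
    return data.decode("utf-8", errors="replace").encode("utf-8")

dirty = b"ok \xff end"          # 0xFF can never appear in UTF-8
clean = sanitize_utf8(dirty)

print(clean)                    # the bad byte becomes EF BF BD (U+FFFD)
print(clean.decode("utf-8"))    # now decodes without error
```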

Exclamation mark comma

2003-11-26 Thread Theodore H. Smith
I've often wanted to type a symbol, that's like an exclamation mark, and a comma at the same time. That is, instead of the "." on the bottom of a "!", it has a "," instead. Is there such a Unicode code point? Just out of curiosity! Or I suppose, is there a way to compose such a character.

Thanks for the answers on dictionaries

2003-11-21 Thread Theodore H. Smith
in a place where other people may learn about the TheoTrie. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Digest doesn't display unicode properly?

2003-11-21 Thread Theodore H. Smith
I see a lot of garbage characters in the unicode digest. The same emails, however, display fine when emailed to me directly, (although I can't understand them sometimes ;o) but someone who speaks the correct language would). Is this a problem with my mailer, or the unicode digest program? I su

Re: Ternary search trees for Unicode dictionaries

2003-11-18 Thread Theodore H. Smith
We tend to use tries, which have very good performance characteristics. See "bits of unicode" on my site: www.macchiato.com. You'll find many references of books, and papers available in PDF format on: http://citeseer.nj.nec.com/147026.html Thanks so much. This is actually the best article I've r

Re: Ternary search trees for Unicode dictionaries

2003-11-18 Thread Theodore H. Smith
Hi Mark, Your tries are nice, however they are being used for single unicode characters, not a whole string of them, right? Well, sure some of them are being used for whole strings, but for me, ALL of mine will be used for whole strings. Yours are quite rare. Does this make the advantage of tr

Re: Ternary search trees for Unicode dictionaries

2003-11-17 Thread Theodore H. Smith
I've looked into the TST thing. I'm not sure that it is optimal, despite how popular they are! Look at this, if I add "1", "2","3", "4", "5", "6", "7", "8", "9" to a TST, they will all be in a line, in the tree. All will be referenced via the "high" node. So, to find "9", I have to read through
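
The degenerate case described above can be reproduced with a toy ternary search tree. This is a minimal sketch with made-up names (`Node`, `insert`, `nodes_visited`, `build`), not any particular library's implementation, and it does not handle empty keys.

```python
class Node:
    """One TST node: a character plus low / equal / high child links."""
    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.is_end = False

def insert(node, key, i=0):
    """Insert key[i:] below node and return the (possibly new) subtree root."""
    ch = key[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, key, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, key, i)
    elif i + 1 < len(key):
        node.eq = insert(node.eq, key, i + 1)
    else:
        node.is_end = True
    return node

def nodes_visited(root, key):
    """Return how many nodes a lookup touches, or None if key is absent."""
    node, i, visited = root, 0, 0
    while node is not None:
        visited += 1
        if key[i] < node.ch:
            node = node.lo
        elif key[i] > node.ch:
            node = node.hi
        elif i + 1 < len(key):
            node, i = node.eq, i + 1
        else:
            return visited if node.is_end else None
    return None

def build(keys):
    """Build a tree by inserting each one-character key in the given order."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root

sorted_tree = build("123456789")      # ascending order: one long "high" chain
shuffled_tree = build("527136849")    # same keys, middle-first order

print(nodes_visited(sorted_tree, "9"))     # 9
print(nodes_visited(shuffled_tree, "9"))   # 4
```

The same nine keys cost nine node visits or four depending only on insertion order, which is the weakness the post points at: a TST is only as balanced as the order in which it was filled.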

What does i18n mean?

2003-11-14 Thread Theodore H. Smith
what does i18n mean? I see it bandied about a lot. My guess is "internationalisation", but actually when you pronounce "eye won ayht en" it doesn't sound anything like that word. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Unicode dictionary coding? UTF8, UTF32, etc

2003-11-14 Thread Theodore H. Smith
Can someone give me some advice? If I was to write a dictionary class for Unicode, would I be better off writing it using a b-tree, or hash-bin system? Or maybe an array of pointers to arrays system? I suppose, that if I wanted an array of pointers to arrays, that I couldn't use UTF32, I could

Please help knock my FAQ into shape

2003-11-10 Thread Theodore H. Smith
ther FAQs on the internet, and replacing them with urls to Unicode.org ;o) . Also suggestions like putting urls to the reference of where I got my data from! Thanks to all who answer. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

A sensible answer (was Re: UTF-9)

2003-11-01 Thread Theodore H. Smith
From: Michael Everson <[EMAIL PROTECTED]> Please drop this thread. That's one of the most sensible answers I've heard to the nonsensical propositions that tend to fill this list ;oD!! (Including new hex characters, and other madnesses).

Re: Non-ascii string processing?

2003-10-05 Thread Theodore H. Smith
Hi Doug, here are some things I think. If you really aren't processing anything but the ASCII characters within your strings, like "<" and ">" in your example, you can probably get away with keeping your existing byte-oriented code. At least you won't get false matches on the ASCII characters (thi
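
The claim about false matches follows from the UTF-8 design: every byte of a multi-byte sequence has its high bit set, so it can never equal an ASCII byte. A short Python sketch (the sample string is invented for illustration):

```python
# "é<b>日本語</b>" written with escapes so the source stays pure ASCII.
text = "\u00e9<b>\u65e5\u672c\u8a9e</b>"
data = text.encode("utf-8")

# Plain byte-oriented search for "<", with no decoding at all.
positions = [i for i, b in enumerate(data) if b == ord("<")]

# Collect every byte that belongs to a non-ASCII character.
non_ascii_bytes = [b for ch in text if ord(ch) > 0x7F for b in ch.encode("utf-8")]

print(positions)                                  # [2, 14]
print(all(b >= 0x80 for b in non_ascii_bytes))    # True
```

The byte search finds exactly the two real "<" characters. Note that the positions are byte offsets, not character indexes.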

Non-ascii string processing?

2003-10-04 Thread Theodore H. Smith
Hi lists, I'm wondering how people tend to do their non-ascii string processing. I'm wondering, if anyone really needs anything other than byte oriented code? I'm using UTF8 as my character format, and UTF8 is variable width, of course. I offer the option of processing UTF8, with byte function

[off] XML. And RAM

2003-08-14 Thread Theodore H. Smith
'd just need to write another editing mode, that's all. Or maybe just suggest they use a different text editor... I'm not sure really about trying to write a text editor that can handle gigabyte files! __Just thinking out loud__ Once again, this is really just me thinking aloud. Even if no one answers, just writing this so that people can understand it helps me a lot to get my thoughts straight! -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Questions on ZWNBS

2003-08-02 Thread Theodore H. Smith
discouraged. Instead of "can't use ZWNBS", I think that char is discouraged. Where is the rule that discourages it? CC me directly please? -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Arabic/Hebrew coding for the Mac

2003-07-06 Thread Theodore H. Smith
al order instead of logical, we'd just get a different set of headaches, not really less. Such is computing for real-world problems! Reply directly to me if you can please? At [EMAIL PROTECTED] -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Re: Arabic script web site hosting solution for all platforms

2003-06-18 Thread Theodore H. Smith
Yahoo for your abuse of its AUP in this webmail. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

[OT] OSX's bad Arabic support (was RE: [OT] No more IE for Mac)

2003-06-16 Thread Theodore H. Smith
cter, inability to select the right string of text via the mouse, occasional crashing with Arabic, etc. Actually Arabic displays in Safari OK. It just doesn't select OK. Entering one line of Arabic into Safari is OK, but multi lines give some of those bugs I mentioned. -- Theodore H.

Re: Not snazzy (was: New Unicode Savvy Logo)

2003-05-29 Thread Theodore H. Smith
'd think. Also, you can apply this term to software, but in a different sense. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Emailing logos to the list

2003-05-29 Thread Theodore H. Smith
I'm not sure what other people experience, but I see a note saying the attachment was (quite correctly I think) removed from the email, and instead just lists the name and format of the attachment. I'm on the digest format.

Re: Not snazzy (was: New Unicode Savvy Logo)

2003-05-29 Thread Theodore H. Smith
, change or learn from. I wouldn't use it myself. I don't think I can be breaking a copyright by accepting a tiff emailed to me from Unicode.org staff. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Re: Not snazzy (was: New Unicode Savvy Logo)

2003-05-28 Thread Theodore H. Smith
stions from other people. I did the work, I did most of the design, but important elements came from other people's ideas. This way I own what I do and it is "in house", but still I am open to external improvement. Hey, if you can give me a tiff of the "Unicode" word (in its large original format) which is the part that I actually did like, I could re-do the rest for you in PhotoShop v6 format, and submit as a suggestion. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Not snazzy (was: New Unicode Savvy Logo)

2003-05-27 Thread Theodore H. Smith
My first reaction, is that the logos don't look like they compare to other logos in terms of style. For example "Mac OSX" logos, XML logos, and that generally do look more snazzy. My second reaction is that I hope I haven't annoyed anyone. My third was that I probably ought to say it anyhow. Ma

Re: Why isn't my character displaying

2002-11-29 Thread Theodore H. Smith
Stigma is not a common character. Can you see it in any applications? Which fonts do you have that contain Greek characters? On a standard OS X install, I think this character is only present in the Japanese Hiragino Pro fonts. Also in Code2000, if you add this. I don't know what fonts contai

Why isn't my character displaying

2002-11-29 Thread Theodore H. Smith
Hi list, I'm directly calling ATSUI, for a framework I am writing. I have a character of value 987, "Stigma". This is part of my UTF16 string. The rest of the string displays just fine. But my Stigma doesn't, it shows up as the Rectangle. What is wrong? Is it something to do with font fallback

Quick ATSUI question

2002-11-20 Thread Theodore H. Smith
er words, I don't need a detailed answer, but if the whim takes you then do so. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Re: ATSUI for MacOS9

2002-11-19 Thread Theodore H. Smith
Thank you for the mail list address. I tried out the demos on OS9, and they work! Apparently, the OS9 version won't hit test, emulated on OSX. And the Carbon version won't run on OS9 emulated, because all my attempts to set "Run in Classic Mode" in the info window failed. The check box wouldn't

Re: ATSUI for MacOS9

2002-11-19 Thread Theodore H. Smith
echnotes/tn/tn1176.html#atsui http://developer.apple.com/techpubs/macosx/Carbon/text/ATSUI/atsui.html -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

ATSUI text length parameters

2002-11-19 Thread Theodore H. Smith
eeling that it is the "ushort count", and not a char count like it claims. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:
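
The distinction between a "ushort count" and a character count only shows up for text outside the BMP. The sketch below makes no claim about what ATSUI itself expects. It only shows, in Python, how the two counts diverge for the same string.

```python
# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which needs a surrogate pair.
text = "a\U0001D11E"

code_points = len(text)                            # characters (code points)
utf16_units = len(text.encode("utf-16-le")) // 2   # 16-bit units ("ushorts")

print(code_points)   # 2
print(utf16_units)   # 3
```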

ATSUI for MacOS9

2002-11-19 Thread Theodore H. Smith
. :o( Why not? Is this a bug in the demo, or a bug in ATSUI for OS9? Does ATSUI for Carbon on OS9 work if ATSUI for Classic OS9 doesn't? If anyone knows ATSUI well, could you please contact me so I can ask a few more questions? Thanks a lot. -- Theodore H. Smith - Macintosh Consulta

Silly proposals

2002-08-16 Thread Theodore H. Smith
haps I'm wrong and the posters here like reading and writing responses to silly proposals. Sorry if I'm threatening your fun then :oP -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Re: Scripts in Unicode 4.0

2002-08-14 Thread Theodore H. Smith
and ideographs) > Aegean Numbers > Ugaritic Cuneiform > Shavian > Osmanya > Cypriot Syllabary What's the point of having more Latin characters? Do they look like normal Roman characters? I think we have a few versions (3 or more?) of them, already. I thought once was enough. -- T

Re: Problem with ConvertUTF.c?

2002-07-16 Thread Theodore H. Smith
Seems like I missed the isLegalUTF8 function calls that verified if the UTF was valid UTF8, never mind then, it's all OK. On Wednesday, July 17, 2002, at 01:57 , Theodore H. Smith wrote: > The file ConvertUTF.c contains this array: > > > static const char trailingByte

Problem with ConvertUTF.c?

2002-07-16 Thread Theodore H. Smith
8 was tightened up. Perhaps this code should be tightened up along with the standard now? -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Whats the difference between a composite and a combining sequence?

2002-07-08 Thread Theodore H. Smith
http://www.unicode.org/unicode/reports/tr15/ mentions both composites and combining sequences. But it doesn't tell us the difference. I know what a combining sequence is. If I didn't know what a composite was, I'd guess it was the same thing as a combining sequence. However, the two are meant

Re: Multiple encodings for 1 character

2002-07-08 Thread Theodore H. Smith
> And it is up to an implementation to specify which normalization > form it uses. > > By the way, we don't depreciate Unicode encodings -- we appreciate > them. ;-) That's a shame. Simplicity is wonderful. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Multiple encodings for 1 character

2002-07-08 Thread Theodore H. Smith
What is going to be done about the confusion generated from having multiple ways to encode the same character? For example, for filenames, OSX will encode an accented Roman letter one way, while for filenames Windows will encode it the other way. These kind of confusions are totally expected,
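
The filename problem described here is the composed versus decomposed form of the same letter. A minimal Python sketch using the standard `unicodedata` module shows the mismatch and how normalizing both sides to one form (NFC or NFD) removes it. Which form a given file system stores is not something this sketch asserts.

```python
import unicodedata

composed = "\u00e9"        # é as one precomposed code point
decomposed = "e\u0301"     # e followed by COMBINING ACUTE ACCENT

# A raw comparison sees two different strings.
print(composed == decomposed)                                  # False

# After normalizing to a common form they compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True
```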

roundtrip on UTF8 value 1114048 ?

2002-06-07 Thread Theodore H. Smith
It's not a major problem though until then, because that's above what almost anyone will be using. I don't know if it's allocated yet, anyhow. It's below 10 though. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

UniCode website is confusing

2002-05-29 Thread Theodore H. Smith
mpiled into one standard definition. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:

Clean and simple unicode conversion C code

2002-05-29 Thread Theodore H. Smith
Does anyone have some clean and simple Unicode conversion code? I used the one on the FTP section of the Unicode website, however, I found about 5 bugs in it, which I reported, and now a user of mine is telling me that the results it gives are invalid. So does anyone have some clean and simple Un

How is UTF8, UTF16 and UTF32 encoded?

2002-05-29 Thread Theodore H. Smith
ke it easier to understand would help. Or perhaps I'm just reacting to the confusion of the Unicode website and it's not that hard to understand and a simple definition would do? But the first idea certainly wouldn't hurt. -- Theodore H. Smith - Macintosh Consultant / Contractor. My website:
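
For readers with the same question, a concrete comparison may help more than a definition. The sketch below uses Python's built-in codecs (big-endian variants, so no byte order mark is emitted) to show how one BMP character and one supplementary character come out in each encoding form. The two sample characters are an arbitrary choice.

```python
def forms(ch):
    """Return the hex bytes of ch in UTF-8, UTF-16BE and UTF-32BE."""
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16": ch.encode("utf-16-be").hex(),
        "utf-32": ch.encode("utf-32-be").hex(),
    }

euro = forms("\u20ac")         # U+20AC EURO SIGN, inside the BMP
clef = forms("\U0001D11E")     # U+1D11E, outside the BMP

print(euro)   # 3 bytes in UTF-8, one 16-bit unit, one 32-bit unit
print(clef)   # 4 bytes in UTF-8, a surrogate pair, one 32-bit unit
```

UTF-32 is just the code point padded to four bytes, UTF-16 uses one unit or a surrogate pair, and UTF-8 uses one to four bytes depending on the size of the code point.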

Why isnt the posting address on the list?

2002-05-29 Thread Theodore H. Smith
e just add the correct "Reply-To:" field, and have that point to [EMAIL PROTECTED] -- Theodore H. Smith - Macintosh Consultant / Contractor. My website: