just perfectly.
--
Theodore H. Smith - Software Developer - www.elfdata.com/plugin/
Industrial strength string processing code, made easy.
(If you believe that's an oxymoron, see for yourself.)
-check 3MB of pure Unicode text (no
style junk bloating up the file-size) in one second, for you. (The
words would all be spelt correctly though, so as to not require
expensive RAM copying when doing the replacements.)
Yes, I do know how to code ;o)
Too bad so few others do.
--
Theodore H. S
pression scheme, the Unicode text would be even
smaller than UTF16. SCSU (or whatever it's called) can be processed as
markup (XML for example) with no decompression, so it's quite handy.
Anyhow, that's why I think UTF-8 is really the way to go.
It's too bad Microsoft and Apple didn't realise the s
ithout breaking a lot of existing stuff, something
that the Java team strive to avoid.
Theodore H. Smith wrote:
http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8
If only people could sue for suggesting bad coding practices ;o)
--
Theodore H. Smith - Software Developer.
http://www.elfdata.com
Are the Unicode code points from U+0000 to U+00FF equivalent to the Windows
Latin 1 code points?
Or are they equivalent to ISO-Latin-1?
Sorry for asking this, I know it's answered somewhere, but I can't seem
to find the answer on your website.
--
Theodore H. Smith - Software Developer.
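
The question above can be checked mechanically. The following is a minimal sketch, assuming Python's built-in "latin-1" and "cp1252" codecs are acceptable stand-ins for ISO-Latin-1 and Windows Latin 1; the variable names are only illustrative. It suggests that U+0000 to U+00FF line up with ISO-Latin-1 byte for byte, while Windows Latin 1 departs from that in the range 0x80 to 0x9F.

```python
# Compare how ISO-8859-1 and Windows-1252 map the 256 byte values to code points.
all_bytes = bytes(range(256))

# ISO-8859-1 (Latin-1): check that byte value N decodes to code point N.
latin1_text = all_bytes.decode("latin-1")
latin1_matches = all(ord(ch) == i for i, ch in enumerate(latin1_text))

# Windows-1252: collect the byte values that decode to some other code point,
# or that this codec leaves undefined.
differing = []
for value in range(256):
    try:
        ch = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        differing.append(value)  # undefined in Python's cp1252 table
        continue
    if ord(ch) != value:
        differing.append(value)

print("Latin-1 matches U+0000..U+00FF:", latin1_matches)
print("cp1252 differs at:", [hex(v) for v in differing])
```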
ere impossible bytes that simply aren't in the file?
- the file mixes UTF-8 and UTF-16
Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
encoded into UTF-8? Of course a surrogate should never exist in UTF-8.
--
Theodore H. Smith - Software Developer.
http://www.elfdata.com
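
To illustrate the point about surrogates, here is a small sketch (not taken from the test file under discussion) that builds one character two ways: as a proper four-byte UTF-8 sequence, and as a UTF-16 surrogate pair with each half pushed through the three-byte UTF-8 pattern. The helper names three_byte and is_valid_utf8 are hypothetical, and Python's strict "utf-8" codec stands in for a conforming validator.

```python
# U+10400 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
ch = "\U00010400"
proper = ch.encode("utf-8")  # well-formed 4-byte sequence

# Recover the two surrogate code units from the UTF-16 form.
utf16 = ch.encode("utf-16-be")
high = int.from_bytes(utf16[0:2], "big")
low = int.from_bytes(utf16[2:4], "big")

def three_byte(cp):
    """Apply the 3-byte UTF-8 bit pattern to a 16-bit value (hypothetical helper)."""
    return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

# Each surrogate encoded separately: six bytes that are not well-formed UTF-8.
surrogate_form = three_byte(high) + three_byte(low)

def is_valid_utf8(data):
    """Return True if the strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(proper.hex(), is_valid_utf8(proper))
print(surrogate_form.hex(), is_valid_utf8(surrogate_form))
```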
editor, delete all the non test lines, and then
separate out the good and the bad UTF8 into different files! That way I
can use readline type code to do my UTF-8 verification.
It would be nice if someone had an "automated test ready" UTF-8 file.
If not, I'll modify this one and the
ytes.
As long as we don't pass any invalid UTF-8 to client apps/code, and we
don't process any invalid UTF-8, we are fine, so modifying the bytes of
the UTF8 text before doing anything with it can, in some circumstances,
work.
--
Theodore H. Smith - Software Developer.
http://www.elfdata.com
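
One way to read "modifying the bytes before doing anything with it" is a sanitizing pass up front. This is only a sketch under that assumption, using Python's standard errors="replace" handling; the function name sanitize_utf8 is made up for illustration and is not from the original message.

```python
def sanitize_utf8(raw):
    """Return well-formed UTF-8: invalid bytes become U+FFFD before any processing."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")

dirty = b"abc\xffdef"          # 0xFF can never appear in UTF-8
clean = sanitize_utf8(dirty)

# After this point, strict decoding is safe for all downstream code.
text = clean.decode("utf-8")
print(repr(text))
```

Whether silently replacing bytes is acceptable depends on the application; rejecting the input outright is the stricter alternative.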
I've often wanted to type a symbol, that's like an exclamation mark,
and a comma at the same time. That is, instead of the "." on the bottom
of a "!", it has a "," instead.
Is there such a Unicode code point? Just out of curiosity! Or I
suppose, is there a way to compose such a character?
in a
place where other people may learn about the TheoTrie.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
I see a lot of garbage characters in the Unicode digest.
The same emails, however, display fine when emailed to me directly,
(although I can't understand them sometimes ;o) but someone who speaks
the correct language would).
Is this a problem with my mailer, or the Unicode digest program? I
su
We tend to use tries, which have very good performance
characteristics.
See
"bits of unicode" on my site: www.macchiato.com.
You'll find many references of books, and papers available in PDF
format on:
http://citeseer.nj.nec.com/147026.html
Thanks so much. This is actually the best article I've r
Hi Mark,
Your tries are nice, however they are being used for single unicode
characters, not a whole string of them, right? Well, sure some of them
are being used for whole strings, but for me, ALL of mine will be used
for whole strings. Yours are quite rare.
Does this make the advantage of tr
I've looked into the TST thing.
I'm not sure that it is optimal, despite how popular they are!
Look at this, if I add "1", "2", "3", "4", "5", "6", "7", "8", "9" to a
TST, they will all be in a line, in the tree. All will be referenced via
the "high" node.
So, to find "9", I have to read through
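
The degenerate case described above can be reproduced with a bare-bones ternary search tree. This is a sketch, not any particular library's TST: the Node, insert and nodes_visited names are assumptions made for the example. It also inserts the same keys in a median-first order, which is one common way to avoid the straight line.

```python
class Node:
    def __init__(self, ch):
        self.ch = ch
        self.low = None      # keys whose current char sorts before ch
        self.equal = None    # next char of keys that share ch
        self.high = None     # keys whose current char sorts after ch
        self.is_end = False

def insert(node, key, i=0):
    """Insert key into the subtree rooted at node and return the subtree root."""
    ch = key[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.low = insert(node.low, key, i)
    elif ch > node.ch:
        node.high = insert(node.high, key, i)
    elif i + 1 < len(key):
        node.equal = insert(node.equal, key, i + 1)
    else:
        node.is_end = True
    return node

def nodes_visited(node, key):
    """Count nodes touched while looking up key; None if the key is absent."""
    i, visited = 0, 0
    while node is not None:
        visited += 1
        ch = key[i]
        if ch < node.ch:
            node = node.low
        elif ch > node.ch:
            node = node.high
        elif i + 1 < len(key):
            node, i = node.equal, i + 1
        else:
            return visited if node.is_end else None
    return None

# Sorted insertion: every new key hangs off the previous one's "high" link.
sorted_root = None
for k in [str(d) for d in range(1, 10)]:
    sorted_root = insert(sorted_root, k)

# Median-first insertion of the same keys gives a bushier tree.
balanced_root = None
for k in ["5", "3", "7", "2", "4", "6", "8", "1", "9"]:
    balanced_root = insert(balanced_root, k)

print(nodes_visited(sorted_root, "9"), nodes_visited(balanced_root, "9"))
```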
What does i18n mean? I see it bandied about a lot.
My guess is "internationalisation", but actually when you pronounce
"eye won ayht en" it doesn't sound anything like that word.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
Can someone give me some advice? If I was to write a dictionary class
for Unicode, would I be better off writing it using a b-tree, or
hash-bin system? Or maybe an array of pointers to arrays system?
I suppose, that if I wanted an array of pointers to arrays, that I
couldn't use UTF32, I could
ther FAQs on the internet, and replacing them with urls to Unicode.org
;o) . Also suggestions like putting urls to the reference of where I
got my data from!
Thanks to all who answer.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
From: Michael Everson <[EMAIL PROTECTED]>
Please drop this thread.
That's one of the most sensible answers I've heard to the nonsensical
propositions that tend to fill this list ;oD!! (Including new hex
characters, and other madnesses).
Hi Doug,
Here are some things I think.
If you really aren't processing anything but the ASCII characters
within your strings, like "<" and ">" in your example, you can probably get
away with keeping your existing byte-oriented code. At least you won't
get false matches on the ASCII characters (thi
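
The claim about ASCII characters can be sketched as follows. In UTF-8 every byte of a multi-byte sequence has its high bit set, so a byte-wise search for an ASCII character such as "<" should land only on real occurrences. The sample string and variable names below are assumptions chosen for the illustration.

```python
text = "日本語<b>テキスト</b>"
data = text.encode("utf-8")

# Every byte belonging to a non-ASCII character is 0x80 or above.
high_bit_always_set = all(
    b >= 0x80
    for ch in text if ord(ch) > 0x7F
    for b in ch.encode("utf-8")
)

# Byte-oriented search: offsets of the byte 0x3C ("<") in the raw data.
byte_hits = [i for i, b in enumerate(data) if b == ord("<")]

# Character-oriented search, converted to byte offsets for comparison.
char_hits = [len(text[:i].encode("utf-8")) for i, ch in enumerate(text) if ch == "<"]

print(high_bit_always_set, byte_hits, char_hits)
```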
Hi lists,
I'm wondering how people tend to do their non-ASCII string processing.
I'm wondering if anyone really needs anything other than byte-oriented
code? I'm using UTF8 as my character format, and UTF8 is variable
width, of course. I offer the option of processing UTF8, with byte
function
'd just need to write another editing mode,
that's all. Or maybe just suggest they use a different text editor...
I'm not sure really about trying to write a text editor that can handle
gigabyte files!
__Just thinking out loud__
Once again, this is really just me thinking aloud. Even if no one
answers, just writing this so that people can understand it
helps me a lot to get my thoughts straight!
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
discouraged.
Instead of "can't use ZWNBS", I think that char is discouraged. Where
is the rule that discourages it?
CC me directly please?
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
al order instead of
logical, we'd just get a different set of headaches, not really less.
Such is computing for real-world problems!
Reply directly to me if you can please? At [EMAIL PROTECTED]
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
Yahoo for your abuse of its AUP in this webmail.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
cter, inability to select the right string of text via the mouse,
occasional crashing with Arabic, etc.
Actually Arabic displays in Safari OK. It just doesn't select OK.
Entering one line of Arabic into Safari is OK, but multi lines give
some of those bugs I mentioned.
--
Theodore H.
'd think. Also, you
can apply this term to software, but in a different sense.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
I'm not sure what other people experience, but I see a note saying the
attachment was (quite correctly I think) removed from the email, and
instead just lists the name and format of the attachment.
I'm on the digest format.
, change or learn from. I
wouldn't use it myself. I don't think I can be breaking a copyright by
accepting a tiff emailed to me from Unicode.org staff.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
stions from other people. I did
the work, I did most of the design, but important elements came by
other people's ideas. This way I own what I do and it is "in house",
but still I am open to external improvement.
Hey, if you can give me a tiff of the "Unicode" word (in its large
original format) which is the part that I actually did like, I could
re-do the rest for you in PhotoShop v6 format, and submit as a
suggestion.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
My first reaction is that the logos don't look like they compare to
other logos in terms of style. For example "Mac OSX" logos, XML logos,
and the like generally do look more snazzy.
My second reaction is that I hope I haven't annoyed anyone.
My third was that I probably ought to say it anyhow. Ma
Stigma is not a common character. Can you see it in any applications?
Which fonts do you have that contain Greek characters?
On a standard OS X install, I think this character is only present in
the Japanese Hiragino Pro fonts. Also in Code2000, if you add this.
I don't know what fonts contai
Hi list,
I'm directly calling ATSUI, for a framework I am writing.
I have a character of value 987, "Stigma". This is part of my UTF16
string. The rest of the string displays just fine. But my Stigma
doesn't, it shows up as the Rectangle.
What is wrong? Is it something to do with font fallback
er words, I don't need a detailed answer, but if
the whim takes you then do so.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
Thank you for the mail list address.
I tried out the demo on OS9, and they work! Apparently, the OS9 version
won't hit test, emulated on OSX. And the Carbon version won't run on
OS9 emulated, because all my attempts to set "Run in Classic Mode" in
the info window failed. The check box wouldn't
echnotes/tn/tn1176.html#atsui
http://developer.apple.com/techpubs/macosx/Carbon/text/ATSUI/atsui.html
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
eeling that it is the "ushort count", and not a char
count like it claims.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
. :o(
Why not? Is this a bug in the demo, or a bug in ATSUI for OS9? Does
ATSUI for Carbon on OS9 work if ATSUI for Classic OS9 doesn't?
If anyone knows ATSUI well, could you please contact me so I can ask a
few more questions? Thanks a lot.
--
Theodore H. Smith - Macintosh Consulta
haps I'm wrong and the posters here like reading and writing
responses to silly proposals. Sorry if I'm threatening your fun
then :oP
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
and ideographs)
> Aegean Numbers
> Ugaritic Cuneiform
> Shavian
> Osmanya
> Cypriot Syllabary
What's the point of having more Latin characters? Do they look
like normal Roman characters? I think we have a few versions (3
or more?) of them, already. I thought once was enough.
--
T
Seems like I missed the isLegalUTF8 function calls that verified
if the UTF was valid UTF8, never mind then, it's all OK.
On Wednesday, July 17, 2002, at 01:57 , Theodore H. Smith wrote:
> The file ConvertUTF.c contains this array:
>
>
> static const char trailingByte
8 was tightened up.
Perhaps this code should be tightened up along with the standard
now?
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
http://www.unicode.org/unicode/reports/tr15/ mentions both
composites and combining sequences.
But it doesn't tell us the difference. I know what a combining
sequence is. If I didn't know what a composite was, I'd guess it
was the same thing as a combining sequence.
However, the two are meant
> And it is up to an implementation to specify which normalization
> form it uses.
>
> By the way, we don't depreciate Unicode encodings -- we appreciate
> them. ;-)
That's a shame. Simplicity is wonderful.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
What is going to be done about the confusion generated from
having multiple ways to encode the same character?
For example, for filenames, OSX will encode an accented Roman
letter one way, while for filenames Windows will encode it the
other way. These kinds of confusion are totally expected,
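
The "multiple ways to encode the same character" problem can be shown in a few lines. This is a sketch using Python's standard unicodedata module; which form a given operating system actually stores for filenames is taken from the message above and not verified here.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as one precomposed code point
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The two strings render alike but compare unequal code point by code point.
same_raw = composed == decomposed

# Normalizing both sides to one form (NFC here) makes the comparison reliable.
same_nfc = (unicodedata.normalize("NFC", composed)
            == unicodedata.normalize("NFC", decomposed))

print(len(composed), len(decomposed), same_raw, same_nfc)
```

Picking one normalization form at the boundary (for example when reading filenames) and comparing only normalized strings is the usual way to keep the two spellings from being treated as different names.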
It's not a major problem though until then, because that's above
what almost anyone will be using. I don't know if it's allocated
yet, anyhow. It's below 10 though.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
mpiled into one standard
definition.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
Does anyone have some clean and simple Unicode conversion code?
I used the one on the FTP section of the Unicode website, however,
I found about 5 bugs in it, which I reported, and now a user of mine
is telling me that the results it gives are invalid.
So does anyone have some clean and simple Un
ke it easier to understand would help.
Or perhaps I'm just reacting to the confusion of the Unicode
website and it's not that hard to understand and a simple definition
would do? But the first idea certainly wouldn't hurt.
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
e just add the correct
"Reply-To:" field,
and have that point to [EMAIL PROTECTED]
--
Theodore H. Smith - Macintosh Consultant / Contractor.
My website:
51 matches