ns there, and I will always notify
these lists of updates as well.
As always, corrections, new mapping tables, information about mappings,
and even pointers to things like fonts or texts with odd encodings are
gladly accepted.
--
Mark Leisher
e mappings not
typically found in character set conversion tools available today."
As always, I am happy to accept mapping tables/conversion program source
code for any other obscure or under-represented encodings.
--
Mark Leisher
welcome.
--
-------
Mark Leisher
Computing Research Lab            A sneer is the weapon of the weak.
New Mexico State University       -- James Russell Lowell (1819-1891)
Box 30001, MSC 3CRL
Las Cruces, NM 88003
these encodings is not likely to be included in the most popular
character set conversion tools (i.e. iconv), so this package was put
together to ease conversion of text in obscure character encodings to
Unicode and for historical curiosity.
Mark Leisher
relevant
information.
http://crl.nmsu.edu/~mleisher/csets.html
--
---
Mark Leisher
Tomoyuki> Unicode::Collate 0.23 is released.
Could you remind us where to find it again? Thanks!
-----
Mark Leisher
he Persian Hamshahri visual encoding.
---------
Mark Leisher
the general problems that come up in Indic encodings.
-----
Mark Leisher
irectly
available from ftp://www.unicode.org/Public/3.1-Update1/Unihan-3.1.1.txt.gz.
-----
Mark Leisher
abase.
August 1, 2001."
---------
Mark Leisher
ill get an answer.
-----
Mark Leisher
asonable conversion
capability at about 1/16 the size of ICU.
---------
Mark Leisher
web
page.
http://crl.nmsu.edu/~mleisher/csets.html
These tables will be part of the CSets 1.8 distribution.
-
Mark Leisher
ional Standards Institute (ANSI).
-
Mark Leisher
/csets.tar.gz
ftp://crl.nmsu.edu/CLR/multiling/character-sets/csets.zip
-
Mark Leisher
Philip> On Thu, 26 Oct 2000, Mark Leisher wrote:
>> Following the first page will be all the other pages, each in the same
>> format as the first: one number identifying the page followed by 256
>> double-byte Unicode (UCS-2) characters. If a char
ly unrecognized
Peter> characters?
I don't know. I last used Tcl/Tk in the days of tcl7.?/tk4.? and haven't had
time to play with anything newer. I do prefer Perl :-)
-----
Mark Leisher
unknown characters in the source text or change the 0x's to 0x.
---------
Mark Leisher
Peter> Mark Leisher then replied:
>> If the converted string contains 0x, it will be pretty clear the
>> source text had bogus characters the moment you display it.
Peter> According to Nick's translated doc the first character on the third
Peter
Philip> On Wed, 25 Oct 2000, Mark Leisher wrote:
>> There may some day be a use for the Unicode codepoint 0x. It might
>> be better to make this 0x, which is a guaranteed non-character in
>> Unicode and probably in ISO10646.
Philip> Isn'
Unicode and
probably in ISO10646.
---------
Mark Leisher
e a while now. But like
many of us, I've got a handful of critical projects with hard deadlines to
meet.
---------
Mark Leisher
happen as well.
Although it looks complicated on the surface, I highly recommend using Tech
Report #22 on the Unicode website as a guideline for designing future mapping
tables.
-----
Mark Leisher
ed, the term UTF-32 will be deprecated and the
term UCS-4 will be used instead.
---------
Mark Leisher
e the Unicode
Standard 3.0 page 19). Combining surrogates constitutes a UCS-4 encoding (or
UTF-32 until unavailable 10646 private use regions are removed).
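
To make "combining surrogates" concrete, here is a minimal sketch of the standard UTF-16 arithmetic that turns a high/low surrogate pair into a single code point. The function name `combine_surrogates` is hypothetical and is not taken from any package mentioned in this thread.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one code point.

    Hypothetical helper for illustration only.
    """
    # High (leading) surrogates occupy U+D800..U+DBFF.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    # Low (trailing) surrogates occupy U+DC00..U+DFFF.
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(hex(combine_surrogates(0xD834, 0xDD1E)))
```

The combined value always falls in the range 0x10000 through 0x10FFFF, which is why it needs a 32-bit form rather than a single 16-bit unit.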
-----
Mark Leisher
le answer :-)
-----
Mark Leisher
ter.
Then chars_to_utf8() and utf8_to_chars() don't need an encoding parameter
because they simply convert between Unicode characters and UTF-8.
Or is there some other factor I've missed in all the confusion?
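
As a rough illustration of why no encoding parameter is needed, here is a sketch in Python rather than Perl. Only the two function names come from the message above; the signatures and the choice to represent characters as a list of integer code points are assumptions, not the API under discussion.

```python
from typing import List


def chars_to_utf8(chars: List[int]) -> bytes:
    """Serialize Unicode code points as UTF-8 bytes.

    Assumed signature; no encoding argument, since the input is
    already Unicode and the output is always UTF-8.
    """
    return "".join(chr(c) for c in chars).encode("utf-8")


def utf8_to_chars(data: bytes) -> List[int]:
    """Decode UTF-8 bytes back into Unicode code points (assumed signature)."""
    return [ord(ch) for ch in data.decode("utf-8")]


if __name__ == "__main__":
    encoded = chars_to_utf8([0x41, 0xE9, 0x20AC])
    print(encoded, utf8_to_chars(encoded))
```

Both ends of the conversion are fixed, so the only thing that varies is the data itself.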
-----
; (an actual complaint I received
more than once).
-----
Mark Leisher
My only comment would be that the functions which assume 8859-1 should be
removed to avoid the inevitable confusion. Or, as someone else suggested
earlier, changed to use the active system encoding.
-
Mark Leisher
Perl is free to burp on us.
Quite right. Sorry.
---------
Mark Leisher
xtraneous.
-----
Mark Leisher