Title: RE: UNIHAN.TXT
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
> Sent: Friday, April 30, 2004 12:12 AM
> Tabs... In addition to the points Mike made about the tab character having
> different semantics depending on the appli
On Apr 30, 2004, at 1:12 AM, [EMAIL PROTECTED] wrote:
Like UNIHAN.TXT, brevity is not a feature of the following...
Tabs... In addition to the points Mike made about the tab character having
different semantics depending on the application/platform, I just don't
think a control character like tab
At 11:13 AM 4/23/2004, Philippe Verdy wrote:
On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager" <[EMAIL PROTECTED]>
said:
> 2 -- doing everything from regular windows gui tools, which have been
> Unicode-friendly since forever.
Maybe on Windows based on newer NT kernels only (NT4, 2000, XP, 200
On Friday 2004.04.23 13:57:56 -0400, [EMAIL PROTECTED] wrote:
> Edward H. Trager scripsit:
>
> > (Windows' lack of a decent shell and command-line tools is probably
> > what makes the OS most annoying).
>
> Cygwin (http://www.cygwin.com) is your friend; it provides a relatively
> complete Unix h
On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager" <[EMAIL PROTECTED]>
said:
> 2 -- doing everything from regular windows gui tools, which have been
> Unicode-friendly since forever.
Maybe on Windows based on newer NT kernels only (NT4, 2000, XP, 2003, ...).
This sentence ("since forever") is
Edward H. Trager scripsit:
> (Windows' lack of a decent shell and command-line tools is probably
> what makes the OS most annoying).
Cygwin (http://www.cygwin.com) is your friend; it provides a relatively
complete Unix hosted on Win32. It works best on the NT branch of the
family when the disks
On Friday 2004.04.23 09:11:30 -0700, Benjamin Peterson wrote:
>
> On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager"
> <[EMAIL PROTECTED]> said:
>
> > There is an issue that you might confront with these terminal-based tools on
> > Windows and on Mac OSX that I myself don't know how to so
On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager"
<[EMAIL PROTECTED]> said:
> There is an issue that you might confront with these terminal-based tools on
> Windows and on Mac OSX that I myself don't know how to solve, and that is that
> I don't know how to switch to a UTF-8 locale on ei
Edward H. Trager writes:
> Perhaps someone else on this list can tell us how to get Apple's terminal application
> or xterm running on OS X to display UTF-8 characters correctly[...]
This is trivial in the terminal:
1. Select "Window Settings" from the "Terminal" menu.
2. Select "Display" from t
I've been following this thread initiated by Raymond Mercier's comments on the
Unihan database with some slight amusement but mostly dismay that some
readers of this list are using the completely wrong software tools for dealing
with a *database* file like the Unihan table.
My sincerest advice t
> I've never managed to get either Notepad or Word to open Unihan.txt
Just use EMACS. Works fine.
Rick
> [Original Message]
> From: Tom Emerson <[EMAIL PROTECTED]>
> To: Gary P. Grosso <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Date: 4/21/2004 12:58:38 PM
> Subject: Re: Unihan.txt and other possible representations of the data
>
> Gary P. Grosso
Gary P. Grosso writes:
> There may be value in an HTML representation, utilizing links
> and multiple files. What would the logical division(s) be?
> Or has this already been done?
I'm working on a proposal for generating different representations of
Unihan, and this includes logical divisions. I
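As one illustration of an alternative representation (an assumption on my part, not the proposal described above), the one-line-per-field layout can be regrouped into one record per character, which is a natural starting point for per-character HTML pages or other divisions. This minimal sketch assumes data lines of the form U+XXXX<TAB>field<TAB>value with '#' comment lines; the function name group_by_character and the sample lines are hypothetical, for illustration only.

```python
from collections import defaultdict


def group_by_character(lines):
    """Collect 'U+XXXX<TAB>field<TAB>value' lines into one dict per character.

    Returns {code point: {field name: value}}. Assumes '#' starts a comment
    line and that each data line has exactly three tab-separated columns.
    """
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\r\n")  # accept LF or CR LF line ends
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        records[int(code[2:], 16)][field] = value
    return dict(records)


# Hypothetical sample lines, not copied from a specific Unihan release.
SAMPLE = [
    "# comment",
    "U+4E00\tkMandarin\tYI1",
    "U+4E00\tkDefinition\tone; a, an; alone",
    "U+4E01\tkMandarin\tDING1",
]

if __name__ == "__main__":
    for cp, fields in group_by_character(SAMPLE).items():
        print(f"U+{cp:04X}", fields)
```

Grouping in memory is fine for a subset; for the full file one might instead write each character's record out as soon as its code point changes, since the file is ordered by code point.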
Raymond Mercier wrote:
> The problem of the size of Unihan has nothing at all to do with the
> cost of storage, and everything to do with the functioning of programs
> that might open and read it.
> Since the lines in Unihan are separated by 0x0A alone, not 0x0D0x0A,
> this means that when opened
On Tue, 20 Apr 2004 22:36:48 +0100, "Raymond Mercier" wrote:
>
> The problem of the size of Unihan has nothing at all to do with the cost of
> storage, and everything to do with the functioning of programs that might
> open and read it.
> Since the lines in Unihan are separated by 0x0A alone, not
From: "John Cowan" <[EMAIL PROTECTED]>
To: "Raymond Mercier" <[EMAIL PROTECTED]>
> Raymond Mercier scripsit:
>
> > Since the lines in Unihan are separated by 0x0A alone, not 0x0D0x0A, this
> > means that when opened in Notepad the lines are not separated. Notepad does
> > have the advantage that
Unihan is designed, first and foremost, to be a _data_ file for
consumption by software. It doesn't matter at all how many spaces are
used for the tabs. The use of tabs makes it trivial to write scripts to
parse the file with grep, awk, Perl, or Python.
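To make the point about tabs concrete, here is a minimal Python sketch of such a parser. It assumes each data line is U+XXXX<TAB>kField<TAB>value and that lines beginning with '#' are comments; the function name parse_unihan and the sample lines are illustrative, not part of any official tooling.

```python
def parse_unihan(lines):
    """Yield (code point, field name, value) for each data line.

    Assumes 'U+XXXX<TAB>kField<TAB>value' lines and '#' comment lines.
    Splitting on at most two tabs keeps any spaces inside the value intact.
    """
    for line in lines:
        line = line.rstrip("\r\n")  # tolerate both LF and CR LF endings
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t", 2)
        yield int(code[2:], 16), field, value


# Illustrative sample lines, not copied from a specific Unihan release.
SAMPLE = [
    "# Unihan-style comment",
    "U+4E00\tkMandarin\tYI1",
    "U+4E00\tkDefinition\tone; a, an; alone",
]

if __name__ == "__main__":
    for cp, field, value in parse_unihan(SAMPLE):
        print(f"U+{cp:04X} {chr(cp)} {field}: {value}")
```

On the real file, `with open("Unihan.txt", encoding="utf-8") as f:` followed by iterating `parse_unihan(f)` reads one line at a time, so memory use stays flat regardless of the file's size.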
With regards to the Pinyin orthography: tone num
Title: RE: Unihan.txt and the four dictionary sorting algorithm
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of John Jenkins
> Sent: Tuesday, April 20, 2004 6:40 PM
> > The tab "character" is used in the file. Arguably, this
> "charac
On Apr 20, 2004, at 5:11 PM, [EMAIL PROTECTED] wrote:
The DOS editor chokes on such a large text file, so does my older hex
editor. Thank goodness for BabelPad, otherwise it would've been hard
to insert proper (for my system) line breaks into the file.
BBEdit on the Mac tends to be unhappy with i
Raymond Mercier wrote,
> John Jenkins writes
> >>Also, even though the full Unihan database is 25+ MB in size, given the
> cheapness of disk space nowadays, it's not all *that* big, surely.
> <<
>
> The problem of the size of Unihan has nothing at all to do with the cost of
> storage, and everyt
Raymond Mercier scripsit:
> Since the lines in Unihan are separated by 0x0A alone, not 0x0D0x0A, this
> means that when opened in Notepad the lines are not separated. Notepad does
> have the advantage that the UTF-8 encoding is recognized, and the characters
> are displayed.
Changing to a line te
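For anyone who just wants a copy that Notepad displays line by line, the conversion is short. This is a minimal sketch, assuming the input is UTF-8; the function names crlf_lines and convert_to_crlf are hypothetical. It streams the file instead of loading it whole and leaves the original LF-terminated file untouched for scripts.

```python
def crlf_lines(lines):
    """Re-terminate each line with CR LF (0x0D 0x0A).

    A final line with no terminator is left unterminated.
    """
    for line in lines:
        body = line.rstrip("\r\n")
        yield body + ("\r\n" if body != line else "")


def convert_to_crlf(src_path, dst_path):
    """Stream a UTF-8 text file and write a CR LF copy of it."""
    # newline="" disables newline translation on both sides, so the output
    # contains exactly the strings produced by crlf_lines().
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(crlf_lines(src))
```

A call such as `convert_to_crlf("Unihan.txt", "Unihan-crlf.txt")` would produce the converted copy; the output file name is just an example.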
"Raymond Mercier" <[EMAIL PROTECTED]> writes:
> The problem of the size of Unihan has nothing at all to do with the cost of
> storage, and everything to do with the functioning of programs that might
> open and read it.
It's a data file stored as a text file to be simple; it's not designed
John Jenkins writes
>>Also, even though the full Unihan database is 25+ MB in size, given the
cheapness of disk space nowadays, it's not all *that* big, surely.
<<
The problem of the size of Unihan has nothing at all to do with the cost of
storage, and everything to do with the functioning of prog
On Apr 19, 2004, at 8:40 PM, Ernest Cline wrote:
For example, if there is a value of kIRGKangXi of the form
.YY0 there will always be the same value for the
kKangXi for that character and vice versa.
This is not a safe assumption. There are 37 cases where the kIRGKangXi
field ends in 0 but t
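Rather than relying on that assumption, it can be checked mechanically. The following is a minimal sketch that assumes per-character records shaped as {code point: {field name: value}}, with a single kIRGKangXi value per character and kKangXi possibly holding several space-separated values. The function name and the sample values are invented for illustration, and the sketch does not reproduce the count of 37 cited above, which depends on the actual data.

```python
def kangxi_mismatches(records):
    """Return sorted code points whose kIRGKangXi value ends in '0' but
    does not appear among that character's kKangXi values.

    `records` maps a code point to a {field name: value} dict.
    """
    bad = []
    for cp, fields in records.items():
        irg = fields.get("kIRGKangXi")
        if irg is None or not irg.endswith("0"):
            continue  # no IRG value, or it does not end in 0
        # A missing kKangXi field counts as a mismatch.
        if irg not in fields.get("kKangXi", "").split():
            bad.append(cp)
    return sorted(bad)


# Invented values for illustration only.
RECORDS = {
    0x4E00: {"kIRGKangXi": "0075.010", "kKangXi": "0075.010"},
    0x4E01: {"kIRGKangXi": "0075.020", "kKangXi": "0075.021"},
    0x4E02: {"kIRGKangXi": "0075.031"},
    0x4E03: {"kIRGKangXi": "0075.040"},
}

if __name__ == "__main__":
    print([f"U+{cp:04X}" for cp in kangxi_mismatches(RECORDS)])
```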
Ernest Cline writes
>>I'm trying to pare Unihan.txt down to a less unwieldy size
for my own use by eliminating properties that are of no
interest to me <<
The sheer size of unihan creates problems, hence the need to extract
manageable subsets.
This is the basis of my Hanfind:
(http://ourworld.
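Extracting such a subset does not require opening the whole file in an editor. The following is a generic sketch, not a description of how Hanfind works. It assumes tab-separated data lines whose second column is the field name, and the function names extract_fields and write_subset are hypothetical.

```python
def extract_fields(lines, keep):
    """Yield the data lines whose field name (second tab-separated column)
    is in `keep`; comment lines and all other fields are dropped."""
    for line in lines:
        if line.startswith("#"):
            continue
        parts = line.split("\t", 2)
        if len(parts) == 3 and parts[1] in keep:
            yield line


def write_subset(src_path, dst_path, keep):
    """Stream src_path into dst_path, keeping only the requested fields."""
    # newline="" keeps the original line endings byte for byte.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(extract_fields(src, keep))


# Illustrative sample lines.
SAMPLE = [
    "# comment\n",
    "U+4E00\tkMandarin\tYI1\n",
    "U+4E00\tkDefinition\tone; a, an; alone\n",
]
```

For example, `write_subset("Unihan.txt", "subset.txt", {"kMandarin", "kDefinition"})` would keep just those two fields. Because the filter is a generator, only one line is held in memory at a time.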
On Thursday, October 12, 2000, at 06:22 PM, Thomas Chan wrote:
In the version of the unihan.txt file distributed with Unicode 3.0, there
is an undocumented field called "kJHJ" with a few thousand records. What
does this refer to?
Oops. That should have been filtered out. It's an internal, und