Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-23 Thread BPJ


On 2011-10-22 08:11, Zdenek Wagner wrote:
>> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner wrote:
>>> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
>>
>> I do have a map now. Can someone tell me how to do the conversion "on
>> the fly" in XeLaTeX? I did see the command line option
>> "-translate-file=TCXNAME", but for that it says "(ignored)".
>>
> \usepackage{fontspec}
> \setmainfont[Mapping=mapname]{fontname}
>
> or
>
> \fontspec[Mapping=mapname]{fontname}


or

\newfontfamily\<name>[Mapping=mapname]{System Family Name}

or

\newfontface\<name>[Mapping=mapname]{System Family Name Bold}

where \<name> is the command being defined and Bold can be
Bold/Italic/Bold Italic/Normal, if only part of the document should use
the map or the font.

See section I.6 of the fontspec documentation.

/bpj


--
Subscriptions, Archive, and List information, etc.:
 http://tug.org/mailman/listinfo/xetex

Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-22 Thread Daniel Greenhoe

On Sat, Oct 22, 2011 at 2:11 PM, Zdenek Wagner  wrote:
> \addfontfeatures{Mapping=mapname}.

Amazing. I tested it. It works. Amazing. Thanks! It's a really great solution.

Dan


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-21 Thread Zdenek Wagner

2011/10/22 Daniel Greenhoe :
> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner  
> wrote:
>> If you wish to do it on the fly in XeTeX,  you can write a TECkit map.
>
> I do have a map now. Can someone tell me how to do the conversion "on
> the fly" in XeLaTeX? I did see the command line option
> "-translate-file=TCXNAME", but for that it says "(ignored)".
>
\usepackage{fontspec}
\setmainfont[Mapping=mapname]{fontname}

or

\fontspec[Mapping=mapname]{fontname}

TCX tables are used by pdfTeX, and one table applies to the whole
document (it cannot be changed). A TECkit map is applied in XeTeX per
font and can even be replaced (e.g. within a group) with
\addfontfeatures{Mapping=mapname}.

-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz



Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-21 Thread Daniel Greenhoe

On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner  wrote:
> If you wish to do it on the fly in XeTeX,  you can write a TECkit map.

I do have a map now. Can someone tell me how to do the conversion "on
the fly" in XeLaTeX? I did see the command line option
"-translate-file=TCXNAME", but for that it says "(ignored)".

Dan


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-20 Thread Daniel Greenhoe

I seem to have a working solution now. Yesterday I wrote a C program
to convert the Unihan_Variants.txt file (suggested by Arthur) to an
ASCII TECkit map (TECkit was suggested by Zdenek), then used TECkit's
teckit_compile utility to convert that to a binary map, and then used
TECkit's txtconv utility (also suggested by Zdenek) to map the
traditional characters to simplified. The map files contain 12,730
Unicode-to-Unicode mapping relations each. More testing would
definitely be good (no guarantees at this point).
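
For readers who would rather not compile anything, the extraction step
described above might look roughly like the following Python sketch. It
is not the C program from the zip file. The function names are
hypothetical, and it assumes that data lines in Unihan_Variants.txt are
tab-separated in the form "U+99AC<TAB>kSimplifiedVariant<TAB>U+9A6C".
The "lhs > rhs" rule form mirrors what the perl one-liner quoted in this
thread produces; a complete TECkit map also needs header lines, which
are not shown here. Entries that list more than one simplified variant
are set aside rather than guessed at.

```python
import re

# Assumed line shape: "U+99AC<TAB>kSimplifiedVariant<TAB>U+9A6C"
LINE_RE = re.compile(r"^(U\+[0-9A-F]{4,6})\tkSimplifiedVariant\t(.+)$")


def parse_simplified_variants(lines):
    """Split kSimplifiedVariant entries into 1:1 pairs and ambiguous ones.

    Returns (mapping, ambiguous): mapping is {source: target} for entries
    with exactly one listed variant, ambiguous is {source: [targets]}.
    """
    mapping, ambiguous = {}, {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comment, blank line, or a different variant field
        source = int(match.group(1)[2:], 16)
        # Drop any "<source" annotation after a code point, if present.
        targets = [int(t.split("<")[0][2:], 16) for t in match.group(2).split()]
        if len(targets) == 1:
            mapping[source] = targets[0]
        else:
            ambiguous[source] = targets
    return mapping, ambiguous


def teckit_rules(mapping):
    """Format the 1:1 pairs as 'U+XXXX > U+YYYY' rule lines, sorted by source."""
    return [f"U+{src:04X} > U+{dst:04X}" for src, dst in sorted(mapping.items())]
```

With the real file, something like
parse_simplified_variants(open("Unihan_Variants.txt", encoding="utf-8"))
would feed teckit_rules, and the ambiguous entries could then be
reviewed by hand.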

If anyone has interest, they can download this zip file:
  http://banyan.cm.nctu.edu.tw/~dgreenhoe/groups/var2map.zip

The zip file includes the C source code, makefile, mapping file, and
tec file, as well as a Windows executable. The included tec file is
based on the Unicode 6.1.0 standard. If a new standard becomes
available, var2map.exe and teckit_compile.exe can be run again to
update the binary mapping file.

If you have make, you can change the directory paths in the makefile and enter
  "make all"
on the command line for a kind of demo. The demo maps some Latin and
traditional characters (in trad.tex) to Latin and simplified
characters (in simp.tex).

On Thu, Oct 20, 2011 at 11:47 PM, BPJ  wrote:
> I got the thought that this might be done at least approximatively by ...
>  $ grep 'kSimplifiedVariant' Unihan_Variants.txt \
>  |perl -ple's/kSimplifiedVariant/>/' >>tex-chi-sim-trad.map
> tex-text.map, plus some very little manual touching up
> of debris after a comment line in Unihan_Variants.txt and
> adding some descriptive comments.

BPJ's solution appears to do essentially the same thing as the C
program mentioned above. Because it is a Perl script, it has the added
benefit of being cross-platform without needing a C compiler.

As a follow-up to Andy's suggestion of the Tong Wen code: I did look
into the code. I found what appears to be a good set of databases
for simplified-to-traditional conversion, but I did not find a
traditional-to-simplified solution. I did join a mailing list
for the project and posted a request for assistance, but so far have
not received any reply. Maybe the project has become dormant.

Thank you very much to everyone who gave me help on this: Zdenek
for the TECkit suggestion, Andy for the Tong Wen suggestion, Arthur
for the Unihan_Variants suggestion, and BPJ for the Perl suggestion. I
appreciate the help very much. I don't know if I would have ever
arrived at a solution without it.

One of the next tasks is to find quality fonts (preferably OpenType)
for Simplified Chinese, including fonts with Ruby text  (Zhu-Yin or
Pin-Yin). If anyone has suggestions of useful font repositories,
please let me know. Thanks!

Dan

On Thu, Oct 20, 2011 at 11:47 PM, BPJ  wrote:
> I got the thought that this might be done at least
> approximately by simply running the following
> command in the terminal:
>
>  $ grep 'kSimplifiedVariant' Unihan_Variants.txt \
>      |perl -ple's/kSimplifiedVariant/>/' >>tex-chi-sim-trad.map
>
> where Unihan_Variants.txt is the file from the Unicode
> Unihan database and tex-chi-sim-trad.map is a copy of
> tex-text.map, plus some very little manual touching up
> of debris after a comment line in Unihan_Variants.txt and
> adding some descriptive comments. The results are attached.
>
> /bpj
>

Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-19 Thread Daniel Greenhoe

Hi Arthur,

On Thu, Oct 20, 2011 at 1:02 AM, Arthur Reutenauer
 wrote:
>  Unicode has that in the Unihan database:
>  look up Unihan_Variants.txt in Unihan.zip
> (latest version http://www.unicode.org/Public/6.1.0/ucd/Unihan-6.1.0d1.zip )

It looks like I can extract everything I need from Unihan_Variants.txt.
Thank you so much for your help! I appreciate it very much.

Dan


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-19 Thread Arthur Reutenauer

On Tue, Oct 18, 2011 at 05:49:28AM +0800, Daniel Greenhoe wrote:
> Does anyone know of any data base
> with a traditional to simplified character mapping such that I could
> maybe write the utility myself?

  Unicode has that in the Unihan database: look up Unihan_Variants.txt
in Unihan.zip (latest version
http://www.unicode.org/Public/6.1.0/ucd/Unihan-6.1.0d1.zip )

Arthur


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-18 Thread Daniel Greenhoe

On Wed, Oct 19, 2011 at 10:05 AM, Andy Lin  wrote:
> You can try digging in the source for Tong Wen Tang ... Or email its 
> developers.

That's a great idea --- thanks!

Dan


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-18 Thread Andy Lin

You can try digging in the source for Tong Wen Tang (a Firefox
extension). Or email its developers. They should have a map and
additional notes on the conversion.


Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-18 Thread Daniel Greenhoe

Hi Zdenek, Thank you for your suggestions.

On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner  wrote:
> you can just use tr, ... if you supply the map.

I don't know what "tr" is, but this comes back to one of my original
problems; and that is, I don't have a map. Does anyone know of a
publicly available map? Such a map very likely exists. For example,
Google Translate can translate from traditional to simplified. But
even if they use a map for this service, that map may be proprietary.

> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
> Having the TECkit map you can also run txtconv from the command line.

I like these solutions. However, again, I would still need a map. SIL
has a collection of maps available here:
  http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=ConversionMaps
But I didn't see a Chinese traditional-->simplified character map.

Dan




Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-17 Thread Zdenek Wagner

2011/10/17 Daniel Greenhoe :
> I know that this is not really the right mailing list for this
> question, but I have so far not found the answer by any other means
> ...
>
> I would like to find or write a utility that would take a
> Unicode-encoded file and map Chinese traditional characters to
> simplified, while leaving all other code points (such  as those in the
> Latin and IPA code spaces) untouched. For example, the traditional
> character for horse (馬) is at unicode U+99AC, the simplified one (马)
> is at unicode U+9A6C, and the Latin character for "A" is at U+0041. So
> I want a utility that would change the 99AC to 9A6C, but leave the
> 0041 unchanged.
>
If it really is that simple a 1:1 mapping, you can just use tr; it does
exactly that if you supply the map. If you wish to do it on the fly in
XeTeX, you can write a TECkit map. With the TECkit map you can also
run txtconv from the command line.
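
To illustrate what a plain 1:1 code point substitution amounts to, here
is a minimal Python sketch of the same idea. It is neither tr nor
TECkit, just the built-in str.translate. The one-entry table and the
function name are hypothetical; a real table would have to come from a
mapping source such as the one asked about in this thread.

```python
# Hypothetical one-entry table using the example from the question:
# U+99AC (traditional horse) -> U+9A6C (simplified horse).
TRAD_TO_SIMP = {0x99AC: 0x9A6C}


def to_simplified(text, table=TRAD_TO_SIMP):
    """Replace code points found in the table; leave all others untouched."""
    return text.translate(table)
```

Calling to_simplified("\u99acA") changes the U+99AC to U+9A6C and
leaves the U+0041 unchanged, which is the behaviour requested.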

> Does anyone know of such a utility? Does anyone know of any data base
> with a traditional to simplified character mapping such that I could
> maybe write the utility myself?
>
> Many thanks in advance,
> Dan

-- 
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz



[XeTeX] traditional to simplified Chinese character conversion utility or data base

2011-10-17 Thread Daniel Greenhoe

I know that this is not really the right mailing list for this
question, but I have so far not found the answer by any other means
...

I would like to find or write a utility that would take a
Unicode-encoded file and map Chinese traditional characters to
simplified, while leaving all other code points (such  as those in the
Latin and IPA code spaces) untouched. For example, the traditional
character for horse (馬) is at unicode U+99AC, the simplified one (马)
is at unicode U+9A6C, and the Latin character for "A" is at U+0041. So
I want a utility that would change the 99AC to 9A6C, but leave the
0041 unchanged.

Does anyone know of such a utility? Does anyone know of any data base
with a traditional to simplified character mapping such that I could
maybe write the utility myself?

Many thanks in advance,
Dan


