Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
On 2011-10-22 08:11, Zdenek Wagner wrote:
>> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner wrote:
>>> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
>>
>> I do have a map now. Can someone tell me how to do the conversion "on
>> the fly" in XeLaTeX? I did see the command line option
>> "-translate-file=TCXNAME", but for that it says "(ignored)".
>>
> \usepackage{fontspec}
> \setmainfont[Mapping=mapname]{fontname}
>
> or
>
> \fontspec[Mapping=mapname]{fontname}

or

\newfontfamily\[Mapping=mapname]{System Family Name}

or

\newfontface\[Mapping=mapname]{System Family Name Bold}

where Bold can be Bold, Italic, Bold Italic or Normal, if only part of the document should use the map or the font. See section I.6 of the fontspec documentation.

/bpj

--
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
On Sat, Oct 22, 2011 at 2:11 PM, Zdenek Wagner wrote:
> \addfontfeatures{Mapping=mapname}.

Amazing. I tested it. It works. Amazing. Thanks! It's a really great solution.

Dan
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
2011/10/22 Daniel Greenhoe:
> On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner wrote:
>> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
>
> I do have a map now. Can someone tell me how to do the conversion "on
> the fly" in XeLaTeX? I did see the command line option
> "-translate-file=TCXNAME", but for that it says "(ignored)".
>
\usepackage{fontspec}
\setmainfont[Mapping=mapname]{fontname}

or

\fontspec[Mapping=mapname]{fontname}

TCX tables are used in pdfTeX, and the table is used for the whole document (and cannot be changed). A TECkit map is applied in XeTeX per font and can even be replaced (e.g. in a group) by \addfontfeatures{Mapping=mapname}.

--
Zdeněk Wagner
http://hroch486.icpf.cas.cz/wagner/
http://icebearsoft.euweb.cz
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner wrote:
> If you wish to do it on the fly in XeTeX, you can write a TECkit map.

I do have a map now. Can someone tell me how to do the conversion "on the fly" in XeLaTeX? I did see the command line option "-translate-file=TCXNAME", but for that it says "(ignored)".

Dan
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
I seem to have a working solution now. Yesterday I wrote a C program to convert the Unihan_Variants.txt file (suggested by Arthur) to an ASCII TECkit map (suggested by Zdenek), then used TECkit's teckit_compile utility to convert that to a binary map, and then used TECkit's txtconv utility (also suggested by Zdenek) to map the traditional characters to simplified. The map files contain 12,730 Unicode-to-Unicode mapping relations each. More testing would definitely be good (no guarantees at this point).

If anyone is interested, they can download this zip file:
http://banyan.cm.nctu.edu.tw/~dgreenhoe/groups/var2map.zip

The zip file includes the C source code, makefile, mapping file, and tec file, as well as a Windows executable. The included tec file is based on the Unicode 6.1.0 standard. If a new standard becomes available, var2map.exe and teckit_compile.exe can be run again to update the binary mapping file. Using make, you can change the directory paths in the makefile and enter "make all" on the command line for a kind of demo. The demo maps some Latin and traditional characters (in trad.tex) to Latin and simplified characters (in simp.tex).

On Thu, Oct 20, 2011 at 11:47 PM, BPJ wrote:
> I got the thought that this might be done at least approximatively by ...
> $ grep 'kSimplifiedVariant' Unihan_Variants.txt \
> |perl -ple's/kSimplifiedVariant/>/' >>tex-chi-sim-trad.map
> tex-text.map, plus some very little manual touching up
> of debris after a comment line in Unihan_Variants.txt and
> adding some descriptive comments.

It looks like this solution from BPJ does essentially the same thing as the C program mentioned above. In addition, because it uses Perl, BPJ's solution has the benefit of being cross-platform without having to run a C compiler.

As a follow-up to Andy's suggestion of the Tong Wen Tang code: I did look into the code.
I found what appears to be a good set of databases for simplified-to-traditional conversion, but I couldn't find a traditional-to-simplified one. I did join a mailing list for the project and posted a request for assistance, but so far have not received any reply. Maybe the project has become dormant.

Thank you very much to everyone who gave me help on this: Zdenek for the TECkit suggestion, Andy for the Tong Wen Tang suggestion, Arthur for the Unihan_Variants suggestion, and BPJ for the Perl suggestion. I appreciate the help very much; I don't know if I would have ever arrived at a solution without it.

One of the next tasks is to find quality fonts (preferably OpenType) for Simplified Chinese, including fonts with Ruby text (Zhu-Yin or Pin-Yin). If anyone has suggestions of useful font repositories, please let me know.

Thanks!
Dan

On Thu, Oct 20, 2011 at 11:47 PM, BPJ wrote:
> I got the thought that this might be done at least
> approximatively by simply running the following
> command in the terminal:
>
> $ grep 'kSimplifiedVariant' Unihan_Variants.txt \
> |perl -ple's/kSimplifiedVariant/>/' >>tex-chi-sim-trad.map
>
> where Unihan_Variants.txt is the file from the Unicode
> Unihan database and tex-chi-sim-trad.map is a copy of
> tex-text.map, plus some very little manual touching up
> of debris after a comment line in Unihan_Variants.txt and
> adding some descriptive comments. The results are attached.
>
> /bpj
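BPJ's one-liner rewrites each kSimplifiedVariant line of Unihan_Variants.txt into a one-way TECkit rule such as `U+99AC > U+9A6C`, and Dan's C program produces a text TECkit map from the same file. A minimal Python sketch of that step, assuming tab-separated lines of the form `U+99AC<TAB>kSimplifiedVariant<TAB>U+9A6C`. The function name `unihan_to_teckit_rules` is hypothetical, and skipping entries that list more than one candidate is an assumption of this sketch, not necessarily what either posted tool does: copied verbatim, such an entry would (as far as I understand TECkit's rule syntax) replace one character with a two-character sequence.

```python
def unihan_to_teckit_rules(lines):
    """Turn kSimplifiedVariant lines of Unihan_Variants.txt into one-way
    TECkit rules such as 'U+99AC > U+9A6C' (traditional > simplified)."""
    rules = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kSimplifiedVariant":
            continue  # other variant fields are not needed here
        source, targets = fields[0], fields[2].split()
        # Assumption: with several candidates, skip the entry rather than
        # emit 'U+X > U+A U+B', which would map to a two-character sequence.
        if len(targets) == 1 and targets[0] != source:
            rules.append(f"{source} > {targets[0]}")
    return rules


sample = ["U+99AC\tkSimplifiedVariant\tU+9A6C\n"]
print(unihan_to_teckit_rules(sample))  # prints ['U+99AC > U+9A6C']
```

To use it on the real file, one could pass `open("Unihan_Variants.txt", encoding="utf-8")` and write the rules below a header copied from tex-text.map, as BPJ describes, before compiling with teckit_compile.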
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
Hi Arthur,

On Thu, Oct 20, 2011 at 1:02 AM, Arthur Reutenauer wrote:
> Unicode has that in the Unihan database:
> look up Unihan_Variants.txt in Unihan.zip
> (latest version http://www.unicode.org/Public/6.1.0/ucd/Unihan-6.1.0d1.zip )

It looks like I can extract everything I need from Unihan_Variants.txt. Thank you so much for your help! I appreciate it very much.

Dan
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
On Tue, Oct 18, 2011 at 05:49:28AM +0800, Daniel Greenhoe wrote:
> Does anyone know of any data base
> with a traditional to simplified character mapping such that I could
> maybe write the utility myself?

Unicode has that in the Unihan database: look up Unihan_Variants.txt in Unihan.zip (latest version http://www.unicode.org/Public/6.1.0/ucd/Unihan-6.1.0d1.zip )

Arthur
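With Unihan_Variants.txt in hand, the "write the utility myself" route can be sketched in Python: collect the kSimplifiedVariant entries into a table keyed by code point and pass it to `str.translate`, which leaves every unlisted code point (Latin, IPA and so on) untouched. This assumes the tab-separated `U+XXXX<TAB>field<TAB>value` line layout; the function name `load_simplified_map` is hypothetical, and skipping entries with more than one candidate is a simplifying assumption of this sketch.

```python
def load_simplified_map(lines):
    """Collect kSimplifiedVariant entries of Unihan_Variants.txt into a
    str.translate table mapping traditional to simplified code points."""
    table = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or fields[1] != "kSimplifiedVariant":
            continue  # only the traditional -> simplified field is needed
        targets = fields[2].split()
        if len(targets) != 1:
            continue  # several candidates: leave the character unconverted
        source = int(fields[0][2:], 16)   # "U+99AC" -> 0x99AC
        target = int(targets[0][2:], 16)  # "U+9A6C" -> 0x9A6C
        if source != target:
            table[source] = target
    return table


table = load_simplified_map(["U+99AC\tkSimplifiedVariant\tU+9A6C\n"])
print("A馬".translate(table))  # prints "A马"
```

Because `str.translate` only touches code points present in the table, no explicit check for "non-Chinese" ranges is needed.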
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
On Wed, Oct 19, 2011 at 10:05 AM, Andy Lin wrote:
> You can try digging in the source for Tong Wen Tang ... Or email its
> developers.

That's a great idea. Thanks!

Dan
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
You can try digging in the source for Tong Wen Tang (a Firefox extension), or email its developers. They should have a map and additional notes on the conversion.
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
Hi Zdenek,

Thank you for your suggestions.

On Tue, Oct 18, 2011 at 2:53 PM, Zdenek Wagner wrote:
> you can just use tr, ... if you supply the map.

I don't know what "tr" is, but this comes back to one of my original problems: I don't have a map. Does anyone know of a publicly available map? Such a map very likely exists. For example, Google Translate can translate from traditional to simplified. But even if they use a map for this service, that map may be proprietary.

> If you wish to do it on the fly in XeTeX, you can write a TECkit map.
> Having the TECkit map you can also run txtconv from the command line.

I like these solutions. However, again, I would still need a map. SIL has a collection of maps available here:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=ConversionMaps
But I didn't see a Chinese traditional-->simplified character map.

Dan
Re: [XeTeX] traditional to simplified Chinese character conversion utility or data base
2011/10/17 Daniel Greenhoe:
> [...] So I want a utility that would change the 99AC to 9A6C, but leave the
> 0041 unchanged.
>
If it is really that simple a 1:1 mapping, you can just use tr; it does exactly that if you supply the map. If you wish to do it on the fly in XeTeX, you can write a TECkit map. Having the TECkit map, you can also run txtconv from the command line.
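GNU tr, for one, works on bytes rather than on multibyte UTF-8 characters, so with UTF-8 input the 1:1 substitution described above is better done on Unicode code points. A minimal Python sketch of tr-style substitution; the function name `tr_like` is hypothetical, and the two strings play the role of tr's SET1 and SET2.

```python
def tr_like(text, set1, set2):
    """Replace each character of set1 with the character at the same
    position in set2, like `tr SET1 SET2`, but on Unicode code points.
    Characters not in set1 (e.g. Latin letters) are left unchanged."""
    # str.maketrans raises ValueError if set1 and set2 differ in length.
    return text.translate(str.maketrans(set1, set2))


print(tr_like("A馬", "馬", "马"))  # prints "A马"
```

With a full map, set1 would hold all traditional characters and set2 the simplified ones in the same order.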
[XeTeX] traditional to simplified Chinese character conversion utility or data base
I know that this is not really the right mailing list for this question, but I have so far not found the answer by any other means...

I would like to find or write a utility that would take a Unicode-encoded file and map Chinese traditional characters to simplified, while leaving all other code points (such as those in the Latin and IPA code spaces) untouched. For example, the traditional character for horse (馬) is at U+99AC, the simplified one (马) is at U+9A6C, and the Latin character "A" is at U+0041. So I want a utility that would change U+99AC to U+9A6C, but leave U+0041 unchanged.

Does anyone know of such a utility? Does anyone know of any database with a traditional-to-simplified character mapping such that I could maybe write the utility myself?

Many thanks in advance,
Dan