Here are a bunch of resources which will help:

This Perl module (Encode::HanConvert) does TC <=> SC conversions:

        
        http://search.cpan.org/~audreyt/Encode-HanConvert-0.35/lib/Encode/HanConvert.pm


MediaWiki has a TC <=> SC converter somewhere in its code base:

        http://www.mediawiki.org/wiki/MediaWiki


This explains some of the issues behind TC <=> SC conversions:

        http://people.w3.org/rishida/scripts/chinese/


Misc tools:

        http://mandarintools.com/
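
To make the "pick one canonical form" idea from my earlier message concrete, here is a minimal Python sketch. The table below is a tiny hand-written stand-in (an assumption for illustration, not taken from any of the tools above), and the names TC_TO_SC, to_canonical and matches are hypothetical. A real setup would load a full table and apply the same mapping at both index time and query time.

```python
# Hypothetical, hand-picked TC -> SC entries for illustration only.
# A real conversion table has thousands of entries.
TC_TO_SC = {
    "類": "类",
    "國": "国",
    "學": "学",
    "發": "发",  # two distinct traditional characters ...
    "髮": "发",  # ... collapse to the same simplified one
}

# Build the translation table once; str.translate applies it per character.
_TRANSLATION = str.maketrans(TC_TO_SC)


def to_canonical(text: str) -> str:
    """Map every traditional character found in the table to its simplified form."""
    return text.translate(_TRANSLATION)


def matches(query: str, document: str) -> bool:
    """Naive substring search after normalizing both query and document."""
    return to_canonical(query) in to_canonical(document)


if __name__ == "__main__":
    for doc in ["分類方法", "分类方法", "發展"]:
        print(doc, "->", to_canonical(doc), "| matches 类:", matches("类", doc))
```

Note that 發 and 髮 end up as the same canonical character, so a search for one will also hit the other. That is the kind of imprecision you accept with a plain table and no dictionary.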


François


On Mar 7, 2011, at 7:01 PM, Andy wrote:

> Thanks. Please tell me more about the tables/software that does the 
> conversion. Really appreciate your help.
> 
> 
> --- On Mon, 3/7/11, François Schiettecatte <fschietteca...@gmail.com> wrote:
> 
>> From: François Schiettecatte <fschietteca...@gmail.com>
>> Subject: Re: How to handle searches across traditional and simplified 
>> Chinese?
>> To: solr-user@lucene.apache.org
>> Date: Monday, March 7, 2011, 5:24 PM
>> I did a little research into this for a client a while back. The
>> character mapping is not one-to-one, which complicates things (TC and
>> SC have evolved independently), and if you want to do a perfect job
>> you will need a dictionary. However, there are tables out there (I
>> can dig one up for you) that allow conversion from one to the other.
>> So you would pick either TC or SC as your canonical Chinese, and
>> convert all the documents and searches to it.
>> 
>> I will stress that this is very much a brute-force approach: the
>> mapping is not perfect, and the two character sets have diverged
>> (much like UK and US English; I was brought up in the UK and live in
>> the US).
>> 
>> Hope this helps.
>> 
>> Cheers
>> 
>> François
>> 
>> On Mar 7, 2011, at 5:02 PM, Andy wrote:
>> 
>>> I have documents that contain both simplified and traditional
>>> Chinese characters. Is there any way to search across them? For
>>> example, if someone searches for 类 (simplified Chinese), I'd like
>>> to be able to recognize that the equivalent character is 類 in
>>> traditional Chinese and search for 类 or 類 in the documents.
>>> 
>>> Is that something that Solr, or any related software, can do? Is
>>> there a standard approach in dealing with this problem?
>>> 
>>> Thanks.
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
