RE: How to handle searches across traditional and simplified Chinese?

2011-03-08 Thread Burton-West, Tom
This page discusses the reasons why it's not a simple one-to-one mapping: http://www.kanji.org/cjk/c2c/c2cbasis.htm

Tom

-----Original Message-----
> I have documents that contain both simplified and traditional Chinese
> characters. Is there any way to search across them? For example, if someone

Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Robert Muir
On Mon, Mar 7, 2011 at 7:01 PM, Andy wrote:
> Thanks. Please tell me more about the tables/software that does the
> conversion. Really appreciate your help.

Also, you might be interested in this example: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTransformFilterFacto

Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread François Schiettecatte
> conversion. Really appreciate your help.
>
> --- On Mon, 3/7/11, François Schiettecatte wrote:
>
>> From: François Schiettecatte
>> Subject: Re: How to handle searches across traditional and simplified Chinese?
>> To: solr-user@lucene.apache.org
>> Da

Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Andy
Thanks. Please tell me more about the tables/software that do the conversion. Really appreciate your help.

--- On Mon, 3/7/11, François Schiettecatte wrote:

> From: François Schiettecatte
> Subject: Re: How to handle searches across traditional and simplified Chinese?
> To:

Re: How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread François Schiettecatte
I did a little research into this for a client a while ago. The character mapping is not one-to-one, which complicates things (TC and SC have evolved independently), and if you want to do a perfect job you will need a dictionary. However, there are tables out there (I can dig one up for you) that allow
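To illustrate the table-based approach mentioned above, here is a minimal Python sketch that folds traditional characters to simplified ones before indexing and before querying. The `T2S` table and the `normalize_to_simplified` name are hypothetical and only cover a handful of characters; a real table would have thousands of entries. The example also shows why the mapping is not one-to-one: both 發 and 髮 fold to 发, so going back from simplified to traditional would need a dictionary.

```python
# Hypothetical, deliberately tiny traditional -> simplified table.
# A real conversion table would contain thousands of entries.
T2S = {
    "類": "类",
    "發": "发",  # as in "to send / to develop"
    "髮": "发",  # as in "hair"; folds to the same simplified character
    "後": "后",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_T2S_TABLE = str.maketrans(T2S)


def normalize_to_simplified(text: str) -> str:
    """Replace every traditional character found in T2S with its simplified form.

    Characters missing from the table are passed through unchanged.
    """
    return text.translate(_T2S_TABLE)


if __name__ == "__main__":
    print(normalize_to_simplified("分類"))
    print(normalize_to_simplified("發") == normalize_to_simplified("髮"))
```

If the same normalization is applied to both the indexed text and the query, a search for either form can match documents in either script, at the cost of conflating characters such as 發 and 髮.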

How to handle searches across traditional and simplified Chinese?

2011-03-07 Thread Andy
I have documents that contain both simplified and traditional Chinese characters. Is there any way to search across them? For example, if someone searches for 类 (simplified Chinese), I'd like to be able to recognize that the equivalent character is 類 in traditional Chinese and search for 类 or 類
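One way to get the "类 or 類" behaviour described in the question is to expand the query on the client side before sending it to the search engine. The following is only a sketch under stated assumptions: the `VARIANTS` table and the `expand_query` function are hypothetical names, the table is a tiny illustrative sample, and the plain `OR` syntax is assumed to be what the query parser accepts.

```python
from itertools import product

# Hypothetical variant table: each character maps to its equivalents
# in the other script. Only a few illustrative entries are listed.
VARIANTS = {
    "类": ["類"],
    "類": ["类"],
    "发": ["發", "髮"],  # one simplified character, two traditional ones
    "發": ["发"],
    "髮": ["发"],
}


def expand_query(term: str) -> str:
    """Build an OR query covering the term and all of its variant spellings."""
    # For each character, the original form comes first, then its variants.
    choices = [[ch] + VARIANTS.get(ch, []) for ch in term]
    forms = []
    for combo in product(*choices):
        form = "".join(combo)
        if form not in forms:  # keep order, drop duplicates
            forms.append(form)
    return " OR ".join(forms)


if __name__ == "__main__":
    print(expand_query("类"))
    print(expand_query("分类"))
```

Note that the number of expanded forms grows multiplicatively with the number of characters that have variants, so for longer queries normalizing at index time is likely the more practical route.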