Re: Confuse with Kuromoji

2014-04-06 Thread Olivier Binda
On 04/06/2014 04:37 PM, Benson Margulies wrote: On Sun, Apr 6, 2014 at 10:30 AM, Herb Roitblat wrote: Just curious, what are some of the things that people do to properly tokenize the queries with mixed language collections? What do you do with mixed language queries? You can either force th

Re: Confuse with Kuromoji

2014-04-06 Thread Herb Roitblat
Thanks. These are familiar. Any other approaches that people use? I guess I'm hoping ... On 4/6/2014 7:37 AM, Benson Margulies wrote: On Sun, Apr 6, 2014 at 10:30 AM, Herb Roitblat wrote: Just curious, what are some of the things that people do to properly tokenize the queries with mixed la

Re: Confuse with Kuromoji

2014-04-06 Thread Benson Margulies
On Sun, Apr 6, 2014 at 10:30 AM, Herb Roitblat wrote:
> Just curious, what are some of the things that people do to properly
> tokenize the queries with mixed language collections? What do you do with
> mixed language queries?
You can either force the user to tell you the language, or ...

Re: Confuse with Kuromoji

2014-04-06 Thread Herb Roitblat
Just curious, what are some of the things that people do to properly tokenize the queries with mixed language collections? What do you do with mixed language queries? On 4/6/2014 4:51 AM, Benson Margulies wrote: You must know what language each text is in, and use an appropriate analyzer. Som

Re: Confuse with Kuromoji

2014-04-06 Thread Benson Margulies
You must know what language each text is in, and use an appropriate analyzer. Some people do this by using a separate field (text_eng, text_spa, text_jpn). Other people put some extra information at the beginning of the field, and then make an analyzer that peeks in order to dispatch to the correct
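
The two approaches described above (one field per language, or a language marker at the start of the field value) can be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: the class name LanguageRouting, the "|" separator, and the three-letter tag layout are assumptions made for the example; only the field names text_eng, text_spa and text_jpn come from the message. In Lucene itself, the per-field variant would usually be paired with PerFieldAnalyzerWrapper so that each field gets its own analyzer; that wiring is left out here.

```java
import java.util.Map;

public class LanguageRouting {
    // Hypothetical lookup table: language code -> per-language field name.
    // The field names follow the text_eng / text_spa / text_jpn convention
    // mentioned in the message; the codes themselves are an assumption.
    public static final Map<String, String> FIELD_BY_LANG = Map.of(
            "eng", "text_eng",
            "spa", "text_spa",
            "jpn", "text_jpn");

    // Approach 1: choose the field (and therefore the analyzer) from a
    // language that is already known for the document or query.
    public static String fieldFor(String lang) {
        String field = FIELD_BY_LANG.get(lang);
        if (field == null) {
            throw new IllegalArgumentException("Unsupported language: " + lang);
        }
        return field;
    }

    // Approach 2: the value carries a marker such as "jpn|..." at the front.
    // A dispatching analyzer would peek at the marker; here we only split it
    // off. The "xxx|" layout is an assumed format, not something Lucene defines.
    public static String[] splitTagged(String tagged) {
        int sep = tagged.indexOf('|');
        if (sep != 3) {
            throw new IllegalArgumentException("Expected a 3-letter tag followed by '|'");
        }
        return new String[] { tagged.substring(0, 3), tagged.substring(4) };
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("jpn"));
        String[] parts = splitTagged("spa|hola mundo");
        System.out.println(parts[0] + " -> " + fieldFor(parts[0]) + ": " + parts[1]);
    }
}
```

Either way, the language has to be known before analysis starts; the sketch only shows where that knowledge is consumed, not how it is detected.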

Confuse with Kuromoji

2014-04-05 Thread j7a42e4fd7qux
I am pretty new to Lucene, but I have no problem understanding what it is about. My big problem is trying to understand how Kuromoji works. I need to implement search functionality that initially supports English, Spanish and Japanese. It doesn't seem to be a big deal with the first two, as I