Re: tokenizer of solr

Mingfeng Yang Fri, 12 Apr 2013 09:18:12 -0700

Jack,

Thanks so much for this info.  It's awesome.


Ming


On Thu, Apr 11, 2013 at 7:32 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> In that case, use the types="wdfftypes.txt" attribute of WDF and map "@"
> and "_" to ALPHA as shown in:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
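>
> A minimal sketch of that setup, combined with the whitespace tokenizer
> suggested earlier in the thread. The field type name "text_tweet" and the
> particular filter attributes are assumptions for illustration, not settings
> taken from this thread. splitOnNumerics="0" is included on the assumption
> that, with "_" mapped to ALPHA, the filter would otherwise still split at
> the letter-to-digit boundary between "jpc_" and "108".
>
> ```xml
> <!-- schema.xml: "text_tweet" is a hypothetical field type name -->
> <fieldType name="text_tweet" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <!-- Split on whitespace only, so "@jpc_108" reaches the filter intact -->
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- types= points at the custom character-type mapping file below -->
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1"
>             splitOnNumerics="0"
>             types="wdfftypes.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> ```
>
> ```
> # wdfftypes.txt: treat "@" and "_" as letters instead of delimiters
> @ => ALPHA
> _ => ALPHA
> ```
>
> With both mappings the indexed token should be "@jpc_108". If a query for
> plain "jpc_108" also needs to match, mapping only "_" and leaving "@" as a
> delimiter may be the better fit. The Analysis page in the Solr admin UI
> shows the tokens actually produced, so it is worth checking there.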
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Mingfeng Yang
> Sent: Thursday, April 11, 2013 8:50 PM
> To: solr-user@lucene.apache.org
> Subject: Re: tokenizer of solr
>
>
> Looks like it's due to the word delimiter filter.  Does anyone know whether
> the "protected" file supports regular expressions?
>
> Ming
>
>
> On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Try the whitespace tokenizer.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Mingfeng Yang
>> Sent: Thursday, April 11, 2013 7:48 PM
>> To: solr-user@lucene.apache.org
>> Subject: tokenizer of solr
>>
>> Dear Solr users and developers,
>>
>> I am trying to index some documents, some of which are Twitter messages,
>> and we have a problem when indexing retweets.
>>
>> Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
>> his message, so @jpc_108 becomes part of the tweet text body.
>>
>> It seems that before indexing, Solr's tokenizer factory turns "@jpc_108"
>> into "jpc" and "108", so when we search for jpc_108, it's not there
>> anymore.
>>
>>
>> Is there any way we can keep "jpc_108" when it appears as "@jpc_108"?
>>
>> Thanks,
>> Ming
>>
>>
>
