Looks like it's due to the word delimiter filter. Does anyone know whether the "protected" file supports regular expressions?
Ming

On Thu, Apr 11, 2013 at 4:58 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Try the whitespace tokenizer.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Mingfeng Yang
> Sent: Thursday, April 11, 2013 7:48 PM
> To: solr-user@lucene.apache.org
> Subject: tokenizer of solr
>
> Dear Solr users and developers,
>
> I am trying to index some documents, some of which are Twitter messages, and
> we have a problem when indexing retweets.
>
> Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
> his message, so @jpc_108 becomes part of the tweet text body.
>
> It seems that before indexing, the tokenizer factory of Solr turns "@jpc_108"
> into "jpc" and "108", and when we search for jpc_108, it's not there anymore.
>
> Is there any way we can keep "jpc_108" when it appears as "@jpc_108"?
>
> Thanks,
> Ming
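
For anyone following the thread, here is a rough sketch of the behaviour being discussed. It is a toy model in Python, not Solr's actual code: `whitespace_tokenize`, `split_on_delimiters`, and the `keep_chars` parameter are hypothetical names made up for illustration. It only shows why splitting on whitespace alone keeps "@jpc_108" whole, while a delimiter-splitting step breaks it into "jpc" and "108" unless the underscore is treated as part of the word.

```python
import re


def whitespace_tokenize(text):
    """Split on runs of whitespace only; punctuation stays inside tokens."""
    return text.split()


def split_on_delimiters(token, keep_chars=""):
    """Toy stand-in for a word-delimiter step (an assumption, not Solr's logic).

    Breaks the token at every run of characters that are not ASCII letters,
    digits, or listed in keep_chars, and drops those characters.
    """
    pattern = "[^0-9A-Za-z" + re.escape(keep_chars) + "]+"
    return [part for part in re.split(pattern, token) if part]


tweet = "RT @jpc_108 nice post"

# Whitespace-only tokenization leaves the mention intact.
tokens = whitespace_tokenize(tweet)

# Splitting on every non-alphanumeric character loses the username.
broken = split_on_delimiters("@jpc_108")

# Treating "_" as a word character strips "@" but keeps the username whole.
kept = split_on_delimiters("@jpc_108", keep_chars="_")

print(tokens, broken, kept)
```

In this model, a search for jpc_108 can only match if the index-time and query-time analysis both leave the underscore in place, which is the point of checking what the word delimiter filter is doing on both sides.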