Try the whitespace tokenizer.
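
As a rough illustration of why this helps (plain Python, not Solr code; the two function names and the sample tweet are made up for the example), compare splitting on whitespace only with splitting on every non-alphanumeric character, which is roughly the behavior described in the question below:

```python
import re

def whitespace_tokenize(text):
    # Split on runs of whitespace only; punctuation such as "@" and "_"
    # stays attached to the token.
    return text.split()

def word_split_tokenize(text):
    # Hypothetical stand-in for an analyzer that breaks tokens on any
    # non-alphanumeric character (an assumption, not Solr's actual code).
    return re.findall(r"[A-Za-z0-9]+", text)

tweet = "RT @jpc_108 indexing tweets with solr"
print(whitespace_tokenize(tweet))   # "@jpc_108" survives as one token
print(word_split_tokenize(tweet))   # "jpc" and "108" become separate tokens
```

Note that with whitespace splitting the token keeps its leading "@", so an exact match would presumably need the same form at query time.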

-- Jack Krupansky

-----Original Message-----
From: Mingfeng Yang
Sent: Thursday, April 11, 2013 7:48 PM
To: solr-user@lucene.apache.org
Subject: tokenizer of solr
Dear Solr users and developers,

I am trying to index some documents, some of which are Twitter messages, and
we have a problem when indexing retweets.

Say a Twitter user named "jpc_108" posts a tweet, and then someone retweets
his message, so that @jpc_108 becomes part of the tweet text body.

It seems that before indexing, Solr's tokenizer factory turns "@jpc_108"
into "jpc" and "108", so when we search for jpc_108, it's not there anymore.


Is there any way we can keep "jpc_108" when it appears as "@jpc_108"?

Thanks,
Ming-
