Hi Leonardo,
1. You can change the field type to "string", in which case no tokenizers
will act on your data and the content will be indexed and stored as is.
2. If you are using Solr 1.4 (latest), there is a provision to specify
protected words for WordDelimiterFilterFactory, which will take care of
your issue.
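A rough Python sketch of both suggestions, assuming the text is first split on whitespace and then any chunk not on a protected list is split on non-alphanumeric characters. This loosely mimics a whitespace tokenizer followed by WordDelimiterFilterFactory with a protected-words file; it is not Solr code, and PROTECTED, tokenize_with_protected and string_field are hypothetical names:

```python
import re

# Hypothetical protected-word list, similar in spirit to the file handed to
# WordDelimiterFilterFactory's protected-words setting.
PROTECTED = {"c++", "c#", ".net"}


def tokenize_with_protected(text, protected=PROTECTED):
    """Split on whitespace, keep protected words whole, split the rest."""
    tokens = []
    for chunk in text.split():  # whitespace splitting first
        if chunk.lower() in protected:
            # Protected words pass through untouched (only lowercased).
            tokens.append(chunk.lower())
        else:
            # Everything else is broken on non-alphanumeric characters.
            tokens.extend(p.lower() for p in re.findall(r"[A-Za-z0-9]+", chunk))
    return tokens


def string_field(text):
    """A "string" field is not tokenized: the whole value is one token."""
    return [text]


print(tokenize_with_protected("Senior C++ and C# developer"))
print(string_field("Senior C++ developer"))
```

Two caveats: the protected list only helps if whatever runs before the filter leaves "c++" intact (a filter cannot restore characters the tokenizer already threw away), and a "string" field only matches a query equal to the whole field value, so a query for "c++" would not match a field containing "Senior C++ developer".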

-Kumar

-----Original Message-----
From: Leonardo Dias [mailto:leona...@catho.com.br] 
Sent: Thursday, March 26, 2009 6:53 PM
To: solr-user@lucene.apache.org
Subject: How to search for "C++"?

Hello there!

Currently we're having a problem here and we're looking for some
solutions. Right now we use the Standard Tokenizer to separate tokens,
and we just found out that we cannot search for "c++" in our index
because it is not considered a word.
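To see why the query fails, here is a rough Python sketch, assuming the tokenizer keeps only runs of letters and digits and discards everything else. This is a simplification for illustration, not Lucene's actual StandardTokenizer grammar, and simple_tokenize is a hypothetical name:

```python
import re


def simple_tokenize(text):
    """Keep only runs of letters/digits; '+' and '#' are dropped."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


print(simple_tokenize("Senior C++ and C# developer"))
# ['senior', 'c', 'and', 'c', 'developer']
```

Under this assumption both "C++" and "C#" collapse to the same token "c", so neither can be searched for on its own.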

Since we need this search to work properly (including a search for C#),
we'd like to know what you guys are doing when people search for words
that have symbols, like these programming languages. I thought there
could be a list of "protected words" in the standard tokenizer, so that
we could protect these tokens. Another possibility would be using the
Pattern Tokenizer, but it seems to be kinda slow when it comes to indexing
a huge amount of data, which is our case.
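The Pattern Tokenizer idea can be sketched the same way, assuming a hypothetical regex that treats a run of letters and digits, optionally followed by "+" or "#" characters, as one token. This only illustrates the approach in Python; it is not a Solr PatternTokenizerFactory configuration and says nothing about indexing speed:

```python
import re

# Hypothetical pattern: letters/digits, optionally followed by '+' or '#'.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+[+#]*")


def pattern_tokenize(text):
    """Emit every match of TOKEN_PATTERN as a lowercased token."""
    return [m.group(0).lower() for m in TOKEN_PATTERN.finditer(text)]


print(pattern_tokenize("Senior C++ and C# developer"))
```

The trade-off is that such a pattern also keeps symbols where you may not want them, e.g. "a+b" becomes "a+" and "b".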

What do you think the best solution would be?

Best,

Leonardo
