Sebastian,

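If I remember my regular expressions correctly, the - and / are just literal
characters.  The stuff inside square brackets means "any one of the characters
between [ and ]".  - and / are just two of those characters, along with word
characters (\w), space, comma, newline, and the two quote marks.  The {20,200}
after the brackets means "20 to 200 of those characters in a row".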
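
To see how the character class behaves, here is a minimal sketch in Python.
It assumes Python's re module reads this particular pattern the same way
java.util.regex does; Solr itself is not involved. The sample sentence, the
fragments() helper, and the variant with a period added to the class are all
made up for illustration. Whether this pattern is what splits the spellcheck
query is not established in this thread.

```python
import re

# The pattern from the question, written as a raw string.
# Inside [...]: a leading "-" is a literal hyphen, "/" is a literal slash,
# \w is any word character, then space, comma, newline and both quote marks.
FRAGMENT_PATTERN = re.compile(r"""[-\w ,/\n\"']{20,200}""")

# Hypothetical variant (an assumption, not a tested Solr setting):
# the same class with a literal period added.
FRAGMENT_PATTERN_WITH_DOT = re.compile(r"""[-\w .,/\n\"']{20,200}""")


def fragments(pattern, text):
    """Return every non-overlapping run of text that the pattern matches."""
    return pattern.findall(text)


# Made-up sample text containing the host name from the question.
sample = "Visit www.product4sale.com for a low-cost, high/quality deal today"

print(fragments(FRAGMENT_PATTERN, sample))
print(fragments(FRAGMENT_PATTERN_WITH_DOT, sample))
```

With the original class, a period ends a run, so only the text after the last
period is long enough to match. With the period added, the whole sample is one
run, and the hyphen and slash are matched as ordinary characters either way.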

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: Sebastian M <mihais...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, January 11, 2011 11:22:01 AM
> Subject: default RegexFragmenter
> 
> 
> Hello,
> 
> I'm investigating an issue where spellcheck queries are tokenized without
> being explicitly told to do so, resulting in suggestions such as
> "www.www.product4sale.com.com" for queries such as
> "www.product4sale.com".
> 
> The default RegexFragmenter (name="regex") uses the regular
> expression:
> 
> [-\w ,/\n\"']{20,200}
> 
> I understand parts of it, but I'm not sure about the - sign, or the slash
> midway through it.
> I would like to perhaps tailor this regular expression to not cause query
> terms such as "www.product4sale.com" to be broken down on the period marks,
> but just be kept as they are.
> 
> Any suggestions or answers are highly appreciated!
> 
> Sebastian
> -- 
> View  this message in context: 
>http://lucene.472066.n3.nabble.com/default-RegexFragmenter-tp2235106p2235106.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 
