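Sebastian,

If I remember my regular expressions, the - and / are really just that: literal characters. The stuff inside square brackets means "any one of the characters between [ and ]". Because the - comes first in the class, it cannot be read as a range operator, and / has no special meaning there at all. So - and / are just two of the allowed characters, along with word characters (\w), space, comma, newline, and the two quote characters. The {20,200} after the class means "between 20 and 200 of those characters in a row".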
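A quick way to check this is to test single characters against the class from the pattern quoted below. This is only a sketch in Python; Solr itself uses Java regular expressions, but this particular character class reads the same way in both. The helper names (allowed, runs) and the variant with an added "." are made up for illustration and are not Solr defaults. The demo also uses a minimum run length of 1 instead of 20 so that the short sample string shows where the runs break.

```python
import re

# Character class copied from the default RegexFragmenter pattern.
# A leading "-" inside [...] is a literal hyphen; "/" is always literal.
FRAGMENT_CLASS = r"""[-\w ,/\n\"']"""

# Hypothetical variant (an assumption, not a Solr default): also allow ".".
FRAGMENT_CLASS_WITH_DOT = r"""[-\w ,/\n\"'.]"""


def allowed(ch: str, char_class: str = FRAGMENT_CLASS) -> bool:
    """Return True if the single character ch is matched by the class."""
    return re.fullmatch(char_class, ch) is not None


def runs(char_class: str, text: str, lo: int = 1, hi: int = 200) -> list:
    """Return the runs of lo..hi consecutive allowed characters in text."""
    return re.findall(char_class + "{%d,%d}" % (lo, hi), text)


print(allowed("-"), allowed("/"), allowed("."))
# True True False

print(runs(FRAGMENT_CLASS, "www.product4sale.com"))
# ['www', 'product4sale', 'com']

print(runs(FRAGMENT_CLASS_WITH_DOT, "www.product4sale.com"))
# ['www.product4sale.com']
```

So the period is simply not in the class, which is why a run stops at each one. Adding "." to the class keeps the host name in one run in this toy test; whether that changes the spellcheck suggestions you are seeing is something I have not verified.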

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: Sebastian M <mihais...@yahoo.com>
> To: solr-user@lucene.apache.org
> Sent: Tue, January 11, 2011 11:22:01 AM
> Subject: default RegexFragmenter
>
> Hello,
>
> I'm investigating an issue where spellcheck queries are tokenized without
> being explicitly told to do so, resulting in suggestions such as
> "www.www.product4sale.com.com" for the queries such as
> "www.product4sale.com".
>
> The default RegexFragmenter fragmenter (name="regex") uses the regular
> expression:
>
> [-\w ,/\n\"']{20,200}
>
> I understand parts of it, but I'm not sure about the - sign, or the slash
> midway through it.
> I would like to perhaps tailor this regular expression to not cause query
> terms such as "www.product4sale.com" to be broken down on the period marks,
> but just be kept as they are.
>
> Any suggestions or answers are highly appreciated!
>
> Sebastian
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/default-RegexFragmenter-tp2235106p2235106.html
> Sent from the Solr - User mailing list archive at Nabble.com.