Well, it depends on what you put between your tokenizer and ngram
filter. Putting WordDelimiterFilterFactory in the chain would break on the
underscore (and lots of other things besides) and emit the separate
tokens, which would then be n-grammed individually. That has other
implications, of course, but you g
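A chain along those lines could be sketched in schema.xml roughly as follows (the fieldType name and attribute values here are illustrative, not a drop-in config):

```xml
<!-- Sketch: split on '_' (and other delimiters) via WordDelimiterFilterFactory
     before n-gramming, so each word part is n-grammed on its own. -->
<fieldType name="text_split_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="3"/>
  </analyzer>
</fieldType>
```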
Thank you so much Jack!!! You helped me a great deal!!
Thank you too, Allison, but I preferred to stay as close as possible to the
default standard analyzer, so I took Jack's array and removed the 'or'. I
put it in the StandardAnalyzer and it works like a charm!
Thanks to all of you!
Sorry, at indexing time it's not broken on anything. In other words,
quota_tommy yields these tokens: "quo uot ota ta_ a_t _to tom omm mmy". I've
thought about trying to determine boundaries and breaking on them at indexing
time, but that will require some more thought. It doesn't have to be an
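For reference, that token stream is just every 3-character window over the unsplit term, which a few lines of plain Java (no Lucene on the classpath) can reproduce:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the index-time tokens above are simply every 3-character window
// over the unsplit term, i.e. what an n-gram filter with
// minGram = maxGram = 3 produces.
public class WholeTermTrigrams {
    public static List<String> trigrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= term.length(); i++) {
            grams.add(term.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Prints: [quo, uot, ota, ta_, a_t, _to, tom, omm, mmy]
        System.out.println(trigrams("quota_tommy"));
    }
}
```

Note that the underscore ends up inside three of the grams (ta_, a_t, _to), which is exactly why a whitespace-separated query doesn't match.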
Wait, I didn't mean to pad the entire string. If the string is broken on _
already, then NGramFilter already receives the individual terms, and couldn't
you put a Filter in front of it that passes through a padded token?
Shai
On Fri, Jul 19, 2013 at 3:45 PM, Becker, Thomas wrote:
> In general the data f
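To illustrate the padding idea in plain Java (a sketch, not an actual TokenFilter; the wrap-in-underscores scheme shown is one possible choice): padding each query term makes its trigrams line up with the boundary grams produced at index time.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the padding idea: wrap each term in '_' before 3-gramming, so
// query-side grams can match the boundary grams ("ta_", "_to") that the
// unsplit index-time term "quota_tommy" produces.
public class PaddedTrigrams {
    public static List<String> trigrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= term.length(); i++) {
            grams.add(term.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "_quota_" -> [_qu, quo, uot, ota, ta_]
        System.out.println(trigrams("_quota_"));
        // "_tom_"   -> [_to, tom, om_]
        System.out.println(trigrams("_tom_"));
    }
}
```

The padded grams ta_ and _to are exactly the boundary grams that quota_tommy yields at index time, which is what makes "quota tom" findable.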
In general the data for this field is that simple, but additional characters
are allowed beyond [a-z_]. Do I need to tokenize on whitespace? I really
don't know. Essentially, the question is whether we expect "quota tom" to
match quota_tom or not. I spoke to some colleagues and they thought
If Jack's recommendation for keeping stopwords will work in your use case, this
constructor should do the trick:
Analyzer analyzer = new StandardAnalyzer(VERSION, CharArraySet.EMPTY_SET);
From: Jack Krupansky [j...@basetechnology.com]
Sent: Friday, July 19
Got it...almost.
Y. You're right. FuzzyQuery is not at all what you want.
Don't know if your data is actually as simple as this example. Do you need to
tokenize on whitespace? Would it make sense to replace spaces in the query
with underscores and then trigramify the whole query as if it w
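One way to check whether that would behave sensibly is to trigramify the underscored query by hand and compare against the index-time grams of quota_tommy (again a plain-Java sketch, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: replace spaces in the query with underscores, 3-gram the whole
// thing, and compare against the index-time grams of "quota_tommy"
// (quo uot ota ta_ a_t _to tom omm mmy).
public class UnderscoredQuery {
    public static List<String> trigrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= term.length(); i++) {
            grams.add(term.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        String query = "quota tom".replace(' ', '_');
        // Prints: [quo, uot, ota, ta_, a_t, _to, tom]
        // All seven grams also occur among the index-time grams of
        // quota_tommy, so the underscored query would match it.
        System.out.println(trigrams(query));
    }
}
```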