Re: Partial word match using n-grams

2013-07-19 Thread Erick Erickson
Well, it depends on what you put between your tokenizer and ngram filter. Putting a WordDelimiterFilterFactory there would split on the underscore (and lots of other things besides) and emit the separate tokens, which would then be n-grammed separately. That has other implications, of course, but you g
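
A minimal sketch of the effect described above, in plain Java rather than actual Solr/Lucene filter code. It assumes the only split point is the underscore (the real WordDelimiterFilterFactory splits on much more) and uses trigrams; the class and method names are hypothetical, chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene/Solr class.
final class SplitThenNGramSketch {

    // All contiguous substrings of length n, in order of appearance.
    static List<String> ngrams(String token, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= token.length(); i++) {
            out.add(token.substring(i, i + n));
        }
        return out;
    }

    // Assumption: the delimiter step splits on '_' only.
    // Each resulting part is n-grammed on its own, so no gram spans the '_'.
    static List<String> splitThenNGram(String input, int n) {
        List<String> out = new ArrayList<>();
        for (String part : input.split("_")) {
            out.addAll(ngrams(part, n));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [quo, uot, ota, tom, omm, mmy]
        System.out.println(splitThenNGram("quota_tommy", 3));
    }
}
```

Note that grams such as "ta_" or "_to" no longer exist once the parts are n-grammed separately, which is one of the implications to weigh.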

RE: Searching for words beginning with "or"

2013-07-19 Thread ABlaise
Thank you so much, Jack! You helped me a great deal! Thank you too, Allison, but I preferred to stay as close as possible to the default standard analyzer, so I took Jack's array and removed the 'or'. I put it in the StandardAnalyzer and it works like a charm! Thanks to all of you! -- View th

RE: Partial word match using n-grams

2013-07-19 Thread Becker, Thomas
Sorry, at indexing time it's not broken on anything. In other words quota_tommy yields these tokens: "quo uot ota ta_ a_t _to tom omm mmy" I've thought about trying to determine boundaries and breaking on them at indexing time, but that will require some more thought. It doesn't have to be an
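
For reference, a minimal plain-Java sketch that reproduces the token list quoted above for quota_tommy. It is not the Lucene n-gram filter itself, only the same sliding-window idea; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene class.
final class TrigramSketch {

    // Slide a window of width 3 over the whole string, underscore included.
    static List<String> trigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints quo uot ota ta_ a_t _to tom omm mmy
        System.out.println(String.join(" ", trigrams("quota_tommy")));
    }
}
```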

Re: Partial word match using n-grams

2013-07-19 Thread Shai Erera
Wait, I didn't mean to pad the entire string. If the string is broken on _ already, then NGramFilter already receives the individual terms and you can put a Filter in front that will pass through a padded token? Shai On Fri, Jul 19, 2013 at 3:45 PM, Becker, Thomas wrote: > In general the data f

RE: Partial word match using n-grams

2013-07-19 Thread Becker, Thomas
In general the data for this field is that simple, but additional characters are allowed beyond [a-z_]. Do I need to tokenize on whitespace? I really don't know. Essentially, the question is whether we expect "quota tom" to match quota_tom or not. I spoke to some colleagues and they thought

RE: Searching for words beginning with "or"

2013-07-19 Thread Allison, Timothy B.
If Jack's recommendation for keeping stopwords works in your use case, this constructor should do the trick: Analyzer analyzer = new StandardAnalyzer(VERSION, CharArraySet.EMPTY_SET); From: Jack Krupansky [j...@basetechnology.com] Sent: Friday, July 19

RE: Partial word match using n-grams

2013-07-19 Thread Allison, Timothy B.
Got it... almost. Yes, you're right. FuzzyQuery is not at all what you want. Don't know if your data is actually as simple as this example. Do you need to tokenize on whitespace? Would it make sense to replace spaces in the query with underscores and then trigramify the whole query as if it w
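
A rough sketch of the query-side idea floated above: replace spaces with underscores, then trigram the whole query string. This is plain Java under the assumption that the index holds whole-string trigrams (underscore included) and that a match means every query trigram is present; the names are hypothetical and none of this is Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not a Lucene class.
final class QueryTrigramSketch {

    // Sliding window of width 3 over the whole string.
    static List<String> trigrams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            out.add(s.substring(i, i + 3));
        }
        return out;
    }

    // Assumption: spaces in the query stand in for underscores in the data.
    static List<String> queryTrigrams(String query) {
        return trigrams(query.replace(' ', '_'));
    }

    // Assumption: a document matches when it contains every query trigram.
    static boolean matches(String indexedValue, String query) {
        return trigrams(indexedValue).containsAll(queryTrigrams(query));
    }

    public static void main(String[] args) {
        // prints [quo, uot, ota, ta_, a_t, _to, tom]
        System.out.println(queryTrigrams("quota tom"));
        // prints true
        System.out.println(matches("quota_tommy", "quota tom"));
    }
}
```

Under these assumptions "quota tom" matches quota_tommy, which may or may not be the behavior you want.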