Re: Exact phrase search on very large text

2015-06-26 Thread Upayavira
Why do you want to use the KeywordTokenizer? Why not use a text field, and use Solr's phrase search features? q="some phrase" (with the quotes) will match those terms next to each other, and should be fine with a large block of text. Combine that with hit highlighting, and it'll return a snippet of that block of
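A minimal sketch of the setup Upayavira is describing (the field and type names here are illustrative, not from the thread): an ordinary tokenised text field, queried with a quoted phrase and highlighting enabled.

```xml
<!-- schema.xml: a plain tokenised text type (names are illustrative) -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body" type="text_general" indexed="true" stored="true"/>
```

A query such as `q=body:"some phrase"&hl=true&hl.fl=body` then matches the terms in sequence and returns a highlighted snippet of the stored text rather than the whole block.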

Re: Exact phrase search on very large text

2015-06-26 Thread Alessandro Benedetti
I agree with Upayavira; furthermore, it doesn't make any sense to try to solve a phrase search problem without tokenising the text at all … It's not going to work, and it is fundamentally wrong to leave long textual fields untokenised if you want to do free-text search on them. Can you explain in more detail your

Re: Exact phrase search on very large text

2015-06-26 Thread Jack Krupansky
Lucene, the underlying search engine library, imposes this 32K (32,766-byte) limit on individual terms. Use tokenized text instead. -- Jack Krupansky On Thu, Jun 25, 2015 at 8:36 PM, Mike Thomsen mikerthom...@gmail.com wrote: I need to be able to do exact phrase searching on some documents that are a few
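For context, this is the pattern that trips the limit (a reconstructed sketch, not the poster's actual schema): KeywordTokenizerFactory emits the entire field value as a single term, so any value over 32,766 bytes is rejected at index time.

```xml
<!-- The problematic pattern: the whole field value becomes ONE term,
     so values larger than Lucene's per-term limit fail to index -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With a regular tokenizer the field is split into many small terms, none of which approaches the limit.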

Re: Exact phrase search on very large text

2015-06-26 Thread Alessandro Benedetti
You are tokenising … <tokenizer class="solr.WhitespaceTokenizerFactory"/> Be careful about applying the lowercase token filter first. It's a best practice to apply char filters first, then the tokenizer, and finally the set of token filters. Cheers 2015-06-26 13:27 GMT+01:00 Mike Thomsen mikerthom...@gmail.com: I
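The ordering Alessandro describes can be sketched like this (the HTMLStripCharFilterFactory is just an example of a char filter, not something from the thread):

```xml
<!-- Recommended ordering: charFilter, then tokenizer, then token filters -->
<analyzer>
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Char filters operate on the raw character stream before tokenization; token filters such as lowercasing operate on the token stream the tokenizer produces, so they belong after it.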

Re: Exact phrase search on very large text

2015-06-26 Thread Mike Thomsen
I tried creating a simplified new text field type that only did lower casing, and exact phrase matching worked this time. I'm not sure what the problem was. Perhaps it was a case of copy-paste gone bad, because I could have sworn that I tried exact phrase matching against a simple text field with bad
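A guess at what that simplified field type looked like (a hedged reconstruction; the name and exact filters are assumptions, since the message doesn't include the schema):

```xml
<!-- A minimal "only lowercase" text type, as described above (illustrative) -->
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```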

Exact phrase search on very large text

2015-06-25 Thread Mike Thomsen
I need to be able to do exact phrase searching on some documents that are a few hundred KB when treated as a single block of text. I'm on Solr 4.10.4, and it complains when I try to put something larger than 32 KB in using a text field with the keyword tokenizer. Is there any way I can