Re: Search across a specified number of boundaries

2013-01-15 Thread Mike Ree
Mikhail, yeah, I considered that originally, but after analyzing the data I noticed that it was not possible. Some of the content we analyze contains large tables that, after OCR, get turned into long-running sentences of 500k+ words each. Overall there are probably around 10k

Re: Search across a specified number of boundaries

2013-01-14 Thread Mikhail Khludnev
Mike, when Lucene's Analyzer indexes the text, it adds positions to the index, and SpanQueries use those positions later. Have you considered the idea of a position increment gap? E.g. the first sentence is indexed with word positions 0,1,2,3,..., the second sentence with 100,101,102,103,..., the third
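Below is a minimal sketch of the fixed-stride layout described above. It uses plain Java rather than the Lucene API so that it runs on its own. The helpers `assignPositions`, `sentenceOf` and `withinSlop` are hypothetical. The stride of 100 comes from the example positions in the message. The "near" rule in `withinSlop` is a simplified model of a slop-based proximity check and does not reproduce Lucene's exact SpanNearQuery matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SentenceGapPositions {

    // Hypothetical helper: sentence s starts at position s * stride, mirroring the
    // 0,1,2,... / 100,101,102,... layout. Throws if a sentence has more words than
    // the stride, because its positions would run into the next sentence's range.
    static List<int[]> assignPositions(List<String[]> sentences, int stride) {
        List<int[]> result = new ArrayList<>();
        for (int s = 0; s < sentences.size(); s++) {
            String[] words = sentences.get(s);
            if (words.length > stride) {
                throw new IllegalArgumentException(
                    "sentence " + s + " has " + words.length + " words; stride is only " + stride);
            }
            int base = Math.multiplyExact(s, stride); // throws instead of silently overflowing int
            int[] positions = new int[words.length];
            for (int w = 0; w < words.length; w++) {
                positions[w] = base + w;
            }
            result.add(positions);
        }
        return result;
    }

    // Which sentence a position belongs to under the fixed-stride layout.
    static int sentenceOf(int position, int stride) {
        return position / stride;
    }

    // Simplified proximity model (an assumption, not Lucene's algorithm): two
    // single-word matches are "near" when at most `slop` positions lie between them.
    static boolean withinSlop(int p, int q, int slop) {
        return Math.abs(q - p) - 1 <= slop;
    }

    public static void main(String[] args) {
        List<String[]> sentences = List.of(
            new String[] {"a", "b", "c"},
            new String[] {"d", "e"},
            new String[] {"f", "g", "h"});
        int stride = 100;
        List<int[]> pos = assignPositions(sentences, stride);
        for (int[] p : pos) {
            System.out.println(Arrays.toString(p)); // [0, 1, 2] / [100, 101] / [200, 201, 202]
        }
        // a (0) and c (2): same sentence, one position between them.
        System.out.println(withinSlop(pos.get(0)[0], pos.get(0)[2], 10)); // true
        // c (2) and d (100): adjacent sentences, 97 positions between them.
        System.out.println(withinSlop(pos.get(0)[2], pos.get(1)[0], 10)); // false
        // Sentence boundaries between c (sentence 0) and f (sentence 2).
        System.out.println(sentenceOf(pos.get(2)[0], stride) - sentenceOf(pos.get(0)[2], stride)); // 2
    }
}
```

A small slop keeps matches inside one sentence because the gap pushes neighbouring sentences far apart. A slop of roughly k times the stride lets a match reach across about k boundaries. The length check in `assignPositions` shows the catch: the scheme only separates sentences cleanly when every sentence fits inside the stride. Sentences of 500k+ words would therefore need a correspondingly huge stride.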