Mikhail,

Yeah, I considered that originally, but after analyzing the data I found
it wasn't possible. Some of the content we analyze contains large tables
that OCR turns into long run-on sentences of 500k+ words each. Overall
there are probably around 10k of those anomalies, and they stop the ranges
from working: we run out of positions below the maximum value an integer
can hold, and run the risk of a future document breaking it.
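
To put rough numbers on that, here is a minimal sketch of the arithmetic,
assuming every sentence gets a fixed-width range of positions that is wide
enough for the longest sentence (500,000 here, taken from the figure above).
The class and method names are hypothetical, and this is plain Java
arithmetic, not a Lucene API.

```java
final class SentenceSlotBudget {

    // Positions are non-negative ints, so 2^31 of them exist (0..Integer.MAX_VALUE).
    // Returns how many complete fixed-width sentence ranges fit in that space.
    static long maxFullSlots(long slotWidth) {
        return ((long) Integer.MAX_VALUE + 1) / slotWidth;
    }

    // First position of a sentence's range; throws ArithmeticException
    // if the product no longer fits in an int.
    static int slotStart(int sentenceIndex, int slotWidth) {
        return Math.multiplyExact(sentenceIndex, slotWidth);
    }

    public static void main(String[] args) {
        int slotWidth = 500_000; // assumed width, sized for the longest OCR'd sentence

        System.out.println("full ranges available: " + maxFullSlots(slotWidth));
        System.out.println("start of sentence 4293: " + slotStart(4293, slotWidth));

        try {
            slotStart(4295, slotWidth);
        } catch (ArithmeticException e) {
            System.out.println("sentence 4295 overflows int positions");
        }
    }
}
```

Under that assumption only 4,294 sentence ranges fit into a single field's
position space, so a long enough document would break the scheme.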

I found a Jira issue covering what I'm looking for. I'm going to look into
it and see if I can get it to work for my situation.

https://issues.apache.org/jira/browse/LUCENE-777

Thanks for the help.

Mike

On Mon, Jan 14, 2013 at 11:48 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Mike,
>
> When Lucene's Analyzer indexes the text, it adds positions to the index,
> which are later used by SpanQueries. Have you considered the idea of a
> position increment gap? E.g. the first sentence is indexed with word
> positions 0,1,2,3,..., the second sentence with 100,101,102,103,..., the
> third with 200,201,202,... Then applying a span constraint lets you search
> across or inside the sentences.
> WDYT?
>
>
> On Sun, Jan 6, 2013 at 6:50 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Mike:
>>
>> I'm _really_ stretching here, but you might be able to do something
>> interesting with payloads. Say each word had a payload with the sentence
>> number and you _somehow_ made use of that information in a custom scorer.
>> But like I said, I really have no good idea how to accomplish that...
>>
>> BTW, in future this kind of question is better asked on the user's list
>> (either Lucene or Solr); this list is intended for discussing development
>> work...
>>
>> Best
>> Erick
>>
>>
>> On Fri, Jan 4, 2013 at 1:02 PM, Mike Ree <mike.ad...@olytech.net> wrote:
>>
>>> d terms that are in nearby sentences.
>>>
>>> IE:
>>> "TermA NEAR3 TermB" would find all TermA's that are within 3 sentences
>>> of TermB.
>>>
>>> Have found ways to find TermA within same sentence
>>>
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
>
