What I'd ideally like to do is to take SpanQuery, loop over the PayloadSpans
returned from SpanQuery.getPayloadSpans() and store all PayloadSpans for a
given document in a Map by their doc id.
Then later, after deciding in memory which documents I need, I would load the
payload data for just those.
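
The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SpanCursor, SpanMatch, SpanGrouping and InMemorySpans are hypothetical stand-ins written for this example, not Lucene classes, and the sketch assumes that enumerating the spans a second time is acceptable. The first pass records match positions per doc id without touching payloads; the second pass reads payloads only for the documents that were selected in between.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a span enumeration; NOT a Lucene class.
interface SpanCursor {
    boolean next();        // advance to the next match; false when exhausted
    int doc();             // doc id of the current match
    int start();           // start position of the current match
    int end();             // end position of the current match
    byte[] loadPayload();  // the expensive read we want to defer
}

// Position-only record of one match, cheap to keep in memory.
final class SpanMatch {
    final int start;
    final int end;
    SpanMatch(int start, int end) { this.start = start; this.end = end; }
}

final class SpanGrouping {
    // Pass 1: group match positions by doc id; no payload is read here.
    static Map<Integer, List<SpanMatch>> groupByDoc(SpanCursor spans) {
        Map<Integer, List<SpanMatch>> byDoc = new LinkedHashMap<>();
        while (spans.next()) {
            byDoc.computeIfAbsent(spans.doc(), d -> new ArrayList<>())
                 .add(new SpanMatch(spans.start(), spans.end()));
        }
        return byDoc;
    }

    // Pass 2: walk a fresh cursor and read payloads only for wanted docs.
    static Map<Integer, List<byte[]>> loadPayloads(SpanCursor spans, Set<Integer> wanted) {
        Map<Integer, List<byte[]>> out = new LinkedHashMap<>();
        while (spans.next()) {
            if (wanted.contains(spans.doc())) {
                out.computeIfAbsent(spans.doc(), d -> new ArrayList<>())
                   .add(spans.loadPayload());
            }
        }
        return out;
    }
}

// In-memory cursor used only to exercise the sketch; rows are {doc, start, end}.
final class InMemorySpans implements SpanCursor {
    private final int[][] rows;
    private int i = -1;
    int payloadLoads = 0;  // counts how many payload reads actually happened
    InMemorySpans(int[][] rows) { this.rows = rows; }
    public boolean next() { return ++i < rows.length; }
    public int doc() { return rows[i][0]; }
    public int start() { return rows[i][1]; }
    public int end() { return rows[i][2]; }
    public byte[] loadPayload() {
        payloadLoads++;
        return new byte[] { (byte) rows[i][1] };  // fake payload: the start position
    }
}
```

Between the two passes the caller would inspect the Map from groupByDoc, pick the doc ids it cares about, and pass only those to loadPayloads.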
It's not implemented, but http://issues.apache.org/jira/browse/LUCENE-1888 is
how I would solve it. It probably isn't that hard to implement, actually. A
patch would be great. Happy to review one.
On Feb 27, 2010, at 5:29 PM, Christopher Condit wrote:
You can also access payloads through the TermPositions enum, but that is
organized by term and then by doc.
It sounds like you need to iterate through all terms sequentially in a
given field in the doc, accessing offset payload? In which case
reanalyzing at search time may be the best way to go.
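
To make "reanalyzing at search time" concrete, here is a minimal sketch. It assumes the field text is available at search time (for example because it was stored) and uses a plain whitespace tokenizer as a stand-in for whatever analyzer the field was indexed with. OffsetToken and WhitespaceReanalyzer are hypothetical names for this example, not Lucene classes; a real setup would need to run the same analysis chain that was used at index time so the tokens line up.

```java
import java.util.ArrayList;
import java.util.List;

// One token together with its character offsets in the original text.
final class OffsetToken {
    final String text;
    final int startOffset;  // inclusive
    final int endOffset;    // exclusive
    OffsetToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical stand-in for re-running an analyzer over a field's text.
final class WhitespaceReanalyzer {
    static List<OffsetToken> tokenize(String fieldText) {
        List<OffsetToken> tokens = new ArrayList<>();
        int i = 0;
        int n = fieldText.length();
        while (i < n) {
            // Skip any run of whitespace before the next token.
            while (i < n && Character.isWhitespace(fieldText.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token up to the next whitespace character.
            while (i < n && !Character.isWhitespace(fieldText.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new OffsetToken(fieldText.substring(start, i), start, i));
            }
        }
        return tokens;
    }
}
```

The trade-off is CPU time at search time in exchange for not having to read any per-token data back from the index.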
It sounds like you need to iterate through all terms sequentially in a given
field in the doc, accessing offset payload? In which case reanalyzing at
search time may be the best way to go.
If it matters, it doesn't need to be sequential. I just need access to all the
payloads for a given doc.
Hello,
To my knowledge, the character position of the tokens is not preserved by
Lucene - only the ordinal position of tokens within a document / field is
preserved. Thus you need to store this character offset information
separately, say, as Payload data.
best,
CT
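
As an illustration of storing character offsets as payload data, the bytes themselves could be packed like this. The 8-byte big-endian layout and the OffsetPayloadCodec name are assumptions made for this sketch, not a Lucene convention, and attaching the bytes to each token during analysis is not shown.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs a token's character offsets into payload bytes.
final class OffsetPayloadCodec {
    // Two 4-byte ints: start offset followed by end offset (big-endian).
    static final int PAYLOAD_LENGTH = 8;

    static byte[] encode(int startOffset, int endOffset) {
        return ByteBuffer.allocate(PAYLOAD_LENGTH)
                .putInt(startOffset)
                .putInt(endOffset)
                .array();
    }

    // Returns {startOffset, endOffset}; rejects payloads of the wrong size.
    static int[] decode(byte[] payload) {
        if (payload == null || payload.length != PAYLOAD_LENGTH) {
            throw new IllegalArgumentException("expected an 8-byte offset payload");
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt() };
    }
}
```

A fixed-width layout keeps decoding trivial; a variable-length encoding would save index space at the cost of a slightly more involved decoder.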
On Fri, Feb 26, 2010 at
Hi Chris-
To my knowledge, the character position of the tokens is not preserved by
Lucene - only the ordinal position of tokens within a document / field is
preserved. Thus you need to store this character offset information
separately, say, as Payload data.
Thanks for the information. So
Payload data is accessed through PayloadSpans, so SpanQueries seem to be the
entry point. There are tools like PayloadSpanUtil that convert other queries
into SpanQueries for this purpose if needed, but the bottom line is that the
API for payloads appears to go through Spans.