recovering payload from fields

2010-02-26 Thread Christopher Condit
I'm trying to store semantic information in payloads at index time. I believe this part is successful - but I'm having trouble getting access to the payload locations after the index is created. I'd like to know the offset in the original text for the token with the payload - and get this inform

Re: recovering payload from fields

2010-02-26 Thread Christopher Tignor
Hello, To my knowledge, the character position of the tokens is not preserved by Lucene - only the ordinal position of tokens within a document / field is preserved. Thus you need to store this character offset information separately, say, as Payload data. best, C>T> On Fri, Feb 26, 2010 at 3:
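
A minimal sketch of the suggestion above, storing a token's character offsets as payload bytes. It only shows one possible byte layout (two big-endian ints: start offset, then end offset) and deliberately leaves out the Lucene side, i.e. the TokenFilter that would attach the bytes at index time. The class name OffsetPayload and the 8-byte layout are assumptions made for illustration, not something taken from the thread.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs character offsets into a byte[] suitable for use as payload data.
public class OffsetPayload {

    // Encode start and end character offsets as two 4-byte big-endian ints.
    public static byte[] encode(int startOffset, int endOffset) {
        return ByteBuffer.allocate(8)
                .putInt(startOffset)
                .putInt(endOffset)
                .array();
    }

    // Decode the bytes back into {startOffset, endOffset}.
    public static int[] decode(byte[] payload) {
        if (payload == null || payload.length != 8) {
            throw new IllegalArgumentException("expected an 8-byte offset payload");
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        int start = buf.getInt();
        int end = buf.getInt();
        return new int[] { start, end };
    }
}
```

If the payload already carries semantic information, the offsets would have to share the byte array with it, for example as a fixed 8-byte prefix, so that the decoder knows where each part begins.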

RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
Hi Chris- > To my knowledge, the character position of the tokens is not preserved by > Lucene - only the ordinal position of tokens within a document / field is > preserved. Thus you need to store this character offset information > separately, say, as Payload data. Thanks for the information. S

RE: recovering payload from fields

2010-02-26 Thread Christopher Condit
> Payload data is accessed through PayloadSpans, so using SpanQueries is the > entry point, it seems. There are tools like PayloadSpanUtil that convert other > queries into SpanQueries for this purpose if needed, but the bottom line is that the API for Payloads > looks like it goes through Spans. So t

Re: recovering payload from fields

2010-02-27 Thread Michael McCandless
You can also access payloads through the TermPositions enum, but this is by term and then by doc. It sounds like you need to iterate through all terms sequentially in a given field in the doc, accessing offset & payload? In which case reanalyzing at search time may be the best way to go. You ca

RE: recovering payload from fields

2010-02-27 Thread Christopher Condit
> It sounds like you need to iterate through all terms sequentially in a given > field in the doc, accessing offset & payload? In which case reanalyzing at > search time may be the best way to go. If it matters, it doesn't need to be sequential. I just need access to all the payloads for a given

Re: recovering payload from fields

2010-03-05 Thread Christopher Tignor
What I'd ideally like to do is to take a SpanQuery, loop over the PayloadSpans returned from SpanQuery.getPayloadSpans() and store all PayloadSpans for a given document in a Map by their doc id. Then later after deciding in memory which documents I need, load the Payload data for just those PayloadS
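
The bookkeeping half of this idea can be sketched in plain Java. The sketch records only the document id and the start/end positions of each span in a Map keyed by doc id. It does not touch Lucene, and it does not answer the actual question of whether payload data can be loaded later for only the selected spans. SpansByDoc and SpanHit are hypothetical names introduced for illustration. In real code the add() call would sit inside the loop over the spans, fed with whatever the spans object reports for the current match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container: groups span matches by the id of the document they occur in.
public class SpansByDoc {

    // One span match: document id plus start/end token positions.
    public static final class SpanHit {
        public final int doc;
        public final int start;
        public final int end;

        public SpanHit(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    private final Map<Integer, List<SpanHit>> byDoc = new HashMap<>();

    // Record one match; creates the per-document list on first use.
    public void add(int doc, int start, int end) {
        byDoc.computeIfAbsent(doc, k -> new ArrayList<>())
             .add(new SpanHit(doc, start, end));
    }

    // All matches recorded for a document, or an empty list if there are none.
    public List<SpanHit> forDoc(int doc) {
        return byDoc.getOrDefault(doc, Collections.<SpanHit>emptyList());
    }
}
```

Once the interesting documents have been chosen in memory, forDoc(docId) returns the recorded positions for each of them. How to turn those positions back into payload bytes is the part this message is asking about.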

Re: recovering payload from fields

2010-03-05 Thread Grant Ingersoll
It's not implemented, but http://issues.apache.org/jira/browse/LUCENE-1888 is how I would solve it. It probably isn't that hard to implement, actually. A patch would be great. Happy to review one. On Feb 27, 2010, at 5:29 PM, Christopher Condit wrote: >> It sounds like you need to iterate t