Re: Payloads

2008-12-27 Thread Grant Ingersoll
Very cool stuff Karl. Would love to see some TREC-style evaluations for the ShingleMatrixQuery stuff just to see some comparisons. Also, you might have a look at the new TokenStream stuff that is in 2.9 and is a start on its way towards Flexible Indexing. I think this may actually allow

Re: Payloads

2008-12-29 Thread Peter Keegan
Hi Karl, I use payloads for weight only, too, with BoostingTermQuery (see: http://www.nabble.com/BoostingTermQuery-scoring-td20323615.html#a20323615) A custom tokenizer looks for the reserved character '\b' followed by a 2 byte 'boost' value. It then creates a special Token type for a custom filt
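The marker scheme described in this message can be sketched without any Lucene dependency. This is only an illustration, not Peter's tokenizer: it assumes the 2-byte boost is stored as a single 16-bit char right after '\b', and the class and method names (BoostMarkerSketch, termOf, boostOf, toPayloadBytes) are hypothetical.

```java
final class BoostMarkerSketch {
    // Reserved character that introduces the boost value (from the message above).
    static final char MARKER = '\b';
    // Assumption: terms without a marker get a neutral boost of 1.
    static final int DEFAULT_BOOST = 1;

    /** Returns the term text with the marker and the boost char stripped. */
    static String termOf(String raw) {
        int i = raw.indexOf(MARKER);
        return i < 0 ? raw : raw.substring(0, i);
    }

    /** Returns the boost stored after the marker, or DEFAULT_BOOST if absent. */
    static int boostOf(String raw) {
        int i = raw.indexOf(MARKER);
        if (i < 0 || i + 1 >= raw.length()) {
            return DEFAULT_BOOST;
        }
        // Assumption: one 16-bit char carries the whole 2-byte boost.
        return raw.charAt(i + 1);
    }

    /** Packs the boost into two big-endian bytes, as a payload could carry it. */
    static byte[] toPayloadBytes(int boost) {
        return new byte[] { (byte) (boost >>> 8), (byte) boost };
    }

    public static void main(String[] args) {
        String raw = "missiles" + MARKER + (char) 300;
        System.out.println(termOf(raw) + " boost=" + boostOf(raw));
    }
}
```

In a real analysis chain the two bytes would be attached to the token as its payload and the marker removed from the indexed term; how the message's custom filter does that is not shown in the excerpt.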

Re: Payloads

2009-12-19 Thread AHMET ARSLAN
> Hi, > > I need to add a query operator '!' such that when it > precedes a word or a > phrase in the query, that term will contribute twice its > weight if it is > positioned in an even offset of the document. The position > of a phrase is > determined by the offset of its first word. > > I gue

RE: Payloads

2009-12-19 Thread Elias Khsheibun
I want to override the operator - it is for a project purpose. -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Saturday, December 19, 2009 6:41 PM To: java-user@lucene.apache.org Subject: Re: Payloads > Hi, > > I need to add a query operator '!'

RE: Payloads

2009-12-19 Thread AHMET ARSLAN
> I want to override the operator - it > is for a project purpose. Can you explain your requirements more? What do you mean by "an even offset of the document"?

RE: Payloads

2009-12-19 Thread Elias Khsheibun
is even) - we apply this doubling of weight only if a '!' operator precedes the term and if its offset from the document is even. -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Saturday, December 19, 2009 6:48 PM To: java-user@lucene.apache.org Subject:

RE: Payloads

2009-12-19 Thread Uwe Schindler
> Sent: Saturday, December 19, 2009 5:54 PM > To: java-user@lucene.apache.org > Subject: RE: Payloads > > Let's say I have a document that contains the following text: > > "Graph Algorithms is one of the most important topics in computer science" > > And

RE: Payloads

2009-12-19 Thread Elias Khsheibun
About 60 students I think, if you have given some answers I would be grateful if you could link me to them or quote them again. -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Saturday, December 19, 2009 7:00 PM To: java-user@lucene.apache.org Subject: RE: Payloads

RE: Payloads

2009-12-19 Thread AHMET ARSLAN
> Let's say I have a document that > contains the following text: > > "Graph Algorithms is one of the most important topics in > computer science" > > And a query "!Graph Algorithms" then the term Graph in the > query should have > a double weight because the offset of Graph is 0 (and it is > ev
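The rule in this example can be written down as a small, library-free sketch. It assumes "offset" means the zero-based word position in the document, which is how the example reads (Graph at offset 0); the names EvenOffsetBoostSketch, multiplier and firstOffset are hypothetical and this is not the code posted in the thread.

```java
final class EvenOffsetBoostSketch {
    /**
     * Weight multiplier for one term occurrence: doubled only when the query
     * term was preceded by '!' and the occurrence sits at an even word offset.
     */
    static float multiplier(boolean markedWithBang, int wordOffset) {
        return (markedWithBang && wordOffset % 2 == 0) ? 2.0f : 1.0f;
    }

    /** Zero-based word offset of the first occurrence of term, or -1 if absent. */
    static int firstOffset(String document, String term) {
        String[] words = document.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(term)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String doc = "Graph Algorithms is one of the most important topics in computer science";
        int offset = firstOffset(doc, "Graph");
        System.out.println("offset=" + offset + " multiplier=" + multiplier(true, offset));
    }
}
```

With payloads, the parity of the position could be stored per token at index time and read back at scoring time, so the query side only needs to know whether the term carried the '!' operator.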

RE: Payloads

2009-12-19 Thread Elias Khsheibun
oaded ? -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Saturday, December 19, 2009 8:34 PM To: java-user@lucene.apache.org Subject: RE: Payloads > Let's say I have a document that > contains the following text: > > "Graph Algorithms is one of t

RE: Payloads

2009-12-19 Thread AHMET ARSLAN
> If I need to override the QueryParser > to return PayloadTermQuery, what > function for PayloadFunction should I use in the > constructor (If you can > show me an example). I am not sure about that. Maybe custom one. > In your code I didn't see an indexer, will this work with > the regular > I

RE: Payloads

2009-12-19 Thread Elias Khsheibun
What do you mean by a custom one - please explain. I must use a PayloadTermQuery ? And for the TermPositionPayloadTokenFilter there is a method that is not used - incrementToken (only used in the main method) ... I didn't see in the code the place that examines if the query term is at an even offs

RE: Payloads

2009-12-20 Thread Elias Khsheibun
hits2 = searcher.search(query2, 10).scoreDocs; for (int i = 0; i < hits2.length; i++) { Document hitDoc = searcher.doc(hits2[i].doc); System.out.println(hitDoc.get("title")); } } } -Original Message- From: AHMET ARSLAN [mail

RE: Payloads

2009-12-20 Thread Uwe Schindler
.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Elias Khsheibun [mailto:eli...@gmail.com] > Sent: Sunday, December 20, 2009 2:51 PM > To: java-user@lucene.apache.org > Subject: RE: Payloads > > > I'm t

RE: Payloads

2009-12-21 Thread Elias Khsheibun
er@lucene.apache.org Subject: RE: Payloads > Let's say I have a document that > contains the following text: > > "Graph Algorithms is one of the most important topics in computer > science" > > And a query "!Graph Algorithms" then the term Graph in the q

Re: Re: Payloads

2008-12-27 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009

Re: Re: Re: Payloads

2008-12-29 Thread Greg Shackles
That sounds pretty cool Karl, and I also dig your use of Motorhead as an example : ) I recently built an application where payloads were a lifesaver, but my usage of them is pretty basic. I am indexing pages of text, so I use payloads to store metadata about each word on the page - size, color, r

Re: Payloads and SpanScorer

2008-07-10 Thread Grant Ingersoll
I'm not fully following what you want. Can you explain a bit more? Thanks, Grant On Jul 9, 2008, at 2:55 PM, Peter Keegan wrote: If a SpanQuery is constructed from one or more BoostingTermQuery(s), the payloads on the terms are never processed by the SpanScorer. It seems to me that you wou

Re: Payloads and SpanScorer

2008-07-10 Thread Peter Keegan
Suppose I create a SpanNearQuery phrase with the terms "long range missiles" and some slop factor. Each term is actually a BoostingTermQuery. Currently, the score computed by SpanNearQuery.SpanScorer is based on the sloppy frequency of the terms and their weights (this is fine). But even though eac

Re: Payloads and SpanScorer

2008-07-10 Thread Grant Ingersoll
Makes sense. It was always my intent to implement things like PayloadNearQuery, see http://wiki.apache.org/lucene-java/Payload_Planning I think it would make sense to develop these and I would be happy to help shepherd a patch through, but am not in a position to generate said patch at thi

Re: Payloads and SpanScorer

2008-07-10 Thread Peter Keegan
I may take a crack at this. Any more thoughts you may have on the implementation are welcome, but I don't want to distract you too much. Thanks, Peter On Thu, Jul 10, 2008 at 1:30 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Makes sense. It was always my intent to implement things like > P

Re: Payloads and SpanScorer

2008-07-19 Thread Peter Keegan
I discovered this post from Karl Wettin in May about SpanNearQuery scoring: http://www.nabble.com/SpanNearQuery-scoring-td17425454.html#a17425454 Karl apparently had the same expectations I had about the usage model of spans and boosts. I also found JIRA issue 533 (SpanQuery scoring: SpanWeight la

Re: Payloads and tokenizers

2008-08-14 Thread Doron Cohen
IIRC first versions of patches that added payloads support had this notion of payload by field rather than by token, but later it was modified to be by token only. I have seen two code patterns to add payloads to tokens. The first one created the field text with a reserved separator/delimiter whi

Re: Payloads and tokenizers

2008-08-14 Thread Antony Bowesman
Thanks for your comments Doron. I found the earlier discussions on the dev list (21/12/06), where this issue is discussed - my use case is similar to Nadav Har'El. Implementing payloads via Tokens explicitly prevents the use of payloads for untokenized fields, as they only support field.string

Re: Payloads and tokenizers

2008-08-17 Thread Doron Cohen
> > Implementing payloads via Tokens explicitly prevents the use of payloads > for untokenized fields, as they only support field.stringValue(). There > seems no way to override this. I assume you already know this but just to make sure what I meant was clear - no tokenization but still indexing

Re: Payloads and PhraseQuery

2007-06-27 Thread Mark Miller
You cannot do it because TermPositions is read in the PhraseWeight.scorer(IndexReader) method (or MultiPhraseWeight) and loaded into an array which is passed to PhraseScorer. Extending the Weight as well and passing the payload to the Scorer is a possibility. - Mark Peter Keegan wrote: I'm

Re: Payloads and PhraseQuery

2007-06-27 Thread Grant Ingersoll
Could you get what you need combining the BoostingTermQuery with a SpanNearQuery to produce a score? Just guessing here.. At some point, I would like to see more Query classes around the payload stuff, so please submit patches/feedback if and when you get a solution On Jun 27, 2007, at 1

Re: Payloads and PhraseQuery

2007-06-29 Thread Peter Keegan
I tried to subclass PhraseScorer, but discovered that it's an abstract class and its subclasses (ExactPhraseScorer and SloppyPhraseScorer) are final classes. So instead, I extended Scorer with my custom scorer and extended PhraseWeight (after making it public). My scorer's constructor is passed th

Re: Payloads and PhraseQuery

2007-07-11 Thread Peter Keegan
I'm now looking at using payloads with SpanNearQuery but I don't see any clear way of getting the payload(s) from the matching span terms. The term positions for the payloads seem to be buried beneath SpanCells in the NearSpansOrdered and NearSpansUnordered classes, which are not public. I'd be co

Re: Payloads and PhraseQuery

2007-07-11 Thread Chris Hostetter
: I'm now looking at using payloads with SpanNearQuery but I don't see any : clear way of getting the payload(s) from the matching span terms. The term : positions for the payloads seem to be buried beneath SpanCells in the Isn't Spans.start() and Spans.end() what you are looking for? -Hoss

Re: Payloads and PhraseQuery

2007-07-12 Thread Peter Keegan
I'm looking for Spans.getPositions(), as shown in BoostingTermQuery, but neither NearSpansOrdered nor NearSpansUnordered (which are the Spans provided by SpanNearQuery) provide this method and it's not clear to me how to add it. Peter On 7/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
That is off of the TermSpans class. BTQ (BoostingTermQuery) is implemented to extend SpanQuery, thus SpanNearQuery isn't, w/o modification, going to have access to these things. However, if you look at the SpanTermQuery, you will see that its implementation of Spans is indeed the TermSpa

Re: Payloads and PhraseQuery

2007-07-12 Thread Paul Elschot
On Thursday 12 July 2007 14:50, Grant Ingersoll wrote: > That is off of the TermSpans class. BTQ (BoostingTermQuery) is > implemented to extend SpanQuery, thus SpanNearQuery isn't, w/o > modification, going to have access to these things. However, if you > look at the SpanTermQuery, you wi

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
Yep, totally agree. One way to handle this initially at least is have isPayloadAvailable() only return true for the SpanTermQuery. The other option is to come up with some modification of the suggested methods below to return all the payloads in a span. I have a basic implementation for

Re: Payloads and PhraseQuery

2007-07-12 Thread Peter Keegan
Grant, If/when you have an implementation for SpanNearQuery, I'd be happy to test it. Peter On 7/12/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Yep, totally agree. One way to handle this initially at least is have isPayloadAvailable() only return true for the SpanTermQuery. The other op

Re: Payloads and PhraseQuery

2007-07-12 Thread Chris Hostetter
: That is off of the TermSpans class. BTQ (BoostingTermQuery) is ... : I am not completely sure here, but it seems like we may need an : efficient way to access the TermPositions for each document. That : is, the Spans class doesn't provide this and maybe it should ... : > I'm lo

Re: Payloads and PhraseQuery

2007-07-12 Thread Grant Ingersoll
On Jul 12, 2007, at 6:12 PM, Chris Hostetter wrote: Hmm... okay so the issue is that in order to get the payload data, you have to have a TermPositions instance. instead of adding getPayload methods to the Spans class (which as Paul points out, can have nesting issues) perhaps more general s

Re: Payloads and PhraseQuery

2007-07-27 Thread Peter Keegan
I have a question about the way fields are analyzed and inverted by the index writer. Currently, if a field has multiple occurrences in a document, each occurrence is analyzed separately (see DocumentsWriter.processField). Is it safe to assume that this behavior won't change in the future? The reas

Re: Payloads and PhraseQuery

2007-07-27 Thread Peter Keegan
I guess this also ties in with 'getPositionIncrementGap', which is relevant to fields with multiple occurrences. Peter On 7/27/07, Peter Keegan <[EMAIL PROTECTED]> wrote: > > I have a question about the way fields are analyzed and inverted by the > index writer. Currently, if a field has multiple

Re: Payloads API and support

2011-02-02 Thread Grant Ingersoll
On Feb 1, 2011, at 2:59 AM, Ophir Cohen wrote: > Hi Guys, > > I've been using Lucene for more than 5 years and it is a great tool - great > job! Thanks for everything... Thanks. Just so you know going forward, please be patient in expecting answers, especially for complex questions like this

Re: Payloads API and support

2011-02-02 Thread Ophir Cohen
Hi Grant, Thanks for the answer - it wasn't a question of patience, I just accidentally sent the same message more than once... Sorry for that. Anyway, I'm checking right now the option to hold the metrics in an in-memory array (for all docs) and retrieve the metrics from that array rather than from Lucen

Re: Payloads disabled in 4.5?

2013-10-15 Thread Michael McCandless
Something catastrophic went wrong with the whitespace in your email ... But I think the problem is in your incrementToken impl: you cannot set the payload, and then call input.incrementToken, because input.incrementToken will clear all attributes (including payload). Try reversing the order of tho
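The ordering problem described here can be shown with a toy model that does not depend on Lucene. ToyStream below is a hypothetical stand-in for the upstream token stream; the only behaviour it copies from the message is that advancing to the next token clears every attribute, including the payload. It is a sketch of the ordering issue, not Kyle's filter.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PayloadOrderSketch {
    /** Toy stand-in for an input stream with shared, mutable attribute state. */
    static class ToyStream {
        String term;
        byte[] payload;
        private final Iterator<String> words;

        ToyStream(List<String> words) {
            this.words = words.iterator();
        }

        /** Advances to the next token, clearing all attributes first. */
        boolean incrementToken() {
            term = null;
            payload = null; // anything set before this call is lost
            if (!words.hasNext()) {
                return false;
            }
            term = words.next();
            return true;
        }
    }

    /** Wrong order: the payload is set, then wiped by incrementToken(). */
    static byte[] payloadSetBefore(ToyStream in, byte[] p) {
        in.payload = p;
        in.incrementToken();
        return in.payload;
    }

    /** Right order: advance first, then attach the payload to the current token. */
    static byte[] payloadSetAfter(ToyStream in, byte[] p) {
        if (!in.incrementToken()) {
            return null;
        }
        in.payload = p;
        return in.payload;
    }

    public static void main(String[] args) {
        byte[] p = { 7 };
        System.out.println("set before: " + Arrays.toString(
                payloadSetBefore(new ToyStream(Arrays.asList("graph")), p)));
        System.out.println("set after:  " + Arrays.toString(
                payloadSetAfter(new ToyStream(Arrays.asList("graph")), p)));
    }
}
```

In a real filter the same idea applies: call the input's incrementToken() first, return false if it does, and only then set the payload on the current token.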

RE: Payloads disabled in 4.5?

2013-10-15 Thread Kyle Judson
That was it. Reversing the order fixed it. Thanks for wading through the lack of whitespace. Thanks, Kyle > From: luc...@mikemccandless.com > Date: Tue, 15 Oct 2013 09:24:17 -0400 > Subject: Re: Payloads disabled in 4.5? > To: java-user@lucene.apache.org > > Something catastroph

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-17 Thread Grant Ingersoll
Inline below On Nov 16, 2007, at 6:03 PM, Tricia Williams wrote: Hi All, I'll explain what I'm working on, and then I'll ask my two questions. I'm working on the issue https://issues.apache.org/jira/browse/SOLR-380 which is a feature request that allows one to index a "Structured D

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-17 Thread Tricia Williams
Hi Grant, Thanks for your response! Taking a closer look at the TokenFilter(s) that causes my problem with the Payload are all from org.apache.solr.analysis rather than org.apache.lucene.analysis. I had originally thought that all the TokenFilters available through Solr's TokenFilterFa

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-18 Thread Tricia Williams
I apologize for cross-posting but I believe both Solr and Lucene users and developers should be concerned with this. I am not aware of a better way to reach both communities. In this email I'm looking for comments on: * Do TokenFilters belong in the Solr code base at all? * How to deal

Re: Payloads, Tokenizers, and Filters. Oh My!

2007-11-20 Thread Chris Hostetter
: I apologize for cross-posting but I believe both Solr and Lucene users and : developers should be concerned with this. I am not aware of a better way to : reach both communities. some of these questions strike me as being largely unrelated. if anyone wishes to followup on them further, let'

Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-17 Thread Antony Bowesman
I assume you already know this but just to make sure what I meant was clear - no tokenization but still indexing just means that the entire field's text becomes a single unchanged token. I believe this is exactly what SingleTokenTokenStream can buy you - a single token, for which you can pre set a

Re: Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-18 Thread Doron Cohen
> > payload and the other part for storing, i.e. something like this: >> >>Token token = new Token(...); >>token.setPayload(...); >>SingleTokenTokenStream ts = new SingleTokenTokenStream(token); >> >>Field f1 = new Field("f","some-stored-content",Store.YES,Index.NO); >>Field f2

Re: Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-18 Thread Antony Bowesman
Doron Cohen wrote: The API definitely doesn't promise this. AFAIK implementation wise it happens to be like this but I can be wrong and plus it might change in the future. It would make me nervous to rely on this. I made some tests and it 'seems' to work, but I agree, it also makes me nervous

Re: Fields with the same name?? - Was Re: Payloads and tokenizers

2008-08-20 Thread Doron Cohen
On Tue, Aug 19, 2008 at 2:15 AM, Antony Bowesman <[EMAIL PROTECTED]> wrote: > > Thanks for your time and I appreciate your valuable insight Doron. > Antony > I'm glad I could help! Doron