Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
On Friday 15 February 2008 02:47:14, Cedric Ho wrote: > Sorry that I didn't make myself clear. > > [10/5/2] means for terms found in the 1st paragraph, give it score*10, > for terms in the 2nd, give it score*5, etc. > > So I don't know how to do this scoring if the position (paragraph) > informa

Re: Design questions

2008-02-14 Thread Chris Hostetter
I haven't really been following this thread that closely, but... : Why not just use ? Check to ensure that it makes : it through whatever analyzer you choose though. For instance, : LetterTokenizer will remove it... 1) I'm 99% sure you can do something like this... Document doc = new

Re: regex expressions within phrase queries

2008-02-14 Thread Chris Hostetter
: I was wondering if anyone has a more efficient method for achieving this. : Would changing QueryParser.jj and developing a custom PhraseQuery class be a : good idea? Any comments would be appreciated. extending QueryParser and overriding the getPhraseQuery function to return your own SpanNear

Re: Question .. advanced query

2008-02-14 Thread Chris Hostetter
it's not possible with the query syntax out of the box, but you could write a custom subclass of SpanQuery to make it possible. if your numbers are simple enough (ie: just 0-9) you could probably make the SpanRegexQuery work for you. : I am trying to perform a query that will enable me to run

Some more questions on Payloads

2008-02-14 Thread Cedric Ho
Hi all, This is the same problem I am trying to solve as in the thread: "How to pass additional information into Similarity.scorePayload(...)" However, since these questions are somewhat different, I figured I'd start a new thread. After diving into the Lucene source code for a while now I have

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Cedric Ho
Sorry that I didn't make myself clear. [10/5/2] means for terms found in the 1st paragraph, give it score*10, for terms in the 2nd, give it score*5, etc. So I don't know how to do this scoring if the position (paragraph) information is in a separate field. Cedric On Fri, Feb 15, 2008 at 7:15 A

regex expressions within phrase queries

2008-02-14 Thread Jim Bogan
I would like to be able to handle the following: "/\d\d\d{4} \\d\\d/ office" Where / indicates a regex expression phrase. One option is extending MultiFieldQueryParser and catching the phrase within getFieldQuery evaluating whether /, the regex identifier, is present and then returning a SpanNe

Re: Design questions

2008-02-14 Thread Erick Erickson
Why not just use ? Check to ensure that it makes it through whatever analyzer you choose though. For instance, LetterTokenizer will remove it... Erick On Thu, Feb 14, 2008 at 4:41 PM, <[EMAIL PROTECTED]> wrote: > > Rather than index one doc per page, you could index a special > > token b

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Paul Elschot
I have no idea what the [10/5/2] means, so I can't comment on that. In case I have missed it previously I'm sorry. My point was that payloads need not be used for different position info. It's possible to do that, and it may be good for performance in some cases, but one can revert to using anothe

RE: Design questions

2008-02-14 Thread spring
> Rather than index one doc per page, you could index a special > token between pages. Say you index $ as the special > token. I have decided to use this version, but... What token can I use? It must be a token which gets never removed by an analyzer or altered in a way that it not uniqu

Re: how to get the programmatic control over index's document id

2008-02-14 Thread John Wang
There was a thread on this exact issue. If you are using 2.3, the payload API would help with that. -John On Thu, Feb 14, 2008 at 3:59 AM, Gauri Shankar <[EMAIL PROTECTED]> wrote: > Thanks a lot for both of you. > > yes, I am talking about internally assigned document id. > > Erick : I am already

Re: how to get the programmatic control over index's document id

2008-02-14 Thread Erick Erickson
I'm a little confused about what's happening here. Are you inserting a pre-existing primary key from the DB into your Lucene documents? Or are you using the Lucene doc ID as your primary key in the database? If the latter, you're going to have many, many problems since the Lucene ID can, and will

Re: design: merging resultset from RDBMS with lucene search results

2008-02-14 Thread Erick Erickson
Another possibility is to do it backwards; it depends on how expensive the SQL query is, I suppose. The idea would be to go ahead and do your SQL query *first*, then construct a Lucene Filter to use with your query using TermDocs/TermEnum. I'd guess (without knowing much about your problem space) t

Re: Lucene+Oracle Integration

2008-02-14 Thread Marcelo Ochoa
Hi Mitesh: Lucene-OJVM integration is not tested against lucene-2.3.0 version. I'll do it ASAP. Best regards, Marcelo. On Thu, Feb 14, 2008 at 10:01 AM, Mitesh Soni <[EMAIL PROTECTED]> wrote: > > > > > I have run the build file in the lucene-2.3.0\contrib\ojvm successfully. But > I cannot cr

Lucene+Oracle Integration

2008-02-14 Thread Mitesh Soni
I have run the build file in the lucene-2.3.0\contrib\ojvm successfully. But I cannot create index with the use of . create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Analyzer:org.apache.lucene.analysis.SimpleAnalyzer'); ERROR at line 1: ORA-29855: error occurred in

Re: how to get the programmatic control over index's document id

2008-02-14 Thread Gauri Shankar
Thanks a lot to both of you. Yes, I am talking about the internally assigned document id. Erick: I am already using a unique id in the index, mapped to one of our DB's primary keys, to uniquely identify the docs from the index. Now, to get the value of this unique field I need to call getDocumet(). Bu

Re: matching products with suggest feature

2008-02-14 Thread Shai Erera
If it adds the clauses as Occur.SHOULD, it means they should appear, but do not have to appear. Looking at suggestSimilar, it looks like it computes the edit_distance values of the requested word and the suggestions. If the score is lower than the minimum score, it may skip the word. Could you tr

Re: matching products with suggest feature

2008-02-14 Thread Cam Bazz
Hello Shai, That's right, Speller is in contrib; it is named spellchecker. Basically it is a special index that stores the words as ngrams. I looked at the code to see how it is querying the index and basically it makes ngrams and adds each ngram to a boolean query. Here is how it adds to the b

Re: How to pass additional information into Similarity.scorePayload(...)

2008-02-14 Thread Cedric Ho
Hi Paul, Sorry, I am not sure I understand your solution, because I would need to apply this scoring logic to all the different types of Queries. A search may consist of something like: +(term1 phrase2 wildcard*) +spanNear(term3 term4) [10/5/2] And this [10/5/2] ratio has to be applied to the