Syns2Index utility: version of Lucene and Java

2006-11-27 Thread Risov, Maria
I am trying to use the Syns2Index utility to convert WordNet into a Lucene index. First I downloaded the latest JDK and Lucene 2.0, but soon realized that both were too new for compiling Syns2Index.java. Next, I worked my way down to j2sdk1.4.2_13 and Lucene 1.4.3 by deciphering error messages. (I am

Re: How to set query time scoring

2006-11-27 Thread Sajid Khan
Thanks for the instant reply. More specifically, what I am trying to do is: 1) show the results that contain the exact query phrase on top, followed by the ANDed results, followed by the ORed results; 2) introduce a new parameter that uses the query phrase to influence the ranking. Regards, Sajid

Re: Question about the not in lucene

2006-11-27 Thread hawat23
Thank you for your answer. But is it possible to group clauses with a NOT? Example: type:product NOT (name:toto OR name:titi)? Christophe Mark Miller wrote: Personally, I think of it as not a 'not' operator, but more a 'but not' or 'and not' operator. That's not totally the case I

Hits length with no sorting or scoring

2006-11-27 Thread Hirsch Laurence
Hello, I have an application in which we only need to know the total number of documents matching a query. In this case we do not need any sorting or scoring or to store any reference to the matching documents. Can you tell me how to execute such a query with maximum performance? Thanks

Re: Database searching using Lucene....

2006-11-27 Thread Erick Erickson
This has been discussed extensively on this thread, so I think you'd get the fastest answers by searching the mail archive for database, db, etc. The short answer is it all depends upon what you want to accomplish and the characteristics of your problem. Erick On 11/27/06, Inderjeet Kalra

Re: Question about the not in lucene

2006-11-27 Thread Mark Miller
Yes, I believe that it is entirely possible. You can nest and link boolean clauses all you want: your example query would be a boolean with two top level clauses, one required to be there and one required not to be there. The second top level clause would itself be a boolean query with two two
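The nesting described above can be sketched in plain Java. This is only a model of the semantics of type:product NOT (name:toto OR name:titi), not Lucene code: the Doc class, the sample documents, and the method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the query  type:product NOT (name:toto OR name:titi).
// No Lucene classes are used; Doc and the methods below are hypothetical stand-ins.
class NestedNotSketch {
    static class Doc {
        final String type;
        final String name;
        Doc(String type, String name) { this.type = type; this.name = name; }
    }

    // Inner boolean: two optional clauses, at least one of which must match.
    static boolean nameIsTotoOrTiti(Doc d) {
        return d.name.equals("toto") || d.name.equals("titi");
    }

    // Outer boolean: one required clause and one prohibited clause.
    static boolean matches(Doc d) {
        boolean required = d.type.equals("product");
        boolean prohibited = nameIsTotoOrTiti(d);
        return required && !prohibited;
    }

    // Returns the names of all documents that satisfy the outer boolean.
    static List<String> search(List<Doc> docs) {
        List<String> hits = new ArrayList<String>();
        for (Doc d : docs) {
            if (matches(d)) {
                hits.add(d.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("product", "toto"),
            new Doc("product", "titi"),
            new Doc("product", "tata"),
            new Doc("service", "tutu"));
        System.out.println(search(docs)); // prints [tata]
    }
}
```

Note that the prohibited clause only removes documents from what the required clause already matched, which is why it reads more like 'and not' than a standalone 'not'.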

Re: Searching by bit masks

2006-11-27 Thread Biggy
I have the same problem here. I have an interest bit field, which I receive from the application backend. I have control over how the documents are built. To be specific, the field looks like this: ID: interest 1 : sport 2 : music 4 : film 8 : clubs So someone interested in sports and music

Re: RAMDirectory vs MemoryIndex

2006-11-27 Thread Wolfgang Hoschek
On Nov 26, 2006, at 8:57 AM, jm wrote: I tested this. I use a single static analyzer for all my documents, and the caching analyzer was not working properly. I had to add a method to clear the cache each time a new document was to be indexed, and then it worked as expected. I have never looked

Re: RAMDirectory vs MemoryIndex

2006-11-27 Thread jm
On 11/27/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 26, 2006, at 8:57 AM, jm wrote: I tested this. I use a single static analyzer for all my documents, and the caching analyzer was not working properly. I had to add a method to clear the cache each time a new document was to be

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Yonik Seeley
On 11/27/06, Suman Ghosh [EMAIL PROTECTED] wrote: The last line [at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:349)] repeats another 1010 times before the program crashes. I understand that without the actual index or the documents, it's nearly impossible to narrow down the

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your point. At the time of the crash, I have over 5000 segments. I'll try some conservative number and try to rebuild the index. On 11/27/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 11/27/06, Suman Ghosh

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Yonik Seeley
On 11/27/06, Suman Ghosh [EMAIL PROTECTED] wrote: Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your point. At the time of the crash, I have over 5000 segments. I'll try some conservative number and try to rebuild the index. Although I don't see how those

Re: Hits length with no sorting or scoring

2006-11-27 Thread Paul Elschot
On Monday 27 November 2006 14:30, Hirsch Laurence wrote: Hello, I have an application in which we only need to know the total number of documents matching a query. In this case we do not need any sorting or scoring or to store any reference to the matching documents. Can you tell me how

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Yonik, Thanks for the pointer. I'll try the nightly build once the change is committed. Suman On 11/27/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 11/27/06, Suman Ghosh [EMAIL PROTECTED] wrote: Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your

Re: RAMDirectory vs MemoryIndex

2006-11-27 Thread Wolfgang Hoschek
On Nov 27, 2006, at 9:57 AM, jm wrote: On 11/27/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 26, 2006, at 8:57 AM, jm wrote: I tested this. I use a single static analyzer for all my documents, and the caching analyzer was not working properly. I had to add a method to clear the

Re: Searching by bit masks

2006-11-27 Thread Erick Erickson
Well, you really have the code already G. From the top... 1 there's no good way to support searching bitfields. If you wanted, you could probably store it as a small integer and then search on it, but that's waaay more complicated than you want. 2 Add the fields like you have the snippet from,
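One way to avoid searching on the bit field itself, sketched here as an assumption rather than as what the thread goes on to recommend, is to expand the mask into one token per set bit when the document is built, so that each interest can be indexed and searched as an ordinary term. The InterestTerms class and its expand method are hypothetical; the bit values and names are the ones listed in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands an interest bit field
// into one token per set bit so each interest can be indexed as its own term.
class InterestTerms {
    // Bit values and names as listed in the thread: 1 sport, 2 music, 4 film, 8 clubs.
    private static final String[] NAMES = {"sport", "music", "film", "clubs"};

    // Returns the interest names whose bits are set in mask, lowest bit first.
    static List<String> expand(int mask) {
        List<String> terms = new ArrayList<String>();
        for (int bit = 0; bit < NAMES.length; bit++) {
            if ((mask & (1 << bit)) != 0) {
                terms.add(NAMES[bit]);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Someone interested in sports and music has mask 1 + 2 = 3.
        System.out.println(expand(3)); // prints [sport, music]
    }
}
```

Each returned token would then be added as a separate value of the same field when the document is built, so a search for a single interest becomes an ordinary term query.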

Re: RAMDirectory vs MemoryIndex

2006-11-27 Thread jm
Yes, that would be ok for me, as long as I can reuse my child analyzer. On 11/27/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 27, 2006, at 9:57 AM, jm wrote: On 11/27/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 26, 2006, at 8:57 AM, jm wrote: I tested this. I use a

Re: RAMDirectory vs MemoryIndex

2006-11-27 Thread Wolfgang Hoschek
Ok. I reverted to the version without a public clear() method. Wolfgang. On Nov 27, 2006, at 12:17 PM, jm wrote: Yes, that would be ok for me, as long as I can reuse my child analyzer. On 11/27/06, Wolfgang Hoschek [EMAIL PROTECTED] wrote: On Nov 27, 2006, at 9:57 AM, jm wrote: On

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-27 Thread Paul Elschot
Stanislav, On Wednesday 22 November 2006 09:52, Stanislav Jordanov wrote: Paul, We are working on delivering the next release by the end of the week so I have to take care of 2 or 3 issues before I try the nightly build. I promise to try it and report the results here. I have made a first

Re: Searching by bit masks

2006-11-27 Thread Daniel Noll
Erick Erickson wrote: Well, you really have the code already G. From the top... 1 there's no good way to support searching bitfields. If you wanted, you could probably store it as a small integer and then search on it, but that's waaay more complicated than you want. 2 Add the fields like you

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Mike, I've not tried it yet, but I think the problem can be reproduced. However, it'll take a few hours to reach that threshold since my code also needs to extract text from some very large PDF documents to store in the index. I'll post the pseudo-code of my code tomorrow. Maybe that'll help