Re: Searching by bit masks

2006-11-28 Thread Biggy
OK, here's what I've come up with after reading your suggestions: - The bit set from the DB stays untouched. - Only one field, interest, shall be used to store the interest bits in the document. This saves disk space. - The bits shall not be converted to a readable string but added as values separated by

Re: Querying performance decrease in 1.9.1 and 2.0.0

2006-11-28 Thread Stanislav Jordanov
Paul, we are using a slightly modified version of Lucene, so in order to run the performance tests on a nightly build, I need Lucene's sources, not the compiled classes. Is there a nice and easy way to get them? Stanislav Stanislav Jordanov wrote: Paul, We are working on delivering the next

Re: Searching by bit masks

2006-11-28 Thread Erick Erickson
Lucene will automatically separate tokens during index and search if you use the right analyzer. See the various classes that implement Analyzer. I don't know if you really wanted to use the numeric literals, but I wouldn't. The analyzers that do the most for you (automatically break up on

Re: Searching by bit masks

2006-11-28 Thread Biggy
The background of this is also separating content according to domains. Example: - pictureA (marked as a joke, #flag: 1) - pictureB (marked as an adult picture, #flag: 2) Site1: Users allowed to view everything (pictureA, pictureB) Site2: Users allowed to view everything except pictureB (no adult
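The per-site rule in this message can be sketched as a plain bit test. This is only an illustration, assuming the flag values 1 (joke) and 2 (adult) from the example; the class `SiteFilter`, the method `isVisible`, and the idea of a per-site "blocked" mask are hypothetical names, not part of Lucene or of the poster's code.

```java
public class SiteFilter {
    // Flag values taken from the example in the message.
    static final int JOKE = 1;
    static final int ADULT = 2;

    // Hypothetical helper: a document is visible on a site when none of
    // its flag bits overlap the bits that the site blocks.
    static boolean isVisible(int docFlags, int blockedMask) {
        return (docFlags & blockedMask) == 0;
    }

    public static void main(String[] args) {
        int site1Blocked = 0;      // Site1 blocks nothing
        int site2Blocked = ADULT;  // Site2 blocks adult content

        System.out.println(isVisible(JOKE, site1Blocked));   // pictureA on Site1
        System.out.println(isVisible(ADULT, site1Blocked));  // pictureB on Site1
        System.out.println(isVisible(JOKE, site2Blocked));   // pictureA on Site2
        System.out.println(isVisible(ADULT, site2Blocked));  // pictureB on Site2
    }
}
```

The same test has to be expressed as query terms before an index can apply it, which is what the rest of the thread discusses.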

Re: Searching by bit masks

2006-11-28 Thread Erick Erickson
You could store a value for each flag, then be careful about which analyzers you use. For instance, use WhitespaceAnalyzer (at both index and search time) and do your own casing. That is, make sure you lowercase as necessary (NOTE: the operators AND, OR, NOT must not be lowercased if you send them through
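One way to picture "a value for each flag" is to expand the bit set into whitespace-separated tokens before handing the text to a whitespace-splitting analyzer. The sketch below is an assumption-laden illustration: the class `FlagTokens`, the method `toTokens`, and the `flag` token prefix are all hypothetical, and it makes no Lucene calls; it only builds the string that would go into the field.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagTokens {
    // Hypothetical helper: emit one lowercase token per set bit,
    // named after the bit's value (e.g. 5 -> "flag1 flag4").
    static String toTokens(int bits) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < 32; i++) {
            int mask = 1 << i;
            if ((bits & mask) != 0) {
                // Unsigned formatting keeps the top bit from printing as negative.
                tokens.add("flag" + Integer.toUnsignedString(mask));
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // A document carrying both flag 1 and flag 2.
        System.out.println(toTokens(3));
    }
}
```

Because the tokens are already lowercase and contain no punctuation, an analyzer that only splits on whitespace should leave them unchanged at both index and search time.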

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Yonik Seeley
On 11/27/06, Michael McCandless [EMAIL PROTECTED] wrote: Suman Ghosh wrote: On 11/27/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 11/27/06, Suman Ghosh [EMAIL PROTECTED] wrote: Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your point. At the

RE: Hits length with no sorting or scoring

2006-11-28 Thread Hirsch Laurence
The code works very well, Thanks, Laurie -Original Message- From: Paul Elschot [mailto:[EMAIL PROTECTED] Sent: 27 November 2006 18:52 To: java-user@lucene.apache.org Subject: Re: Hits length with no sorting or scoring On Monday 27 November 2006 14:30, Hirsch Laurence wrote: Hello,

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Yonik Seeley wrote: Actually, in previous versions of Lucene, it *was* possible to get way too many first level segments because of the wonky logic when the IndexWriter was closed. That has been fixed in the trunk with the new merge policy, and you will never see more than mergeFactor first

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
This looks correct to me. It's good you are doing the deletes in bulk up front for each batch of documents. So I guess you hit the error ( 5000 segments files) while processing batches of 200 docs (because you then optimize in the end)? Do you search this index while it's building, or, only

Re: Syns2Index utility: version of Lucene and Java

2006-11-28 Thread Chris Hostetter
1) I don't really know anything about Syns2Index - but the errors you cited don't seem to have anything to do with Lucene ... your compiler appears to be complaining about assert statements within the core Java system classes ... which is a little strange. You said you are past the HelloWorld

multiple keyword fields vs. multiple-token field

2006-11-28 Thread Michael Rusch
I have documents that can be referred to by multiple identifiers (and I want to store the identifiers separate from the main indexed content). I'm wondering if I should put each identifier in its own keyword field, or have one tokenized field with all of the identifiers in it. What I'm talking

Re: multiple keyword fields vs. multiple-token field

2006-11-28 Thread Erik Hatcher
On Nov 28, 2006, at 4:31 PM, Michael Rusch wrote: I have documents that can be referred to by multiple identifiers (and I want to store the identifiers separate from the main indexed content). I'm wondering if I should put each identifier in its own keyword field, or have one tokenized

BUG ? - lucene multisearcher / sorting

2006-11-28 Thread Kai R. Emde
Hello, we have one problem with the sort routine. We use the multisearcher function over several indexes. The result should be sorted by the booknumber, but the produced list isn't sorted correctly. There are 300 hits from book a, then 150 from book b, 95 hits book 3, but then there are 1,2,3 hits of

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Suman Ghosh wrote: The search functionality must be available during the index build. Since a relatively small number of documents are being affected (and also we plan to perform the build during a period of time we know to be relatively quiet from last 2 years site access data) during the