OK, here's what I've come up with after reading your suggestions:
- bit set from DB stays untouched
- only one field shall be used to store interest field bits in the document:
interest. Saves disk space.
- The bits shall not be converted to a readable string but added as values
separated by
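A minimal sketch of the idea in the list above, turning a bit set into one token per set bit for a single "interest" field. The message is cut off before it names the separator, so the single space used here is an assumption, and the class and method names (InterestBits, toTokens) are hypothetical helpers, not part of Lucene.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene.
final class InterestBits {

    // Returns the value of every set bit in `bits`, lowest bit first.
    // ASSUMPTION: values are joined by a single space; the original
    // message does not say which separator was chosen.
    static String toTokens(long bits) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < Long.SIZE; i++) {
            long mask = 1L << i;
            if ((bits & mask) != 0) {
                // Unsigned formatting keeps the top bit from printing as negative.
                out.add(Long.toUnsignedString(mask));
            }
        }
        return out.toString();
    }
}
```

With a whitespace-splitting analyzer, each value would then become its own term in the field, so a document with bits 1 and 4 set would carry the tokens "1" and "4".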
Paul,
we are using a slightly modified version of Lucene,
so in order to run the performance tests on a nightly build, I need
Lucene's sources, not the compiled classes.
Is there a nice and easy way to get them?
Stanislav
Stanislav Jordanov wrote:
Paul,
We are working on delivering the next
Lucene will automatically separate tokens during index and search if you use
the right analyzer. See the various classes that implement Analyzer. I don't
know if you really wanted to use the numeric literals, but I wouldn't. The
analyzers that do the most for you (automatically break up on
The background of this is also separating content according to domains
Example:
- pictureA (marked as a joke #flag: 1)
- pictureB (marked as an adult picture #flag: 2)
Site1: Users allowed to view everything (pictureA, pictureB )
Site2: Users allowed to view everything except pictureB (no adult
You could store a value for each flag then be careful about what analyzers
you use. For instance, using WhitespaceAnalyzer (index AND search) and doing
your own casing. That is, make sure you lowercase as necessary (NOTE:
operators AND, OR, NOT must not be lowercased if you send them through
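A rough sketch of the "do your own casing" advice, assuming the query arrives as a plain whitespace-separated string. QueryCaseNormalizer is a hypothetical helper, not a Lucene class; it lowercases every token except the uppercase operators AND, OR and NOT.

```java
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene.
final class QueryCaseNormalizer {

    // Boolean operators that must keep their uppercase form.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Lowercases each whitespace-separated token unless it is an operator.
    static String normalize(String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // only happens for an empty or blank query
            }
            out.add(OPERATORS.contains(token)
                    ? token
                    : token.toLowerCase(Locale.ROOT));
        }
        return out.toString();
    }
}
```

The same lowercasing would have to be applied to the text at index time. Note that this sketch lowercases the whole token, including any field prefix, so it only fits field names that are already lowercase.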
On 11/27/06, Michael McCandless [EMAIL PROTECTED] wrote:
Suman Ghosh wrote:
On 11/27/06, Yonik Seeley [EMAIL PROTECTED] wrote:
On 11/27/06, Suman Ghosh [EMAIL PROTECTED] wrote:
Here are the values:
mergeFactor=10
maxMergeDocs=10
minMergeDocs=100
And I see your point. At the
The code works very well,
Thanks,
Laurie
-----Original Message-----
From: Paul Elschot [mailto:[EMAIL PROTECTED]
Sent: 27 November 2006 18:52
To: java-user@lucene.apache.org
Subject: Re: Hits length with no sorting or scoring
On Monday 27 November 2006 14:30, Hirsch Laurence wrote:
Hello,
Yonik Seeley wrote:
Actually, in previous versions of Lucene, it *was* possible to get way
too many first level segments because of the wonky logic when the
IndexWriter was closed. That has been fixed in the trunk with the new
merge policy, and you will never see more than mergeFactor first
This looks correct to me. It's good you are doing the deletes
in bulk up front for each batch of documents. So I guess you
hit the error ( 5000 segments files) while processing batches
of 200 docs (because you then optimize in the end)?
Do you search this index while it's building, or, only
1) I don't really know anything about Syns2Index - but the errors you
cited don't seem to have anything to do with Lucene ... your compiler
appears to be complaining about assert statements within the core java
system classes ... which is a little strange. You said you are past the
HelloWorld
I have documents that can be referred to by multiple identifiers (and I want
to store the identifiers separate from the main indexed content). I'm
wondering if I should put each identifier in its own keyword field, or have
one tokenized field with all of the identifiers in it. What I'm talking
On Nov 28, 2006, at 4:31 PM, Michael Rusch wrote:
I have documents that can be referred to by multiple identifiers (and I want
to store the identifiers separate from the main indexed content). I'm
wondering if I should put each identifier in its own keyword field, or have
one tokenized
Hello,
We have one problem with the sort routine. We use the MultiSearcher function
over several indexes.
The result should be sorted by the booknumber, but the produced list isn't
sorted correctly. There are 300 hits from book a, then 150 from book b, 95
hits book 3, but then there are 1,2,3 hits of
Suman Ghosh wrote:
The search functionality must be available during the index build. Since a
relatively small number of documents are being affected (and also we plan to
perform the build during a period of time we know to be relatively quiet
from last 2 years site access data) during the
14 matches