This was at least one of the threads that was bouncing around... I'm
fairly sure there were others as well.
Hopefully it's worth the read to you ^^
http://www.opensubscriber.com/message/java-...@lucene.apache.org/11079539.html
Phil Whelan wrote:
On Wed, Jul 22, 2009 at 12:28 PM, Matthew
If I understand lucene correctly, when doing multiple simultaneous
searches on the same IndexSearcher, they will basically all do their
own index scans and collect results independently. If that's correct,
is there a way to batch searches together, so only one index scan is
done? What I'd like
It's not accurate to say that Lucene scans the index for each search.
Rather, every Query reads a set of posting lists, each typically read
from disk. If you pass a Query[] whose queries have nothing in common (for
example, no terms in common), then you won't gain anything, b/c each Query
will
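The point about shared terms can be sketched with a toy model (plain Java, no Lucene; the class, the term lists, and the read counting are all invented for illustration). Each query is reduced to the set of terms it reads; running queries independently reads a shared term's posting list once per query, while a hypothetical "batch" would read each distinct posting list once. With no terms in common, the two counts are identical, which is the "you won't gain anything" case:

```java
import java.util.*;

// Toy model of "batching" a Query[]: collect the union of terms across
// all queries and read each distinct posting list once. If the queries
// share no terms, the batch performs exactly as many reads as running
// the queries independently, so batching buys nothing.
// (A sketch only; Lucene has no such batch-search API.)
public class BatchedReads {
    // One posting-list read per term of each query, duplicates included.
    static int independentReads(List<List<String>> queries) {
        int total = 0;
        for (List<String> q : queries) total += q.size();
        return total;
    }

    // One read per *distinct* term across the whole batch.
    static int batchedReads(List<List<String>> queries) {
        Set<String> distinct = new HashSet<>();
        for (List<String> q : queries) distinct.addAll(q);
        return distinct.size();
    }

    public static void main(String[] args) {
        List<List<String>> shared = List.of(List.of("a", "b"), List.of("a", "c"));
        List<List<String>> disjoint = List.of(List.of("a", "b"), List.of("c", "d"));
        System.out.println(independentReads(shared) + " vs " + batchedReads(shared));     // prints "4 vs 3"
        System.out.println(independentReads(disjoint) + " vs " + batchedReads(disjoint)); // prints "4 vs 4"
    }
}
```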
If you did this, wouldn't you be binding the processing of the results
of all queries to that of the slowest performing one within the collection?
I'm guessing you are trying for some sort of performance benefit by
batch processing, but I question whether or not you will actually get
more
> It's not accurate to say that Lucene scans the index for each search.
> Rather, every Query reads a set of posting lists, each typically read
> from disk. If you pass a Query[] whose queries have nothing in common (for
> example, no terms in common), then you won't gain anything, b/c each Query
> will
Queries cannot be ordered sequentially. Let's say that you run 3 Queries,
with one term each: a, b, and c. On disk, the posting lists of the terms
can look like this: post1(a), post1(c), post2(a), post1(b), post2(c),
post2(b) etc. They are not guaranteed to be consecutive. The code makes sure
the
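The interleaved layout above can be modeled in a few lines of plain Java (the disk layout, the seek counting, and the class are all invented for illustration; this is not Lucene's file format). Whatever order the three single-term queries run in, reading each term's posting-list blocks means jumping around the file, so ordering the queries cannot turn the reads into one sequential scan:

```java
import java.util.*;

// Toy model of the layout from the mail: posting-list blocks of different
// terms interleaved on disk as post1(a), post1(c), post2(a), post1(b),
// post2(c), post2(b). "a1" means block 1 of term a's postings.
public class InterleavedPostings {
    static final List<String> DISK = List.of("a1", "c1", "a2", "b1", "c2", "b2");

    // Count seeks needed to read every block of the given terms, term by
    // term. A seek happens whenever the next wanted block is not at the
    // very next disk position.
    static int seeks(List<String> termOrder) {
        int seeks = 0, pos = -2; // -2 so the first read always counts as a seek
        for (String term : termOrder) {
            for (int i = 0; i < DISK.size(); i++) {
                if (DISK.get(i).startsWith(term)) {
                    if (i != pos + 1) seeks++;
                    pos = i;
                }
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        // A fully sequential read of 6 blocks would need only 1 seek;
        // no ordering of the three queries gets close to that here.
        for (List<String> order : List.of(List.of("a", "b", "c"),
                                          List.of("a", "c", "b"),
                                          List.of("c", "a", "b"))) {
            System.out.println(order + " -> " + seeks(order) + " seeks");
        }
    }
}
```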
> If you did this, wouldn't you be binding the processing of the results
> of all queries to that of the slowest performing one within the collection?
I would imagine it would, but I haven't seen too much variance between
lucene query speeds in our data.
> I'm guessing you are trying for some sort
Out of curiosity, what is the size of your corpus? How much and how
quickly do you expect it to grow?
I'm just trying to make sure that we are all on the same page here ^^
I can see the benefits of doing what you are describing with a very
large corpus that is expected to grow at a quick rate,
> Out of curiosity, what is the size of your corpus? How much and how
> quickly do you expect it to grow?
in terms of lucene documents, we tend to have in the 10M-100M range.
Currently we use merging to make larger indices from smaller ones, so
a single index can have a lot of documents in it, but
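The core of merging smaller indices into a larger one is a k-way merge of sorted runs, which can be sketched in plain Java (the class and the per-segment doc-ID lists are invented for illustration; Lucene's real merge, e.g. via IndexWriter.addIndexes, also rewrites postings and remaps doc IDs):

```java
import java.util.*;

// Toy model of index merging: each "segment" contributes a sorted list
// of doc IDs, and a priority-queue-driven k-way merge produces one
// sorted list, touching each input element once.
public class SegmentMerge {
    static List<Integer> merge(List<List<Integer>> segments) {
        // queue entries are {value, segmentIndex, offsetInSegment}
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(e -> e[0]));
        for (int s = 0; s < segments.size(); s++)
            if (!segments.get(s).isEmpty())
                pq.add(new int[]{segments.get(s).get(0), s, 0});
        List<Integer> merged = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            merged.add(e[0]);
            int next = e[2] + 1; // advance within that segment
            List<Integer> seg = segments.get(e[1]);
            if (next < seg.size()) pq.add(new int[]{seg.get(next), e[1], next});
        }
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(List.of(List.of(1, 4, 9), List.of(2, 3), List.of(5))));
        // prints "[1, 2, 3, 4, 5, 9]"
    }
}
```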
Not sure if this helps you, but some of the issues you are facing seem
similar to those in the real-time search threads.
Basically their problem involves indexing twitter and the blogosphere,
and making lucene work for super large data sets like that.
Perhaps some of the discussion in those
On Wed, Jul 22, 2009 at 12:28 PM, Matthew Hall <mh...@informatics.jax.org> wrote:
> Not sure if this helps you, but some of the issues you are facing seem
> similar to those in the real-time search threads.
Hi Matthew,
Do you have a pointer of where to go to see the real time threads?
Thanks,
Phil