In my testing, around 100 has worked well for collections like
Wikipedia. That is an interesting point about paying the cost once
optimize is undertaken, and it may be worth exploring more. I also
wonder how partial optimizes might help.
The Javadocs say:
I did hear back from the authors. Some of the issues stemmed from the
value chosen for mergeFactor (10,000, I think), but there also seemed
to be some questions about parsing the TREC collection. It was split
out into individual files, as opposed to trying to stream in the
documents like we
For the data that I normally work with (short articles), I found that
the sweet spot was around 80-120. I actually saw a slight decrease going
above that...not sure if that held forever though. That was testing on
an earlier release (I think 2.1?). However, if you want to test
searching it
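
The mergeFactor trade-off discussed in this thread can be illustrated with a toy model. This is only a sketch under loud assumptions: every document is flushed as its own one-document segment, and whenever `merge_factor` segments pile up at one level they are merged into a single segment at the next level. It is not Lucene's actual merge policy, and the function name `simulate_merges` is hypothetical. It counts only how many documents get rewritten by merges and how many segments remain afterwards; it says nothing about real indexing or search speed.

```python
def simulate_merges(num_docs, merge_factor):
    """Toy model of a level-based (logarithmic) segment merge scheme.

    Assumption: each added document becomes a one-document segment, and
    merge_factor segments at the same level are merged into one segment
    at the next level. This is NOT Lucene's real merge policy.

    Returns (merges, docs_copied, segments_left).
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")

    levels = [[]]      # levels[i] holds the sizes of segments at level i
    merges = 0         # number of merge operations performed
    docs_copied = 0    # total documents rewritten by merges

    for _ in range(num_docs):
        levels[0].append(1)  # flush a new one-document segment
        level = 0
        # Cascade merges upward while a level is full.
        while len(levels[level]) >= merge_factor:
            merged_size = sum(levels[level])
            levels[level] = []
            merges += 1
            docs_copied += merged_size
            if level + 1 == len(levels):
                levels.append([])
            levels[level + 1].append(merged_size)
            level += 1

    segments_left = sum(len(segs) for segs in levels)
    return merges, docs_copied, segments_left


if __name__ == "__main__":
    for mf in (10, 100):
        print(mf, simulate_merges(1000, mf))
```

In this model, 1,000 documents with a merge factor of 10 give 111 merges that rewrite 3,000 documents and leave 1 segment, while a merge factor of 100 gives 10 merges that rewrite 1,000 documents and leave 10 segments. A larger value does less copying while indexing but leaves more segments behind, which is work a later optimize (or each search) would have to absorb.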
On Dec 18, 2007 2:38 AM, Mark Miller [EMAIL PROTECTED] wrote:
For the data that I normally work with (short articles), I found that
the sweet spot was around 80-120. I actually saw a slight decrease going
above that...not sure if that held forever though. That was testing on
an earlier
On Dec 7, 2007, at 3:01 PM, Mark Miller wrote:
Yes, and even if they did not use the stock defaults, I would bet
there would be complaints about what was done wrong at every turn.
This seems like a very difficult thing to do. How long does it take
to fully learn how to correctly utilize
On 8-Dec-07, at 10:04 PM, Doron Cohen wrote:
+1 I have been thinking about this too. Solr clearly demonstrates
the benefits of this kind of approach, although even it doesn't make
it seamless for users in the sense that they still need to divvy up
the docs on the app side.
Would be nice if
Well, at some point the answer is "use Solr". I think Lucene should
stay focused on being a good search library/component, and server
level capabilities should be handled by Solr or the application layer
on top of Lucene.
That said, I still think there is a need for a layer that handles/
On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote:
Sometimes, when something like this comes up, it gives you the
opportunity to take a step back and ask what are the things we
really want Lucene to be going forward (the New Year is good for
this kind of assessment as well) What are
Grant Ingersoll [EMAIL PROTECTED] wrote on 08/12/2007 16:02:31:
On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote:
Sometimes, when something like this comes up, it gives you the
opportunity to take a step back and ask what are the things we
really want Lucene to be going forward (the
This is along the lines of what I have tried to get the Lucene
community to adopt for a long time.
If you want to take Lucene to the next level, it needs a server
implementation.
Only with this can you get efficient locks, caching, transactions,
which leads to more efficient indexing and
Yeah, I wasn't too excited over it and I certainly didn't lose any
sleep over it, but there are some interesting things of note in there
concerning Lucene, including the claim that it fell over on indexing
WT10g docs (page 40) and I am always looking for ways to improve
things. Overall, I
[mailto:[EMAIL PROTECTED]
Sent: Friday, December 7, 2007 21:01
To: java-dev@lucene.apache.org
Subject: Re: O/S Search Comparisons
Yes, and even if they did not use the stock defaults, I would bet there
would be complaints about what was done wrong at every turn. This seems
like a very difficult thing
://www.clef-campaign.org/2006/working_notes/workingnotes2006/dinunzioOCLEF2006.pdf, ...)
for other information search the web ;-)
Samir
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, December 7, 2007 21:01
To: java-dev@lucene.apache.org
Subject: Re: O/S Search
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, December 7, 2007 21:01
To: java-dev@lucene.apache.org
Subject: Re: O/S Search Comparisons
Yes, and even if they did not use the stock defaults, I would bet there
would be complaints about what was done
Yes, and even if they did not use the stock defaults, I would bet there
would be complaints about what was done wrong at every turn. This seems
like a very difficult thing to do. How long does it take to fully learn
how to correctly utilize each search engine for the task at hand? I am
sure
There is a good chance that they were using stock indexing defaults,
based on:
Lucene:
In the present work, the simple applications
bundled with the library were used to index the collection.
On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote:
Yeah, I wasn't too excited over it and I