On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote:

Yeah, I wasn't too excited over it and I certainly didn't lose any
sleep over it, but there are some interesting things of note in there
concerning Lucene, including the claim that it fell over on indexing
WT10g docs (page 40), and I am always looking for ways to improve
things. Overall, I [...]
[...]lineProceedings6/NTCIR/NTCIR6-OVERVIEW.pdf for NTCIR-6; for CLEF, have a look at
http://www.clef-campaign.org/2006/working_notes/workingnotes2006/dinunzioOCLEF2006.pdf, ...).
For other information, search the web ;-)

Samir

-----Original Message-----
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Friday, 7 December 2007 21:01
To: java-dev@lucene.apache.org
Subject: Re: O/S Search Comparisons
Yes, and even if they did not use the stock defaults, I would bet there
would be complaints about what was done wrong at every turn. This seems
like a very difficult thing to do. How long does it take to fully learn
how to correctly utilize each search engine for the task at hand? I am
sure lon[...]
There is a good chance that they were using stock indexing defaults,
based on:

Lucene: "In the present work, the simple applications bundled with the
library were used to index the collection."
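For concreteness, here is a minimal sketch of what "stock defaults"
indexing looks like against the Lucene 2.x API, roughly what the bundled
demo (org.apache.lucene.demo.IndexFiles) does. The class name, index
path, and field name are illustrative, not taken from the paper:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    // Hypothetical example: index one document with nothing tuned.
    public class StockDefaultsIndexer {
      public static void main(String[] args) throws Exception {
        // Stock defaults apply here: mergeFactor=10, a small in-memory
        // buffer, StandardAnalyzer -- nothing sized for a TREC-scale run.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("contents", "example document text",
                          Field.Store.NO, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();
      }
    }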
I wouldn't get too excited over this. Once again, it does not seem
the evaluator understands the nature of GC-based systems, and the
memory statistics are quite out of whack. But it is hard to tell,
because there is no data on how memory consumption was actually
measured.
A far better way [...]
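The message cuts off there, but as a guess at the point: for a JVM, you
get a far more meaningful figure by reading used heap after requesting a
collection than by sampling the OS-level process size, which mostly
reflects how large the heap was allowed to grow. A sketch:

    // Sketch: approximate used heap after a GC. System.gc() is only a
    // hint, so this is still an estimate -- but much closer to real
    // demand than process RSS.
    public class HeapAfterGc {
      public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long usedBytes = rt.totalMemory() - rt.freeMemory();
        System.out.println("Used heap after GC: " + (usedBytes / (1024 * 1024)) + " MB");
      }
    }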
On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote:
Sometimes, when something like this comes up, it gives you the
opportunity to take a step back and ask what are the things we
really want Lucene to be going forward (the New Year is good for
this kind of assessment as well). What are it[...]
This is along the lines of what I have tried to get the Lucene
community to adopt for a long time.
If you want to take Lucene to the next level, it needs a "server"
implementation.
Only with this can you get efficient locks, caching, and transactions,
which lead to more efficient indexing an[...]
Well, at some point the answer is "use Solr". I think Lucene should
stay focused on being a good search library/component, and server-level
capabilities should be handled by Solr or the application layer
on top of Lucene.
That said, I still think there is a need for a layer that handles/[...]
On 8-Dec-07, at 10:04 PM, Doron Cohen wrote:
+1. I have been thinking about this too. Solr clearly demonstrates
the benefits of this kind of approach, although even it doesn't make
it seamless for users, in the sense that they still need to divvy up
the docs on the app side.
Would be nice if t[...]
I did hear back from the authors. Some of the issues were based on the
value chosen for mergeFactor (10,000), I think, but there also seemed
to be some questions about parsing the TREC collection. It was split
out into individual files, as opposed to trying to stream in the
documents like we [...]
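For anyone following along, mergeFactor is a setting on the writer; a
sketch of the contrast between the stock value and the 10,000 reported
above, continuing the indexing example earlier in the thread (2.x API):

    IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);

    // Stock default: merge once 10 segments accumulate at a level, so
    // merge cost is amortized over the whole indexing run.
    writer.setMergeFactor(10);

    // The evaluation's reported value: merges are deferred until
    // thousands of segments exist, so searches over the unmerged index
    // (or one final optimize()) pay the entire cost at once.
    // writer.setMergeFactor(10000);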
For the data that I normally work with (short articles), I found that
the sweet spot was around 80-120. I actually saw a slight decrease going
above that... not sure if that held forever, though. That was testing on
an earlier release (I think 2.1?). However, if you want to test
searching it wou[...]
My testing experience has shown around 100 to be good for things like
Wikipedia, etc. That is an interesting point to think about in
regards to paying the cost once optimize is undertaken, and it may be
worth exploring more. I also wonder how partial optimizes may help.
The Javadocs say:
Det[...]
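The Javadoc quote is cut off above; assuming it refers to the
optimize(int maxNumSegments) overload, a partial optimize, continuing
the same writer as the earlier sketches, is just:

    // Sketch: merge down to at most 5 segments instead of 1, trading a
    // little search-time overhead for a much cheaper merge than a full
    // optimize().
    writer.optimize(5);
    writer.close();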