Re: O/S Search Comparisons

2007-12-18 Thread Grant Ingersoll
My testing experience has shown around 100 to be good for things like Wikipedia, etc. That is an interesting point to think about with regard to paying the cost once an optimize is undertaken, and it may be worth exploring more. I also wonder how partial optimizes may help. The Javadocs say:

Re: O/S Search Comparisons

2007-12-17 Thread Grant Ingersoll
I did hear back from the authors. Some of the issues were based on values chosen for mergeFactor (10,000) I think, but there also seemed to be some questions about parsing the TREC collection. It was split out into individual files, as opposed to trying to stream in the documents like we
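The mergeFactor values discussed in this thread (Lucene's default of 10, the ~100 sweet spot reported below, and the 10,000 the paper's authors chose) can be illustrated with a toy model of logarithmic merging. This is a sketch only, not Lucene's actual IndexWriter/LogMergePolicy code; the class and method names are invented for illustration. It flushes a segment every maxBufferedDocs documents and merges whenever mergeFactor equal-sized segments accumulate:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a logarithmic merge scheme (hypothetical code, not
 * Lucene's implementation): flush a segment every maxBufferedDocs
 * docs; whenever mergeFactor same-sized segments pile up, merge
 * them into one segment mergeFactor times larger.
 */
public class MergeFactorSketch {

    /** Returns {finalSegmentCount, totalMergeOperations}. */
    static int[] simulate(int numDocs, int maxBufferedDocs, int mergeFactor) {
        // segment size (in docs) -> how many segments of that size exist
        Map<Integer, Integer> levels = new HashMap<>();
        int merges = 0;
        for (int d = 0; d < numDocs; d += maxBufferedDocs) {
            int size = maxBufferedDocs;          // a freshly flushed segment
            while (true) {
                int n = levels.merge(size, 1, Integer::sum);
                if (n < mergeFactor) break;      // not enough to merge yet
                // mergeFactor equal segments collapse into one bigger one
                levels.remove(size);
                size *= mergeFactor;
                merges++;
            }
        }
        int count = levels.values().stream().mapToInt(Integer::intValue).sum();
        return new int[]{count, merges};
    }

    public static void main(String[] args) {
        // 500k docs, flushing every 100 docs, at three mergeFactor settings
        for (int mf : new int[]{10, 100, 10_000}) {
            int[] r = simulate(500_000, 100, mf);
            System.out.println("mergeFactor=" + mf
                + " -> segments=" + r[0] + ", merges=" + r[1]);
        }
    }
}
```

With these toy numbers, mergeFactor=10 does 555 merges and ends with 5 segments, mergeFactor=100 does 50 merges and ends with 50, while mergeFactor=10,000 never merges at all and leaves 5,000 tiny segments open at search time. That accumulation is one plausible reading of why a 10,000 setting would hurt the paper's results, though the real trade-off in Lucene also involves I/O and open file handles not modeled here.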

Re: O/S Search Comparisons

2007-12-17 Thread Mark Miller
For the data that I normally work with (short articles), I found that the sweet spot was around 80-120. I actually saw a slight decrease going above that...not sure if that held forever though. That was testing on an earlier release (I think 2.1?). However, if you want to test searching it

Re: O/S Search Comparisons

2007-12-17 Thread Doron Cohen
On Dec 18, 2007 2:38 AM, Mark Miller [EMAIL PROTECTED] wrote: For the data that I normally work with (short articles), I found that the sweet spot was around 80-120. I actually saw a slight decrease going above that...not sure if that held forever though. That was testing on an earlier

Re: O/S Search Comparisons

2007-12-10 Thread Grant Ingersoll
On Dec 7, 2007, at 3:01 PM, Mark Miller wrote: Yes, and even if they did not use the stock defaults, I would bet there would be complaints about what was done wrong at every turn. This seems like a very difficult thing to do. How long does it take to fully learn how to correctly utilize

Re: O/S Search Comparisons

2007-12-10 Thread Mike Klaas
On 8-Dec-07, at 10:04 PM, Doron Cohen wrote: +1 I have been thinking about this too. Solr clearly demonstrates the benefits of this kind of approach, although even it doesn't make it seamless for users in the sense that they still need to divvy up the docs on the app side. Would be nice if

Re: O/S Search Comparisons

2007-12-09 Thread Michael McCandless
Well, at some point the answer is use Solr. I think Lucene should stay focused on being a good search library/component, and server level capabilities should be handled by Solr or the application layer on top of Lucene. That said, I still think there is a need for a layer that handles/

Re: O/S Search Comparisons

2007-12-08 Thread Grant Ingersoll
On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote: Sometimes, when something like this comes up, it gives you the opportunity to take a step back and ask what are the things we really want Lucene to be going forward (the New Year is good for this kind of assessment as well) What are

Re: O/S Search Comparisons

2007-12-08 Thread Doron Cohen
Grant Ingersoll [EMAIL PROTECTED] wrote on 08/12/2007 16:02:31: On Dec 8, 2007, at 4:51 AM, Michael McCandless wrote: Sometimes, when something like this comes up, it gives you the opportunity to take a step back and ask what are the things we really want Lucene to be going forward (the

Re: O/S Search Comparisons

2007-12-08 Thread robert engels
This is along the lines of what I have tried to get the Lucene community to adopt for a long time. If you want to take Lucene to the next level, it needs a server implementation. Only with this can you get efficient locks, caching, transactions, which leads to more efficient indexing and

Re: O/S Search Comparisons

2007-12-07 Thread Grant Ingersoll
Yeah, I wasn't too excited over it and I certainly didn't lose any sleep over it, but there are some interesting things of note in there concerning Lucene, including the claim that it fell over on indexing WT10g docs (page 40) and I am always looking for ways to improve things. Overall, I

Re: O/S Search Comparisons

2007-12-07 Thread Mark Miller
[mailto:[EMAIL PROTECTED] Sent: Friday, December 7, 2007 21:01 To: java-dev@lucene.apache.org Subject: Re: O/S Search Comparisons Yes, and even if they did not use the stock defaults, I would bet there would be complaints about what was done wrong at every turn. This seems like a very difficult thing

Re: O/S Search Comparisons

2007-12-07 Thread Grant Ingersoll
://www.clef-campaign.org/2006/working_notes/workingnotes2006/dinunzioOCLEF2006.pdf, ...) for other information search the web ;-) Samir -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, December 7, 2007 21:01 To: java-dev@lucene.apache.org Subject: Re: O/S Search

RE: O/S Search Comparisons

2007-12-07 Thread Samir Abdou
-Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, December 7, 2007 21:01 To: java-dev@lucene.apache.org Subject: Re: O/S Search Comparisons Yes, and even if they did not use the stock defaults, I would bet there would be complaints about what was done

Re: O/S Search Comparisons

2007-12-07 Thread Mark Miller
Yes, and even if they did not use the stock defaults, I would bet there would be complaints about what was done wrong at every turn. This seems like a very difficult thing to do. How long does it take to fully learn how to correctly utilize each search engine for the task at hand? I am sure

Re: O/S Search Comparisons

2007-12-07 Thread Mike Klaas
There is a good chance that they were using stock indexing defaults, based on this line from the paper: "Lucene: In the present work, the simple applications bundled with the library were used to index the collection." On 7-Dec-07, at 10:27 AM, Grant Ingersoll wrote: Yeah, I wasn't too excited over it and I