RE: Problems indexing large documents

2006-06-10 Thread Rob Staveley (Tom)
The answer was, of course, in the FAQ (http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71). Breaking large documents into manageable chunks isn't ideal. I need to index e-mail with attachments, which are frequently large. Currently each message part corr
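For reference, the chunking workaround can be sketched in plain Java. This is a minimal illustration, not the FAQ's code. It assumes whitespace tokenization, and the class name `DocumentChunker` and the `maxTokens` limit are hypothetical. Each chunk would be indexed as its own Lucene document carrying the id of the message part it came from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a large text into chunks of at most maxTokens
// whitespace-separated tokens. Each chunk would become its own index document
// that records which message part (parentId) it came from.
class DocumentChunker {
    static final class Chunk {
        final String parentId; // id of the original message part
        final int index;       // position of this chunk within the part
        final String text;     // the chunk's tokens, rejoined with single spaces

        Chunk(String parentId, int index, String text) {
            this.parentId = parentId;
            this.index = index;
            this.text = text;
        }
    }

    static List<Chunk> split(String parentId, String text, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to index
        }
        String[] tokens = trimmed.split("\\s+");
        for (int start = 0; start < tokens.length; start += maxTokens) {
            int end = Math.min(start + maxTokens, tokens.length);
            String body = String.join(" ", Arrays.copyOfRange(tokens, start, end));
            chunks.add(new Chunk(parentId, chunks.size(), body));
        }
        return chunks;
    }
}
```

The drawback is the one Rob points out: a phrase or proximity query whose terms straddle a chunk boundary will not match, and hits have to be grouped by `parentId` to report one result per message part.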

Re: Different scoring mechanism

2006-06-10 Thread Otis Gospodnetic
Chris, Somebody recently asked me about how Lucene processes queries. Other than working on required clauses in a BooleanQuery first, and skipping if there are no matching Docs for them, there are no other query optimization strategies/tricks, are there? Otis - Original Message From
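The behaviour Otis describes (work through the required clauses together and stop as soon as one has no more matching docs) can be illustrated with a conjunction over sorted doc-id lists. This is a simplified sketch, not Lucene's actual scorer code: it ignores scoring entirely and assumes each postings list is a strictly ascending `int[]`. The class name `Conjunction` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified conjunction over sorted postings lists (doc ids strictly
// ascending). It ignores scoring; it only shows required clauses advancing
// together and the query stopping as soon as any clause runs out of docs.
class Conjunction {
    static int[] intersect(int[][] postings) {
        if (postings.length == 0) {
            return new int[0];
        }
        // Drive from the shortest list: it bounds how many docs can match.
        int[][] lists = postings.clone();
        Arrays.sort(lists, (a, b) -> Integer.compare(a.length, b.length));
        if (lists[0].length == 0) {
            return new int[0]; // a required clause matches nothing
        }
        int[] pos = new int[lists.length];
        List<Integer> out = new ArrayList<>();
        int target = lists[0][0];
        while (true) {
            boolean allMatch = true;
            for (int i = 0; i < lists.length; i++) {
                pos[i] = skipTo(lists[i], pos[i], target);
                if (pos[i] == lists[i].length) {
                    return toArray(out); // this clause is exhausted
                }
                int doc = lists[i][pos[i]];
                if (doc > target) {
                    target = doc; // restart the round at the larger doc id
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                out.add(target);
                pos[0]++;
                if (pos[0] == lists[0].length) {
                    return toArray(out);
                }
                target = lists[0][pos[0]];
            }
        }
    }

    // Index of the first element at or after 'from' that is >= target.
    private static int skipTo(int[] list, int from, int target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    private static int[] toArray(List<Integer> docs) {
        int[] result = new int[docs.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = docs.get(i);
        }
        return result;
    }
}
```

Skipping by binary search lets a rare clause jump over long runs of a frequent clause's postings instead of visiting every doc.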

Re: COMMIT_LOCK_TIMEOUT - IndexSearcher/IndexReader

2006-06-10 Thread Otis Gospodnetic
I have never run into this problem, but I'd be curious to know why your system takes more than 10 seconds to read the segments? Super-large index on a slow disk? As for new ctors, I suppose they wouldn't hurt, if there really is a need for them. But 10 seconds is a long time... Otis - O
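Otis's reply mentions new constructors. One way to picture that is a lock whose wait timeout is passed in by the caller rather than fixed at 10 seconds. The sketch below shows only the polling pattern; `TimedFileLock` and its parameters are hypothetical, and it is not Lucene's lock implementation. The JDK documentation warns that `File.createNewFile` should not be relied on for robust cross-process locking, so treat it as illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical lock whose wait timeout is supplied by the caller instead of
// being a global constant. It polls for a lock file; the JDK docs caution
// that createNewFile is not a reliable cross-process locking protocol, so
// this is for illustration only.
class TimedFileLock {
    private final File lockFile;
    private final long pollIntervalMs;

    TimedFileLock(File lockFile, long pollIntervalMs) {
        this.lockFile = lockFile;
        this.pollIntervalMs = pollIntervalMs;
    }

    /** Returns true once the lock is held, false if timeoutMs elapses first. */
    boolean obtain(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (true) {
                // Atomically creates the file only if it does not exist yet.
                if (lockFile.createNewFile()) {
                    return true;
                }
                if (System.currentTimeMillis() >= deadline) {
                    return false;
                }
                Thread.sleep(pollIntervalMs);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void release() {
        lockFile.delete();
    }
}
```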

Re: Different scoring mechanism

2006-06-10 Thread Yonik Seeley
On 6/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Other than working on required clauses in a BooleanQuery first, and skipping if there are no matching Docs for them, there are no other query optimization strategies/tricks, are there? I think that's pretty much it, depending on what y

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
Hi Peter, two quick questions: 1. Could you let me know what kind of response time you were getting with Solr (as well as the size of data and result sizes)? 2. I took a really really quick look at DocSetHitCollector and saw the dreaded if (bits==null) bits = new BitSe

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote: 1. could you let me know what kind of response time you were getting with solr (as well as the size of data and result sizes) I can tell you a little bit about ours... on one CNET faceted browsing implementation using Solr, the number of fa
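Counting facets this way amounts to one set intersection per facet value: the count is the size of the intersection between the query's result set and the precomputed doc set for that value. A minimal sketch with `java.util.BitSet` follows; `FacetCounter` and the facet map are hypothetical names, not Solr's API.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical facet counter: each facet value maps to a precomputed BitSet of
// the docs that have it, and its count for one request is the size of the
// intersection with the query's result set.
class FacetCounter {
    static Map<String, Integer> count(BitSet queryHits, Map<String, BitSet> facetDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> facet : facetDocs.entrySet()) {
            // and() modifies its receiver, so intersect a copy of the hits.
            // The copy costs one allocation per facet, which a real
            // implementation would try to avoid.
            BitSet both = (BitSet) queryHits.clone();
            both.and(facet.getValue());
            counts.put(facet.getKey(), both.cardinality());
        }
        return counts;
    }
}
```

With 100 to 200 facet values checked per request, as in Yonik's example, each request performs that many intersections.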

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/9/06, Peter Keegan <[EMAIL PROTECTED]> wrote: However, my throughput testing shows that the Solr method is at least 50% faster than mine. I'm seeing a big win with the use of the HashDocSet for lower hit counts. On my 64-bit platform, a MAX_SIZE value of 10K-20K seems to provide optimal perf
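The trade-off Peter measured can be illustrated with a factory that picks a representation by hit count: a small open-addressing hash of doc ids at or below a threshold (his `MAX_SIZE`), and a `BitSet` of `maxDoc` bits above it. This sketch mirrors the idea behind Solr's `HashDocSet`, but the types here (`DocIds`, `HashDocIds`, `BitDocIds`, `DocIdsFactory`) are hypothetical and are not Solr's code.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical doc-id sets, illustrating the idea behind a split between a
// hash-based set for small results and a bitset for large ones. Not Solr code.
interface DocIds {
    boolean contains(int doc);
    int size();
}

// Open-addressing int hash set, kept at most half full so probes stay short.
class HashDocIds implements DocIds {
    private static final int EMPTY = -1; // doc ids are never negative
    private final int[] table;
    private final int size;

    HashDocIds(int[] docs) {
        // Smallest power of two >= max(4, 2 * docs.length).
        int capacity = Integer.highestOneBit(Math.max(4, docs.length * 2) - 1) << 1;
        table = new int[capacity];
        Arrays.fill(table, EMPTY);
        int n = 0;
        for (int doc : docs) {
            int slot = slotFor(doc);
            while (table[slot] != EMPTY && table[slot] != doc) {
                slot = (slot + 1) & (table.length - 1); // linear probing
            }
            if (table[slot] == EMPTY) {
                table[slot] = doc;
                n++;
            }
        }
        size = n;
    }

    private int slotFor(int doc) {
        int h = doc * 0x9E3779B9; // spread sequential doc ids across the table
        return (h ^ (h >>> 16)) & (table.length - 1);
    }

    @Override
    public boolean contains(int doc) {
        int slot = slotFor(doc);
        while (table[slot] != EMPTY) {
            if (table[slot] == doc) {
                return true;
            }
            slot = (slot + 1) & (table.length - 1);
        }
        return false;
    }

    @Override
    public int size() {
        return size;
    }
}

class BitDocIds implements DocIds {
    private final BitSet bits;

    BitDocIds(int[] docs, int maxDoc) {
        bits = new BitSet(maxDoc); // about maxDoc / 8 bytes whatever the hit count
        for (int doc : docs) {
            bits.set(doc);
        }
    }

    @Override
    public boolean contains(int doc) {
        return doc >= 0 && bits.get(doc);
    }

    @Override
    public int size() {
        return bits.cardinality();
    }
}

class DocIdsFactory {
    // maxSize plays the role of the MAX_SIZE threshold tuned in the thread.
    static DocIds create(int[] docs, int maxDoc, int maxSize) {
        return docs.length <= maxSize ? new HashDocIds(docs) : new BitDocIds(docs, maxDoc);
    }
}
```

As a rough memory comparison for this sketch: the hash table uses at most 16 bytes per hit (a power-of-two `int[]` at least twice the hit count), while the `BitSet` always uses about maxDoc/8 bytes. Lookup cost and cache behaviour matter too, which is why measuring a threshold, as Peter did, is more reliable than this arithmetic.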

Re: Different scoring mechanism

2006-06-10 Thread Chris Hostetter
: Higher level optimizations that do query transformations are left as : an exercise to the application :-) Word! For example, Yonik helped me speed up a fairly hairy query I had a while back by realizing that the way I was programmatically generating a query, one deeply nested clause was actuall
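As a generic example of an application-level query transformation (not a reconstruction of the cut-off example in Chris's message), nested all-required clauses can be flattened into their parent. The tiny query model below (`Q`, `TermQ`, `AndQ`, `Flatten`) is hypothetical and stands in for Lucene's `BooleanQuery` only loosely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy query model: a query is a single term or an AND of clauses.
abstract class Q {
}

final class TermQ extends Q {
    final String text;

    TermQ(String text) {
        this.text = text;
    }

    @Override
    public String toString() {
        return text;
    }
}

final class AndQ extends Q {
    final List<Q> clauses;

    AndQ(List<Q> clauses) {
        this.clauses = clauses;
    }

    @Override
    public String toString() {
        return clauses.stream().map(Object::toString).collect(Collectors.joining(", ", "AND(", ")"));
    }
}

class Flatten {
    // Hoists the clauses of nested ANDs into their parent, recursively, and
    // unwraps an AND that ends up with a single clause.
    static Q flatten(Q q) {
        if (!(q instanceof AndQ)) {
            return q;
        }
        List<Q> out = new ArrayList<>();
        for (Q clause : ((AndQ) q).clauses) {
            Q flat = flatten(clause);
            if (flat instanceof AndQ) {
                out.addAll(((AndQ) flat).clauses);
            } else {
                out.add(flat);
            }
        }
        return out.size() == 1 ? out.get(0) : new AndQ(out);
    }
}
```

Flattening preserves the set of matching documents, since a AND (b AND c) matches exactly what a AND b AND c matches, but in Lucene the scores of the two forms can differ, so check whether that matters for your ranking.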

Re: Aggregating category hits

2006-06-10 Thread Chris Hostetter
: I can tell you a little bit about ours... on one CNET faceted browsing : implementation using Solr, the number of facets to check per request : average somewhere between 100 and 200 (the total number of unique : facets is much larger though). The median request time is 3ms (and I : don't think

Re[2]: Fwd: Lucene 2.0.0 release available

2006-06-10 Thread Sven Duzont
Hello Otis, Unfortunately, I don't know Maven well enough yet to push a library to the public Maven repositories. However, I just found this eBook that looks interesting (http://www.mergere.com/m2book_download.jsp). While waiting for a Maven guru to release Lucene 2.0 to a public repo, I'll just

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
Hi Yonik, thanks for the thorough reply. A few more quick questions... "the number of facets to check per request average somewhere between 100 and 200 (the total number of unique facets is much larger though)." You mean 100-200 different categories to facet? I ran

Lucene as Search in a Bulletin Board

2006-06-10 Thread Dominik Bruhn
Hi, I'm writing a kind of bulletin-board software in Java (servlets + Velocity as the template framework, MySQL 5 as the backend). As the whole database is in InnoDB, which doesn't support full-text indexes, I looked for alternatives and liked the concept of Lucene. But I have several questions: 1. Everybod

Re: Aggregating category hits

2006-06-10 Thread Yonik Seeley
On 6/10/06, z shalev <[EMAIL PROTECTED]> wrote: "the number of facets to check per request average somewhere between 100 and 200 (the total number of unique facets is much larger though). " you mean 100 - 200 different catagories to facet? I was going by memory, but 100 to 200 set inte

Re: NumberTools and efficient sorting

2006-06-10 Thread Benjamin Stein
On 6/9/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I have an integer field that I've indexed after converting to a string : using NumberTools.longToString(). : Now I want to sort my results using this field. Everything works when : treating the field as a string, but is very slow and memor
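`NumberTools.longToString()` is useful here because it produces strings whose lexicographic order matches numeric order. The sketch below uses its own encoding (sign bit flipped, base 36, zero-padded to 13 characters), which is an assumption and not necessarily NumberTools' exact format, to show why such strings sort correctly. It also shows how decoding each value once into a `long[]` lets a sort compare longs instead of strings. `SortableLong` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative order-preserving encoding for longs (not NumberTools' exact
// format): flip the sign bit so signed order becomes unsigned order, write
// the result in base 36, and left-pad with '0' to a fixed width.
class SortableLong {
    static final int WIDTH = 13; // 36^13 > 2^64, so 13 base-36 digits always fit

    static String encode(long value) {
        String digits = Long.toUnsignedString(value ^ Long.MIN_VALUE, 36);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 36) ^ Long.MIN_VALUE;
    }

    // Sorts doc ids by their field value. Each string is decoded once into a
    // long[] (in the spirit of a per-field numeric cache), so the sort itself
    // compares long values rather than strings.
    static Integer[] sortDocsByValue(String[] encodedByDoc) {
        long[] values = new long[encodedByDoc.length];
        for (int doc = 0; doc < values.length; doc++) {
            values[doc] = decode(encodedByDoc[doc]);
        }
        Integer[] docs = new Integer[values.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }
        Arrays.sort(docs, Comparator.comparingLong((Integer d) -> values[d]));
        return docs;
    }
}
```

A `long[]` per field costs 8 bytes per document, whereas caching the strings costs an object per distinct value plus its characters, which is one reason numeric sorting tends to be lighter on memory.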

Re: Lucene as Search in a Bulletin Board

2006-06-10 Thread Otis Gospodnetic
Hi Dominik, I think most of your questions are answered in the Lucene FAQ and various Lucene articles, so I'll be brief. - Original Message From: Dominik Bruhn <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Saturday, June 10, 2006 6:58:03 PM Subject: Lucene as Search in a Bul

Re: Re[2]: Fwd: Lucene 2.0.0 release available

2006-06-10 Thread Otis Gospodnetic
It's really just a matter of putting the Jars in the appropriate directory on the appropriate machine, I think. Maybe CC-ed people know which dir/machine that is. The issue is: http://issues.apache.org/jira/browse/LUCENE-551 Otis - Original Message From: Sven Duzont <[EMAIL PROTECTED]>
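For Maven 2 repositories, "the appropriate directory" follows the default repository layout: the groupId with dots turned into slashes, then the artifactId, then the version, and finally the file named artifactId-version.jar (with a matching .pom alongside). A small sketch computing that path follows. `MavenLayout` is hypothetical, and the coordinates org.apache.lucene:lucene-core:2.0.0 used with it below are an assumption about how the release would be published.

```java
// Hypothetical helper that builds a file's path in a Maven 2 repository using
// the default layout: groupId with '.' replaced by '/', then artifactId,
// version, and finally artifactId-version.extension.
class MavenLayout {
    static String path(String groupId, String artifactId, String version, String extension) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }
}
```

Under those assumed coordinates, `MavenLayout.path("org.apache.lucene", "lucene-core", "2.0.0", "jar")` gives org/apache/lucene/lucene-core/2.0.0/lucene-core-2.0.0.jar relative to the repository root. Maven 1 repositories used a different (legacy) layout.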