Re: Indexing large sets of documents?

2006-07-28 Thread Rafael Rossini
----- Original Message ----- From: Rafael Rossini [EMAIL PROTECTED] To: java-user@lucene.apache.org; Otis Gospodnetic [EMAIL PROTECTED] Sent: Thursday, July 27, 2006 4:23:56 PM Subject: Re: Indexing large sets of documents? Otis, You mentioned the Hadoop project. I checked it out not long ago and I

Re: Indexing large sets of documents?

2006-07-27 Thread MALCOLM CLARK
Is this the W3 Ent collection you are indexing? MC

RE: Indexing large sets of documents?

2006-07-27 Thread Dejan Nenov
Yes - parallelizing works great - we built a shared-nothing, JavaSpaces-based system at X1, and on an 11-way cluster we were able to index 350 office documents per second - this included the binary-to-text conversion, using Stellent INSO libraries. The trick is to create separate indexes and, if you do
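The "separate indexes" trick can be sketched with only the JDK. This is a minimal sketch, assuming a toy in-memory inverted index stands in for a real Lucene index: the helpers `indexShard`, `merge` and `buildParallel` are hypothetical, and threads stand in for cluster nodes. Each worker indexes a disjoint slice of the documents with no shared state, and the partial indexes are combined at the end. Dejan's message is cut off, so merging is only one assumed way to finish; in real Lucene each worker would own its own `IndexWriter` and `Directory`, and the combine step would be `IndexWriter.addIndexes(...)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedIndexing {
    /** Toy stand-in for one Lucene index: term -> sorted global doc ids (hypothetical). */
    public static Map<String, SortedSet<Integer>> indexShard(List<String> docs, int baseId) {
        Map<String, SortedSet<Integer>> postings = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue; // split can yield a leading ""
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(baseId + i);
            }
        }
        return postings;
    }

    /** Combine per-shard indexes; plays the role IndexWriter.addIndexes plays in Lucene. */
    public static Map<String, SortedSet<Integer>> merge(List<Map<String, SortedSet<Integer>>> shards) {
        Map<String, SortedSet<Integer>> merged = new TreeMap<>();
        for (Map<String, SortedSet<Integer>> shard : shards) {
            shard.forEach((term, ids) ->
                    merged.computeIfAbsent(term, t -> new TreeSet<>()).addAll(ids));
        }
        return merged;
    }

    /** Split docs into disjoint slices, index each on its own thread, then merge. */
    public static Map<String, SortedSet<Integer>> buildParallel(List<String> docs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, SortedSet<Integer>>>> futures = new ArrayList<>();
            int chunk = Math.max(1, (docs.size() + workers - 1) / workers);
            for (int start = 0; start < docs.size(); start += chunk) {
                final int base = start; // global id of the slice's first doc
                final List<String> slice = docs.subList(start, Math.min(start + chunk, docs.size()));
                futures.add(pool.submit(() -> indexShard(slice, base))); // share-nothing worker
            }
            List<Map<String, SortedSet<Integer>>> shards = new ArrayList<>();
            for (Future<Map<String, SortedSet<Integer>>> f : futures) {
                try {
                    shards.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return merge(shards);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("Lucene indexes documents", "Hadoop splits work", "Lucene and Hadoop");
        Map<String, SortedSet<Integer>> index = buildParallel(docs, 2);
        System.out.println(index.get("lucene")); // [0, 2]
        System.out.println(index.get("hadoop")); // [1, 2]
    }
}
```

Because each slice carries its own base document id, the merged result is the same regardless of how many workers are used; only the wall-clock time changes.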

Re: Indexing large sets of documents?

2006-07-27 Thread Otis Gospodnetic
Michael, Certainly parallelizing on a set of servers would work (hmm... hadoop?), but if you want to do this on a single machine you should tune some of the IndexWriter params. You didn't mention them, so I assume you didn't tune anything yet. If you have Lucene in Action, check out 2.7.1
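To see why IndexWriter tuning matters on a single machine, here is a toy model of log-style segment merging. It is a hypothetical simulation, not Lucene's actual merge code: every `maxBufferedDocs` documents are flushed as a new segment, and whenever `mergeFactor` segments share the newest level they are merged into one segment a level up. The names `simulate` and `Result` are made up for illustration, and `docsCopied` is only a rough proxy for merge I/O.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MergeFactorModel {
    /** Summary of one simulated indexing run (hypothetical). */
    public record Result(int flushes, int merges, long docsCopied, int segments) {}

    /** Toy log-style merge model; not Lucene's implementation. */
    public static Result simulate(int numDocs, int maxBufferedDocs, int mergeFactor) {
        if (mergeFactor < 2 || maxBufferedDocs < 1) {
            throw new IllegalArgumentException("need mergeFactor >= 2 and maxBufferedDocs >= 1");
        }
        Deque<int[]> segments = new ArrayDeque<>(); // {level, docCount}; newest at head
        int flushes = 0;
        int merges = 0;
        long docsCopied = 0;
        for (int flushed = 0; flushed < numDocs; flushed += maxBufferedDocs) {
            // Flush the in-memory buffer as a new level-0 segment.
            segments.push(new int[] {0, Math.min(maxBufferedDocs, numDocs - flushed)});
            flushes++;
            // Cascade merges, like carrying digits in base mergeFactor.
            while (countNewestLevel(segments) >= mergeFactor) {
                int level = segments.peek()[0];
                int docs = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    docs += segments.pop()[1];
                }
                segments.push(new int[] {level + 1, docs});
                merges++;
                docsCopied += docs; // every merged doc is rewritten once
            }
        }
        return new Result(flushes, merges, docsCopied, segments.size());
    }

    /** Consecutive segments, from the newest, at the newest segment's level. */
    private static int countNewestLevel(Deque<int[]> segments) {
        if (segments.isEmpty()) return 0;
        int level = segments.peek()[0];
        int n = 0;
        for (int[] s : segments) { // ArrayDeque iterates head (newest) to tail
            if (s[0] != level) break;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000, 10, 10));  // Result[flushes=100, merges=11, docsCopied=2000, segments=1]
        System.out.println(simulate(1000, 10, 100)); // Result[flushes=100, merges=1, docsCopied=1000, segments=1]
        System.out.println(simulate(1000, 100, 10)); // Result[flushes=10, merges=1, docsCopied=1000, segments=1]
    }
}
```

In this model, raising `mergeFactor` means fewer merges and less rewriting, at the cost of more segments accumulating on disk between merges; raising `maxBufferedDocs` cuts the number of flushes at the cost of RAM. In Lucene 2.x the corresponding knobs are the `IndexWriter` setters `setMergeFactor` and `setMaxBufferedDocs`.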

Re: Indexing large sets of documents?

2006-07-27 Thread Rafael Rossini
Otis, You mentioned the Hadoop project. I checked it out not long ago, and I read something saying it did not support Lucene indexes. Is it possible to index and then search in HDFS? Regards, Rossini