----- Original Message -----
From: Rafael Rossini [EMAIL PROTECTED]
To: java-user@lucene.apache.org; Otis Gospodnetic [EMAIL PROTECTED]
Sent: Thursday, July 27, 2006 4:23:56 PM
Subject: Re: Indexing large sets of documents?
Is this the W3 Ent collection you are indexing?
MC
Yes - parallelizing works great - we built a share-nothing, JavaSpaces-based
system at X1, and on an 11-way cluster we were able to index 350 office documents
per second - this included the binary-to-text conversion, using the Stellent INSO
libraries. The trick is to create separate indexes and, if you do
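The separate-indexes trick described above can be sketched in plain Java. This is an illustration only: the per-worker lists below stand in for per-worker Lucene indexes (in real code each worker would own its own IndexWriter on its own Directory, and the final merge step would be IndexWriter.addIndexes()), and the class and method names are hypothetical:

```java
import java.util.*;
import java.util.concurrent.*;

// Share-nothing indexing sketch: each worker builds its own "index"
// with no shared state; the partial indexes are combined at the end.
public class ParallelIndexSketch {
    public static List<String> indexAll(List<String> docs, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<List<String>>> parts = new ArrayList<>();
        int chunk = (docs.size() + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            final List<String> slice = docs.subList(
                    Math.min(w * chunk, docs.size()),
                    Math.min((w + 1) * chunk, docs.size()));
            parts.add(pool.submit(() -> {
                // One private "index" per worker; in Lucene this would be
                // an IndexWriter on a worker-local Directory.
                List<String> local = new ArrayList<>();
                for (String d : slice) {
                    local.add("indexed:" + d); // stands in for text extraction + addDocument()
                }
                return local;
            }));
        }
        // Combine the partial indexes; in Lucene this is addIndexes().
        List<String> merged = new ArrayList<>();
        for (Future<List<String>> f : parts) merged.addAll(f.get());
        pool.shutdown();
        return merged; // merged.size() == docs.size(), order preserved within each slice
    }
}
```

Because the workers share nothing, there is no lock contention during indexing; the only serial step is the final merge.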
Michael,
Certainly parallelizing across a set of servers would work (hmm... Hadoop?), but if
you want to do this on a single machine you should tune some of the IndexWriter
parameters. You didn't mention them, so I assume you haven't tuned anything yet. If
you have Lucene in Action, check out section 2.7.1.
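For context, the IndexWriter parameters discussed in that section of Lucene in Action (in the Lucene 1.x/2.x API: setMergeFactor() and setMaxBufferedDocs(), formerly minMergeDocs) mainly trade RAM for fewer disk flushes and merges. A rough back-of-envelope model, assuming each fill of the in-memory document buffer flushes one segment to disk:

```java
// Rough model of IndexWriter flushing: maxBufferedDocs documents are
// buffered in RAM, then written out as one segment; mergeFactor
// controls how many such segments accumulate before being merged.
public class FlushEstimate {
    // Number of segment flushes needed to index numDocs documents.
    public static long flushes(long numDocs, int maxBufferedDocs) {
        return (numDocs + maxBufferedDocs - 1) / maxBufferedDocs; // ceiling division
    }

    public static void main(String[] args) {
        long docs = 1_000_000;
        System.out.println(flushes(docs, 10));   // old default buffer size
        System.out.println(flushes(docs, 1000)); // larger in-RAM buffer
    }
}
```

Under this model, raising maxBufferedDocs from the old default of 10 to 1000 cuts the number of flushes for a million documents from 100,000 to 1,000, at the cost of holding more documents in memory before each flush.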
Otis,
You mentioned the Hadoop project. I checked it out not long ago, and
I read something saying it did not support the Lucene index format. Is it possible to
index and then search on HDFS?
[]s
Rossini
On 7/27/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: