RE: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Dan Quaroni
Looks like I spoke too soon... As the index gets larger, time to merge becomes prohibitably high. It appears to increase linearly. Oh well. I guess I'll just have to go with about 3ms/doc. - To unsubscribe, e-mail: [EMAIL

Re: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Doug Cutting
As the index grows, disk i/o becomes the bottleneck. The default indexing parameters do a pretty good job of optimizing this. But if you have lots of CPUs and lots of disks, you might try building several indexes in parallel, each containing a subset of the documents, optimize each index and

Re: Fastest batch indexing with 1.3-rc1

2003-08-20 Thread Leo Galambos
Isn't it better for Dan to skip the optimization phase before merging? I am not sure, but he could save some time on this (if he has enough file handles for that, of course). What strategy do you use in nutch? THX -g- Doug Cutting wrote: As the index grows, disk i/o becomes the bottleneck.