Looks like I spoke too soon... As the index gets larger, the time to merge
becomes prohibitively high. It appears to increase linearly.
Oh well. I guess I'll just have to go with about 3ms/doc.
-
As the index grows, disk i/o becomes the bottleneck. The default
indexing parameters do a pretty good job of optimizing this. But if you
have lots of CPUs and lots of disks, you might try building several
indexes in parallel, each containing a subset of the documents, optimize
each index and
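
A rough sketch of the shape of that suggestion, in case it helps. This is not Lucene code: it uses a toy in-memory inverted index in Python as a stand-in, and the names build_partial_index, merge_indexes and build_in_parallel are made up for illustration. It only shows the split / build-per-subset / merge pattern. Python threads will not give real CPU parallelism here, so treat the executor as a placeholder for separate processes or machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partial_index(docs):
    """Build a toy inverted index (term -> sorted doc ids) for one subset."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def merge_indexes(partials):
    """Merge partial indexes into one, re-sorting each posting list."""
    merged = defaultdict(list)
    for partial in partials:
        for term, ids in partial.items():
            merged[term].extend(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def build_in_parallel(docs, workers=4):
    """Split docs into `workers` subsets, index each concurrently, then merge."""
    docs = list(docs)
    # Round-robin split so every subset gets roughly the same number of docs.
    subsets = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, subsets))
    return merge_indexes(partials)
```

Called as build_in_parallel([(0, "fast merge"), (1, "slow merge")], workers=2), it returns one combined term -> doc-id mapping, the same result a single-pass build over all documents would give.
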
Isn't it better for Dan to skip the optimization phase before merging? I
am not sure, but he could save some time on this (if he has enough file
handles for that, of course). What strategy do you use in Nutch?
THX
-g-
Doug Cutting wrote:
As the index grows, disk i/o becomes the bottleneck.