John Bickerstaff <j...@johnbickerstaff.com> wrote:
> As an aside - I just spoke with someone the other day who is using Hadoop
> for re-indexing in order to save a lot of time.

If you control which documents go into which shards, then that is certainly a
possibility. We have a collection with a long re-indexing time (about 20 CPU-core
years), but we are able to build the shards independently of each other, so it
scales near-perfectly with more hardware. The cheat is that our documents are
never updated, so everything is always new and just appended to the latest
shard being built. We don't use Hadoop, but the principle is the same.
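
To illustrate the principle, here is a minimal Python sketch, not our actual setup. It assumes documents are never updated, so each one can be assigned to a shard purely by arrival order, and each shard can then be built without looking at any other shard. The names assign_shards, build_shard and build_all are made up for the example, and the tiny in-memory inverted index only stands in for the real indexing work. A thread pool is used to keep the example short; in practice each shard build would run in its own process or on its own machine.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_shards(docs, shard_size):
    """Append-only assignment: documents fill shards in arrival order.

    Because documents are never updated, a document's shard is fixed
    once it has been appended, and older shards are never touched again.
    """
    return [docs[i:i + shard_size] for i in range(0, len(docs), shard_size)]


def build_shard(shard_no, docs):
    """Build one shard using only its own documents (hypothetical stand-in
    for the real indexing step)."""
    index = {}
    for doc_id, text in docs:
        for term in text.lower().split():
            index.setdefault(term, []).append(doc_id)
    return shard_no, index


def build_all(docs, shard_size, workers=4):
    """Build every shard independently; no shard depends on another."""
    shards = assign_shards(docs, shard_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(build_shard, range(len(shards)), shards)
        return dict(results)


if __name__ == "__main__":
    docs = [(1, "solr shard"), (2, "shard build"), (3, "hadoop index")]
    for shard_no, index in build_all(docs, shard_size=2).items():
        print(shard_no, index)
```

Since the shard builds share no state, adding workers (or machines) shortens the total time roughly in proportion, as long as there are at least as many shards as workers.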

- Toke Eskildsen
