Streaming results of analysis to shards ... possible?

Cass Costello Tue, 24 Mar 2009 10:03:00 -0700

Hello all,

Our application involves a high index write rate - anywhere from a few
dozen to many thousands of docs per sec.  The write rate is frequently
higher than the read rate (though not always), and our index must be
as fresh as possible (we'd like search results to be no more than a
couple of seconds out of date). We're weighing several approaches to
reaching our target total cost of ownership (TCO).

We've noted that the indexing process can be quite costly. Our latest
proof of concept (POC) shards the total index over N machines, which
effectively distributes the indexing load and keeps refresh and search
response times decent. To maintain performance during peak write
rates, however, we've had to make N much larger than we'd like.

One idea we're floating is to do all the analysis centrally,
perhaps on N/4 machines, and then stream the raw tokens and data
directly to the read "slaves," which would (hopefully) need to do
nothing more than manage segments and readers.
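
A minimal sketch of the proposed split, written in Python for brevity rather than Lucene's Java. A hypothetical central `pre_analyze` step tokenizes each document once and serializes its terms with positions and offsets, and a hypothetical `shard_for` function picks the receiving shard by hashing the document ID. The regex tokenizer stands in for a real Analyzer and JSON stands in for whatever wire format would actually be used; every name here is an illustrative assumption, not an existing API.

```python
import json
import re
import zlib
from dataclasses import dataclass, asdict
from typing import Dict, List, Tuple


@dataclass
class Token:
    term: str
    position: int
    start_offset: int
    end_offset: int


def analyze(text: str) -> List[Token]:
    """Hypothetical stand-in analyzer: lowercased word tokens with
    positions and character offsets (a real Analyzer would go here)."""
    return [
        Token(m.group().lower(), pos, m.start(), m.end())
        for pos, m in enumerate(re.finditer(r"\w+", text))
    ]


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical routing rule: crc32 is stable across processes,
    unlike Python's per-process salted hash() for strings."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def pre_analyze(doc_id: str, fields: Dict[str, str]) -> bytes:
    """Runs on the central analysis tier: analyze every field once and
    serialize the resulting tokens (JSON as a placeholder wire format)."""
    payload = {
        "id": doc_id,
        "fields": {name: [asdict(t) for t in analyze(value)]
                   for name, value in fields.items()},
    }
    return json.dumps(payload).encode("utf-8")


def route(docs: List[Tuple[str, Dict[str, str]]],
          num_shards: int) -> Dict[int, List[bytes]]:
    """Group pre-analyzed messages into one outgoing batch per shard."""
    batches: Dict[int, List[bytes]] = {i: [] for i in range(num_shards)}
    for doc_id, fields in docs:
        batches[shard_for(doc_id, num_shards)].append(pre_analyze(doc_id, fields))
    return batches


def decode(message: bytes) -> Tuple[str, Dict[str, List[Token]]]:
    """Runs on a read shard: rebuild the token lists without re-analyzing."""
    payload = json.loads(message.decode("utf-8"))
    fields = {name: [Token(**t) for t in toks]
              for name, toks in payload["fields"].items()}
    return payload["id"], fields
```

Note that this moves only the analysis step off the shards; each shard would still have to invert the received tokens and write segments, so any savings depend on how much of the per-document cost is analysis.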

We have some very rough math that makes the approach compelling, but
before diving in wholesale, we thought we'd ask if anyone else has
taken a similar approach. Thoughts?

Sincerely,

Cass Costello
www.stubhub.com