On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote:
> Troy Edwards <tedwards415...@gmail.com> wrote:
> > My average document size is 400 bytes
> > Number of documents that need to be inserted 250000/second
> > (for a total of about 3.6 Billion documents)
> 
> > Any ideas/suggestions on how that can be done? (use a client
> > or uploadcsv or stream or data import handler)
> 
> Use more than one cloud. Make them fully independent. As I suggested when
> you asked 4 days ago. That would also make it easy to scale: Just measure
> how much a single setup can take and do the math.

Yes - work out how much each node can handle, then you can work out how
many nodes you need to meet your target rate.
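The arithmetic is simple once you have a measured per-node rate. A minimal sketch, where the 20,000 docs/sec per-node figure is purely hypothetical (you'd substitute whatever your own benchmark shows):

```python
import math

target_rate = 250_000    # docs/sec required (from this thread)
per_node_rate = 20_000   # hypothetical measured indexing rate for one node

# Round up: a fractional node still means one more machine.
nodes_needed = math.ceil(target_rate / per_node_rate)
print(nodes_needed)  # → 13
```

You'd also want headroom on top of that for replicas, node failures, and indexing spikes.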

You could consider using implicit routing rather than compositeId, which
means that you take on responsibility for hashing your IDs and pushing
content to the right node. (Or, if you use compositeId, you could apply
the same hashing algorithm in your client and send each doc directly to
the correct shard.)

At the moment, if you push five documents to a five-shard collection,
the node you send them to could end up making four HTTP requests to the
other nodes in the collection. This means you don't need to worry about
where to post your content - it is just handled for you - but there is a
performance hit. Push content directly to the correct node (either using
implicit routing, or by replicating the compositeId hash calculation in
your client) and you'd increase your indexing throughput significantly,
I would theorise.
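For reference, Solr's compositeId router hashes the id with MurmurHash3 (x86, 32-bit, seed 0). Below is a minimal Python sketch of client-side shard selection that assumes the collection's hash ranges divide the 32-bit space evenly; a real client should read the actual shard hash ranges from the cluster state rather than recompute them, since the even split here may not match Solr's shard labelling after splits:

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, as used by Solr's compositeId router."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    length = len(data)
    rounded = length & ~3
    # Body: process 4-byte little-endian blocks.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: up to 3 remaining bytes.
    tail = data[rounded:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalisation mix.
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index by mapping the 32-bit hash onto num_shards
    equal ranges. Illustrative only - use the real ranges from the
    collection's cluster state in production."""
    h = murmur3_x86_32(doc_id.encode("utf-8"))
    return (h * num_shards) >> 32

print(shard_for("doc-42", 5))  # always the same shard for the same id
```

Note that a composite id of the form "routekey!docid" is handled differently: Solr takes the top 16 bits of the hash from the route key and the bottom 16 from the doc id, so this sketch only covers plain (non-composite) ids.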

Upayavira
