On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote:
> Troy Edwards <tedwards415...@gmail.com> wrote:
> > My average document size is 400 bytes
> > Number of documents that need to be inserted 250000/second
> > (for a total of about 3.6 Billion documents)
>
> > Any ideas/suggestions on how that can be done? (use a client
> > or uploadcsv or stream or data import handler)
>
> Use more than one cloud. Make them fully independent. As I suggested when
> you asked 4 days ago. That would also make it easy to scale: Just measure
> how much a single setup can take and do the math.
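The "measure one setup and do the math" step is just a ceiling division. A minimal sketch, where the per-cluster rate is a made-up placeholder you would replace with your own benchmark number:

```python
import math

# Back-of-envelope capacity math for sizing independent SolrCloud setups.
# The measured per-cluster rate below is an assumed placeholder, NOT a
# real benchmark -- substitute what a single setup actually sustains.
target_docs_per_sec = 250_000
measured_per_cluster = 50_000  # assumption: replace with your measurement

# Round up: a fractional cluster still needs a whole extra setup.
clusters_needed = math.ceil(target_docs_per_sec / measured_per_cluster)
print(clusters_needed)  # 5 with these example numbers
```

With fully independent clouds, this multiplies linearly, which is the point of keeping them independent.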
Yes - work out how much each node can handle, then you can work out how many nodes you need.

You could consider using implicit routing rather than compositeId, which means that you take on responsibility for hashing your ID and pushing content to the right node. (Or, if you stay with compositeId, you could replicate the same hashing algorithm in your client and be sure that you send docs directly to the correct shard.)

At the moment, if you push five documents to a five-shard collection, the node you send them to could end up making four HTTP requests to the other nodes in the collection. This means you don't need to worry about where to post your content - it is just handled for you. However, there is a performance hit there. Push content directly to the correct node (either using implicit routing, or by replicating the compositeId hash calculation in your client) and you'd increase your indexing throughput significantly, I would theorise.

Upayavira
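To give a feel for what "replicating the compositeId hash calculation" involves: Solr's compositeId router hashes the document id with MurmurHash3 (x86, 32-bit) and gives each shard an equal slice of the 32-bit hash range. The sketch below is an illustration, not Solr's actual code - it hashes the id's UTF-8 bytes, whereas Solr's Java implementation hashes the string's characters directly, so buckets are not guaranteed to be bit-compatible, and a real client should read the shard hash ranges from the collection state in ZooKeeper rather than assume equal slices.

```python
# Illustrative client-side shard routing in the style of Solr's compositeId
# router: MurmurHash3 (x86, 32-bit, seed 0) over the id, with the 32-bit
# hash range split evenly across shards. Simplified sketch, see caveats above.

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86 32-bit over a byte string."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    length = len(data)
    rounded = length & ~0x3
    # Body: 4-byte little-endian blocks.
    for i in range(0, rounded, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotl 15
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotl 13
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # Tail: remaining 1-3 bytes.
    k = 0
    tail = data[rounded:]
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # Finalization (avalanche).
    h ^= length
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h

def shard_for_id(doc_id: str, num_shards: int) -> int:
    """Map an id to a shard index, assuming equal hash ranges per shard."""
    h = murmur3_32(doc_id.encode("utf-8"))
    # Flip the top bit so signed 32-bit hash order becomes unsigned order,
    # then bucket the position evenly across num_shards.
    position = h ^ 0x80000000
    return position * num_shards // 2**32

# Example: route some ids to a five-shard collection.
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", shard_for_id(doc_id, 5))
```

With something like this (kept in sync with the ranges published in the clusterstate), the client can POST each document straight at the right shard leader and skip the extra intra-cluster hop described above.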