Re: Does the server support streaming?

2014-03-11 Thread Otis Gospodnetic
Hi, the key is to find the ideal bulk size and the ideal bulk request concurrency level, and then make sure the client always feeds ES enough data to achieve (close to) ideal utilization and minimize idling on either side. Otis

Re: Does the server support streaming?

2014-03-03 Thread Randall McRee
HTTP overhead is minuscule compared to the server-side (Elasticsearch) resources required to index the documents. Even with bulk and no streaming, the bottleneck is building the index: primarily disk I/O, but also CPU and memory. So, regardless, your client wi

Re: Does the server support streaming?

2014-03-02 Thread Honza Král
The difference between streaming_bulk and regular bulk is just in the API; under the hood they both perform the same operation. The only difference is that bulk returns only once all documents have been sent, whereas streaming_bulk is a generator that keeps yielding individual results. On Sun, Mar 2,

Re: Does the server support streaming?

2014-03-02 Thread eunever32
So excuse me, Honza: am I correct in thinking there is no point, from a performance perspective, in calling helpers.bulk, because it will just be "sliced" into chunk-size pieces by streaming_bulk anyway? Would it make more sense to call helpers.streaming_bulk directly to reduce the client-side activity? A

Re: Does the server support streaming?

2014-03-02 Thread Honza Král
Well, you can just use any async HTTP library to do it, but I wouldn't recommend it, since putting it all back together (to see which documents failed to index, etc.) might be difficult. You can always just have a couple of threads, each running a streaming_bulk, reading from a Queue and writing the re
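
A minimal sketch of the thread-and-Queue approach described above, using only the standard library. The names fake_bulk, worker and index_concurrently are hypothetical, and fake_bulk is a stand-in for a real bulk call (for example, one streaming_bulk run per worker). Since the message is cut off, collecting results in a second queue is an assumption here, not something the thread confirms.

```python
import queue
import threading

SENTINEL = None  # marker telling a worker there is no more work


def fake_bulk(chunk):
    # Hypothetical stand-in for one bulk request: reports every document as OK.
    return [(True, doc) for doc in chunk]


def worker(work_q, result_q, send):
    # Pull chunks until the sentinel arrives, pushing per-document results.
    while True:
        chunk = work_q.get()
        if chunk is SENTINEL:
            break
        for ok, doc in send(chunk):
            result_q.put((ok, doc))


def index_concurrently(chunks, n_workers=4, send=fake_bulk):
    # Bounded work queue: the producer blocks instead of buffering everything.
    work_q = queue.Queue(maxsize=n_workers * 2)
    result_q = queue.Queue()  # unbounded, so workers never block on results
    threads = [
        threading.Thread(target=worker, args=(work_q, result_q, send))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for chunk in chunks:
        work_q.put(chunk)
    for _ in threads:
        work_q.put(SENTINEL)  # one sentinel per worker
    for t in threads:
        t.join()
    results = []
    while not result_q.empty():
        results.append(result_q.get())
    return results
```

Because each result carries its document, the caller can still tell which documents failed, which is the bookkeeping that becomes awkward with a raw async HTTP client.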

Re: Does the server support streaming?

2014-03-02 Thread eunever32
Hey, thanks. So is there a convenient way to call bulk (helpers.bulk or helpers.streaming_bulk) asynchronously, so that the client isn't waiting for the request to complete? On Sunday, March 2, 2014 5:51:15 PM UTC, Honza Král wrote: > > Hi, > > the streaming_bulk function in elas

Re: Does the server support streaming?

2014-03-02 Thread Honza Král
Hi, the streaming_bulk function in elasticsearch-py is a helper that actually splits the stream of documents into chunks and sends them to Elasticsearch; it does not stream all documents to ES as a single request. It is impossible (due to the nature of bulk requests) for Elasticsearch to consum
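
To illustrate the chunking idea described above, here is a rough sketch in plain Python. It is not the elasticsearch-py implementation: chunked, fake_send and streaming_send are hypothetical names, and fake_send stands in for a single bulk HTTP request.

```python
from itertools import islice


def chunked(docs, chunk_size):
    """Yield successive lists of at most chunk_size documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def fake_send(chunk):
    # Hypothetical stand-in for one bulk request; one result per document.
    return [(True, doc) for doc in chunk]


def streaming_send(docs, chunk_size=500, send=fake_send):
    """Send docs one chunk per request, yielding per-document results lazily."""
    for chunk in chunked(docs, chunk_size):
        for result in send(chunk):
            yield result
```

Since streaming_send is a generator, no request is made until the caller starts iterating, and only one chunk is held in memory at a time.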

Re: Does the server support streaming?

2014-03-02 Thread eunever32
Hi, I have a similar question to the OP: what is the best way to get 1m or 30m records indexed? I can send batches of records with client.bulk, but while the request is being indexed the client is waiting, losing valuable seconds. Also, I have tried the Python client, elasticsearch-py, and there is a helpers.bul

Re: Does the server support streaming?

2014-01-08 Thread joergpra...@gmail.com
You are correct, ES nodes consume data request by request, before the requests are passed on through the cluster. This also applies to bulk indexing requests: they are temporarily pushed to buffers, but they are split by lines and executed as single actions. So to reduce network roundtrips, the best thing i
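
As a rough illustration of "split by lines and executed as single actions", the sketch below builds a newline-delimited bulk body and splits it back into individual actions. build_bulk_body and split_actions are hypothetical helpers, the sketch handles only "index" actions (which carry a source line), and it models the idea only, not how the server is actually implemented.

```python
import json


def build_bulk_body(index, docs):
    """Serialize (doc_id, doc) pairs as action line + source line pairs."""
    lines = []
    for doc_id, doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk format is newline-delimited and ends with a newline.
    return "\n".join(lines) + "\n"


def split_actions(body):
    """Split a body of index actions back into (action, source) pairs."""
    lines = body.splitlines()
    return [
        (json.loads(lines[i]), json.loads(lines[i + 1]))
        for i in range(0, len(lines), 2)
    ]
```

Each pair is independent of the others, which is why one large request brings no indexing benefit over several moderately sized ones beyond saving round trips.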

Does the server support streaming?

2014-01-08 Thread Ryan Pedela
Let's say I have a million documents I want to index. I am aware that you can index 100 or 1000 documents at a time using the bulk API. However, I could also write my HTTP client to stream all one million documents as bytes with a single bulk API call. This would be advantageous becaus