Hi,
The key is to find the ideal bulk size and the ideal bulk request
concurrency level, and then make sure the client always feeds ES enough
data to achieve (close to) ideal utilization and minimize idling on either
side.
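For example, newer versions of elasticsearch-py ship a parallel_bulk helper that exposes exactly those two knobs; a rough sketch (the index name and document shape are made up):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import parallel_bulk

    es = Elasticsearch()

    def gen_docs():
        # hypothetical document source; swap in your own
        for i in range(1000000):
            yield {"_index": "myindex", "_type": "doc", "num": i}

    # chunk_size and thread_count are the two knobs to tune; the
    # generator keeps the workers fed so neither side sits idle
    for ok, result in parallel_bulk(es, gen_docs(),
                                    thread_count=4, chunk_size=1000):
        if not ok:
            print(result)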
Otis
HTTP overhead is minuscule compared to the server-side (elasticsearch) resources required to index the documents. Even with bulk and no streaming, the bottleneck is in building the index: primarily disk I/O, along with CPU and memory.
So, regardless, your client will not be the bottleneck.
The difference between streaming_bulk and regular bulk is just in the API; under the hood they perform the same operation. bulk only returns once all documents have been sent, whereas streaming_bulk is a generator that keeps yielding individual results.
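Concretely (a small sketch; the index name and documents are invented):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk, streaming_bulk

    es = Elasticsearch()

    def actions():
        for i in range(10000):
            yield {"_index": "myindex", "_type": "doc", "num": i}

    # bulk() consumes the whole stream and returns one summary tuple
    ok_count, errors = bulk(es, actions(), chunk_size=500)

    # streaming_bulk() sends the same chunked requests under the hood,
    # but yields an (ok, item) pair per document as it goes
    for ok, item in streaming_bulk(es, actions(), chunk_size=500):
        if not ok:
            print("failed:", item)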
So excuse me, Honza... am I correct in thinking there is no point, from a performance perspective, in calling helpers.bulk, because it will just be "sliced" into chunks by streaming_bulk anyway?
Would it make more sense to call helpers.streaming_bulk directly, to reduce the client-side work?
A
Well, you can just use any async HTTP library to do it, but I wouldn't recommend it, since putting it all back together (to see which documents failed to index, etc.) might be difficult. You can always just have a couple of threads, each running a streaming bulk, reading from a Queue and writing the results, as in the sketch below.
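A rough sketch of that queue-plus-threads setup (untested; the index name, sizes, and thread count are made up):

    import queue
    import threading

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import streaming_bulk

    es = Elasticsearch()  # the client is thread-safe, so it can be shared
    q = queue.Queue(maxsize=10000)
    SENTINEL = None

    def drain():
        # expose the shared queue as an iterator for streaming_bulk
        while True:
            action = q.get()
            if action is SENTINEL:
                return
            yield action

    def worker():
        for ok, result in streaming_bulk(es, drain(), chunk_size=500):
            if not ok:
                print(result)  # failed documents show up here

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()

    for i in range(1000000):
        q.put({"_index": "myindex", "_type": "doc", "num": i})

    for _ in threads:
        q.put(SENTINEL)  # one sentinel per worker to shut it down
    for t in threads:
        t.join()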
Hey, thanks.
So is there a convenient way to call bulk (helpers.bulk or helpers.streaming_bulk) asynchronously, so that the client isn't left waiting for each request to complete?
On Sunday, March 2, 2014 5:51:15 PM UTC, Honza Král wrote:
Hi,
the streaming_bulk function in elasticsearch-py is a helper that will actually split the stream of documents into chunks and send them to elasticsearch; it does not stream all documents to ES as a single request. It is impossible (due to the nature of bulk requests) for elasticsearch to consume one endless request as a stream.
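To make the chunking concrete (illustrative only; the index name and chunk size are arbitrary):

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import streaming_bulk

    es = Elasticsearch()

    def docs():
        # lazy source: the full million is never held in memory
        for i in range(1000000):
            yield {"_index": "myindex", "_type": "doc", "num": i}

    # 500 actions are pulled from the generator at a time, so this
    # issues 2000 separate /_bulk requests, not one giant one
    for ok, item in streaming_bulk(es, docs(), chunk_size=500):
        if not ok:
            print(item)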
Hi,
I have a similar question to the OP's: what is the best way to get 1M or 30M records indexed?
I mean, I can send client.bulk batches of records, but while the request is being indexed the client is just waiting: valuable seconds.
Also, I have tried Python's elasticsearch-py, and there is a helpers.bulk helper.
You are correct, ES nodes consume data request by request before it is passed on through the cluster. As for bulk indexing requests, such requests are temporarily pushed to buffers, but they are split by lines and executed as single actions.
So, to reduce network roundtrips, the best thing is to send reasonably sized bulk requests.
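For illustration, a bulk body is newline-delimited JSON, one action line followed by one source line, which is what gets split and executed action by action (the index, type, and fields here are invented):

    POST /_bulk
    {"index": {"_index": "myindex", "_type": "doc", "_id": "1"}}
    {"field": "value 1"}
    {"index": {"_index": "myindex", "_type": "doc", "_id": "2"}}
    {"field": "value 2"}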
Let's say I have a million documents I want to index. I am aware that you can index 100 documents at a time, or 1000 at a time, using the bulk API. However, I could also write my HTTP client to stream all one million documents as bytes in a single bulk API call. This would be advantageous because it would avoid the overhead of many separate HTTP requests.