subject:"python vs java bulk indexing"

Re: python vs java bulk indexing

2014-03-29 Thread joergpra...@gmail.com

If you run 16 python processes, why do you run 20 Java threads and not 16? Most important are the bulk action size (how many requests are sent per bulk), the concurrency (how many bulk requests are active at once), and the bulk request volume. I recommend controlling the concurrency; your code does not do it. I
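
To illustrate what "controlling the concurrency" could look like, here is a minimal sketch using only the JDK. It is not the poster's code and not the Elasticsearch API: the class `BoundedBulkDemo`, the method `run`, and the sleeping task that stands in for an asynchronous bulk request are all hypothetical. A `Semaphore` caps how many bulk requests are in flight, and the producer blocks once the cap is reached.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedBulkDemo {

    /**
     * Sends totalBulks simulated bulk requests, never more than maxInFlight
     * at the same time, and returns the peak concurrency that was observed.
     */
    static int run(int totalBulks, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // Hypothetical stand-in for an asynchronous transport layer.
        ExecutorService fakeCluster = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < totalBulks; i++) {
                // Blocks the producer while maxInFlight bulks are outstanding.
                permits.acquireUninterruptibly();
                CompletableFuture.runAsync(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // simulated server-side work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    active.decrementAndGet();
                }, fakeCluster).whenComplete((result, error) -> permits.release());
            }
            // Wait for every outstanding bulk by collecting all permits.
            permits.acquireUninterruptibly(maxInFlight);
            permits.release(maxInFlight);
        } finally {
            fakeCluster.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak in-flight bulks: " + run(50, 4));
    }
}
```

The permit is released only when a request completes, so a slow server slows the producer down instead of letting requests pile up.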

Re: python vs java bulk indexing

2014-03-29 Thread eunever32

By the way, I can successfully run 16 python processes with no problem, so the server can handle concurrent bulk requests. The problem is with my java code, as it somehow starts threads indefinitely.

Re: python vs java bulk indexing

2014-03-29 Thread eunever32

Guys, I appreciate the suggestions. But shouldn't actionGet() block? So there should only be 20 threads (maybe another 20 for ES). I mean, are we saying client threads are just being created for each bulk request? How does it work for other applications? I notice search has the options single thread and no thread. Is
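
As a rough check of the expectation that blocking calls keep the worker count fixed, the JDK-only sketch below runs a fixed pool of workers, each of which submits a simulated bulk request and blocks on the result before sending the next one. Everything here is an assumption made for illustration: `BlockingWorkersDemo`, `distinctWorkerThreads`, and the `fakeTransport` pool are hypothetical, and `Future.get()` merely plays the role that a blocking `actionGet()` would play. It says nothing about how the transport client manages its own threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class BlockingWorkersDemo {

    /** Runs the workers and returns how many distinct worker threads were used. */
    static int distinctWorkerThreads(int workers, int bulksPerWorker) {
        ExecutorService workerPool = Executors.newFixedThreadPool(workers);
        // Hypothetical stand-in for the component that executes bulk requests.
        ExecutorService fakeTransport = Executors.newFixedThreadPool(workers);
        Set<String> seen = ConcurrentHashMap.newKeySet();
        try {
            for (int w = 0; w < workers; w++) {
                workerPool.execute(() -> {
                    for (int b = 0; b < bulksPerWorker; b++) {
                        seen.add(Thread.currentThread().getName());
                        Future<?> response = fakeTransport.submit(() -> sleepQuietly(2));
                        try {
                            response.get(); // blocks until the simulated bulk is done
                        } catch (InterruptedException | ExecutionException e) {
                            throw new IllegalStateException(e);
                        }
                    }
                });
            }
            workerPool.shutdown();
            if (!workerPool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            workerPool.shutdownNow();
            fakeTransport.shutdownNow();
        }
        return seen.size();
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct worker threads: " + distinctWorkerThreads(20, 10));
    }
}
```

In this sketch the number of worker threads cannot exceed the pool size, however many bulks each worker sends. If a real client shows thousands of threads, they are being created somewhere other than in the blocking call itself.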

Re: python vs java bulk indexing

2014-03-28 Thread InquiringMind

Yes, that is sufficient to clear out the documents. But... take the advice given by Jörg to heart. Elasticsearch is already optimized to take a bulk request and optimally process it as fast as it can be done. There should not be more than one of them at a time; no gain will be seen, and (as you

Re: python vs java bulk indexing

2014-03-28 Thread joergpra...@gmail.com

Your code has no precautions against overwhelming the cluster. 20 worker threads that are not coordinated is a challenge. I recommend the BulkProcessor class at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java SYN flood me

Re: python vs java bulk indexing

2014-03-28 Thread eunever32

You could be right: I can't test right now, but this is my code (there may be 20 workerThreads). As you can see, as each thread submits work, the thread will do a client.prepareBulk() ... is that sufficient to clear out the documents? workerThread() { Client client = getMyGlobalTransportClient();

Re: python vs java bulk indexing

2014-03-28 Thread InquiringMind

When I use the Java TransportClient and the BulkRequest builder, my throughput is like a scalded cat racing a bolt of greased lightning, with the cat way ahead! "the Java API" does not say how you are using it. Since I cannot see your code, I cannot comment on where your mistake is located. Bu

Re: python vs java bulk indexing

2014-03-28 Thread eunever32

If it's any help, this is the error when the threads start to hang: 2014-03-28 13:34:39,845 [elasticsearch[Cerberus][transport_client_worker][T#16]{New I/O worker #2832}] (Log4jESLogger.java:129) WARN org.elasticsearch.netty.channel.socket.nio.AbstractNioSelector - Unexpected exception in the

python vs java bulk indexing

2014-03-28 Thread eunever32

Hi, when running the bulk indexing with python, everything works fine: good, solid throughput for the full indexing run. When doing the same with the Java API, thousands of client threads are created (7000), and the server stops indexing and then the client just ha

9 matches