Re: python vs java bulk indexing

2014-03-29 Thread joergpra...@gmail.com
If you run 16 Python processes, why do you run 20 Java threads and not 16? What matters most is the bulk action size (how many requests are sent per bulk), the concurrency (how many bulk requests are active at once), and the bulk request volume. I recommend controlling the concurrency; your code does not do it. I
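
A minimal sketch of the two knobs named above, bulk action size and number of concurrent bulk requests, using only the JDK. It deliberately does not use the Elasticsearch client: `BoundedBulkSender`, `sendBulk`, and the counters are hypothetical names for illustration, and `sendBulk` is a stand-in for wherever a real bulk request would be executed. The idea is that a semaphore makes the producer wait once the allowed number of bulks is in flight, so threads cannot pile up without bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: batches documents and limits in-flight bulk requests.
public class BoundedBulkSender {
    private final int bulkSize;            // documents per bulk request
    private final Semaphore permits;       // one permit per allowed in-flight bulk
    private final ExecutorService pool;    // fixed pool, never grows
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();
    private final AtomicInteger batchesSent = new AtomicInteger();
    private final AtomicInteger docsSent = new AtomicInteger();
    private List<String> buffer = new ArrayList<>();

    public BoundedBulkSender(int bulkSize, int maxConcurrent) {
        this.bulkSize = bulkSize;
        this.permits = new Semaphore(maxConcurrent);
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
    }

    // Buffer one document; hand off a bulk once the buffer is full.
    public synchronized void add(String doc) throws InterruptedException {
        buffer.add(doc);
        if (buffer.size() >= bulkSize) {
            flush();
        }
    }

    // Submit the buffered documents as one bulk.
    public synchronized void flush() throws InterruptedException {
        if (buffer.isEmpty()) {
            return;
        }
        final List<String> batch = buffer;
        buffer = new ArrayList<>();
        permits.acquire(); // blocks the caller while maxConcurrent bulks are active
        pool.execute(() -> {
            try {
                int now = inFlight.incrementAndGet();
                peakInFlight.accumulateAndGet(now, Math::max);
                sendBulk(batch);
            } finally {
                inFlight.decrementAndGet();
                permits.release();
            }
        });
    }

    // Stand-in for a real bulk call (assumption: replace with the actual client call).
    protected void sendBulk(List<String> batch) {
        try {
            Thread.sleep(5); // simulate network and indexing time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        batchesSent.incrementAndGet();
        docsSent.addAndGet(batch.size());
    }

    // Send the remainder and wait for all in-flight bulks to finish.
    public synchronized void close() throws InterruptedException {
        flush();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public int getPeakInFlight() { return peakInFlight.get(); }
    public int getBatchesSent() { return batchesSent.get(); }
    public int getDocsSent() { return docsSent.get(); }
}
```

With a bulk size of 100 and a concurrency of 4, feeding 1050 documents produces 11 bulks and never more than 4 active at once, regardless of how fast documents are added.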

Re: python vs java bulk indexing

2014-03-29 Thread eunever32
By the way I can successfully run 16 python processes no problem. So the server can handle concurrent bulk requests. The problem is with my java code as it somehow starts threads indefinitely

Re: python vs java bulk indexing

2014-03-29 Thread eunever32
Guys, I appreciate the suggestions. But shouldn't actionGet() block? So there should only be 20 threads (maybe another 20 for ES). I mean, are we saying client threads are just being created for each bulk request? How does it work for other applications? I notice search has the options single thread and no thread. Is

Re: python vs java bulk indexing

2014-03-28 Thread InquiringMind
Yes, that is sufficient to clear out the documents. But... take the advice given by Jörg to heart. Elasticsearch is already optimized to take a bulk request and optimally process it as fast as it can be done. There should not be more than one of them at a time; no gain will be seen, and (as you

Re: python vs java bulk indexing

2014-03-28 Thread joergpra...@gmail.com
Your code has no precautions against overwhelming the cluster. 20 worker threads that are not coordinated is a challenge. I recommend the BulkProcessor class at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/action/bulk/BulkProcessor.java SYN flood me

Re: python vs java bulk indexing

2014-03-28 Thread eunever32
You could be right. I can't test right now, but this is my code (there may be 20 workerThreads). As you can see, as each thread submits work, the thread will do a client.prepareBulk() ... is that sufficient to clear out the documents? workerThread() { Client client = getMyGlobalTransportClient();

Re: python vs java bulk indexing

2014-03-28 Thread InquiringMind
When I use the Java TransportClient and the BulkRequest builder, my throughput is like a scalded cat racing a bolt of greased lightning, with the cat way ahead! "the Java API" does not say how you are using it. Since I cannot see your code, I cannot comment on where your mistake is located. Bu

Re: python vs java bulk indexing

2014-03-28 Thread eunever32
If it's any help, this is the error when the threads start to hang: 2014-03-28 13:34:39,845 [elasticsearch[Cerberus][transport_client_worker][T#16]{New I/O worker #2832}] (Log4jESLogger.java:129) WARN org.elasticsearch.netty.channel.socket.nio.AbstractNioSelector - Unexpected exception in the

python vs java bulk indexing

2014-03-28 Thread eunever32
Hi, when running the bulk indexing with Python, everything works fine: good solid throughput for the full indexing run. When doing the same with the Java API, thousands of client threads are being created (7000), the server stops indexing, and then the client just ha