Re: Ack not being called

2015-08-17 Thread Stuart Perks
Worked it out. Removing a for loop in the spout wrapped around the emit fixed it. Any ideas why this makes a difference? > On 18 Aug 2015, at 06:12, Abhishek Agarwal wrote: > > Couple of questions - > 1. Are you adding the tuples to pendingTuple list before emitting them in the > list? S

Re: Is the worker send thread a possible bottleneck?

2015-08-17 Thread Kishore Senji
I think it is the same even in local shuffling. It will go to the worker receive buffer, from which it gets transferred to the executor disruptor queue. So yes, the executor would be able to go as fast as that thread is able to transfer tuples, provided the Bolt takes less time than the copy of the messages from the

Re: how to use storm to identify the missing records in data stream

2015-08-17 Thread Abhishek Agarwal
Kafka producer also allows you to choose a custom partitioning strategy. You can use the same one you used in the consumer. On Tue, Aug 18, 2015 at 12:19 AM, Alec Lee wrote: > Hello, thanks for the reply. > > Now I am getting some new issues, see > > "c1bdeeb9-0309-4dae-9d6c-56406796528a,febc005
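The suggestion above (use the same partitioning strategy on the producer and the consumer side) can be illustrated with a minimal sketch. This is not Kafka's partitioner API or its default hash; `partition_for` is a hypothetical helper, the CRC32 hash is a stand-in chosen only because it is stable across processes, and treating the second CSV field of the sample record (`febc005f`) as the key is an assumption.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index using a stable hash.

    Hypothetical helper for illustration; not part of any Kafka client.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # zlib.crc32 gives the same value in every process, unlike Python's
    # built-in hash(), which is salted per interpreter run for strings.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Assumed key: the second field of the sample record quoted in the thread.
record_key = "febc005f"

# Both sides call the same function with the same partition count,
# so they agree on where records for this key live.
producer_side = partition_for(record_key, 8)
consumer_side = partition_for(record_key, 8)
```

If the two sides used different hash functions or different partition counts, records with the same key could land in partitions the consumer does not expect, which is the situation the shared strategy is meant to avoid.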

Re: Ack not being called

2015-08-17 Thread Abhishek Agarwal
Couple of questions - 1. Are you adding the tuples to pendingTuple list before emitting them in the list? Since I didn't see that in the code. 2. Is logging correctly configured? Can you use sysout instead of log.info and then try out. On Tue, Aug 18, 2015 at 4:02 AM, Stuart Perks wrote: > Set t
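The first question above concerns bookkeeping: a spout that wants replay on failure has to remember each tuple under a message ID before it emits it, and Storm only calls a spout's `ack`/`fail` for tuples that were emitted with a message ID. The following is a toy model of that bookkeeping under stated assumptions, not Storm's API; the class name `PendingTracker` and all of its methods are hypothetical.

```python
from typing import Dict, List, Tuple


class PendingTracker:
    """Toy model of a spout's pending-tuple map (hypothetical, not Storm code)."""

    def __init__(self) -> None:
        self.pending: Dict[int, str] = {}          # msg_id -> payload awaiting ack
        self.emitted: List[Tuple[int, str]] = []   # log of every emit, in order
        self._next_id = 0

    def emit(self, payload: str) -> int:
        """Record the payload as pending, then emit it with a message ID."""
        msg_id = self._next_id
        self._next_id += 1
        self.pending[msg_id] = payload             # track before emitting
        self.emitted.append((msg_id, payload))
        return msg_id

    def ack(self, msg_id: int) -> None:
        """Tuple tree fully processed: forget the payload."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        """Processing failed or timed out: replay under the same message ID."""
        payload = self.pending.get(msg_id)
        if payload is not None:
            self.emitted.append((msg_id, payload))
```

A short usage example: after `t = PendingTracker()` and `a = t.emit("x")`, calling `t.ack(a)` removes the entry, while `t.fail(a)` before the ack would append a second emit of the same payload.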

storm-kafka

2015-08-17 Thread Jinhong Lu
I read data from Kafka with storm-kafka Trident. But no matter how I change the partitions, number of workers, or number of threads, I just get about 3 million messages from Kafka (about 2G). What can I do to increase it? Would any configuration help? Thanks. BR//lujinhong

Re: Ack not being called

2015-08-17 Thread Stuart Perks
Set to 23, the same number as the workers are set to. Thanks > On 17 Aug 2015, at 23:04, Javier Gonzalez wrote: > > How many ackers have you got configured when you submit your topology? > > On Aug 17, 2015 5:57 PM, "Stuart Perks" > wrote: > Hi I am attempting to ru

RE: worker dies after view minutes

2015-08-17 Thread Eric Ruel
Finally, I am able to increase the parallelism and keep my worker alive. In fact, the problem was not the parallelism but the number of tasks... I had to set -Djute.maxbuffer=33554432 in the parameters of ZooKeeper and Nimbus. A lower value would probably work too. The nimbus disallowed my wo

Re: Ack not being called

2015-08-17 Thread Javier Gonzalez
How many ackers have you got configured when you submit your topology? On Aug 17, 2015 5:57 PM, "Stuart Perks" wrote: > Hi I am attempting to run guaranteed message processing but ACK is not > being called. Post on stack overflow if you prefer answer there. > http://stackoverflow.com/questions/32

Ack not being called

2015-08-17 Thread Stuart Perks
Hi, I am attempting to run guaranteed message processing but ACK is not being called. Posted on Stack Overflow if you prefer to answer there. http://stackoverflow.com/questions/32060081/apache-storm-ack-not-working Thanks

Re: how to use storm to identify the missing records in data stream

2015-08-17 Thread Alec Lee
Hello, thanks for the reply. Now I am getting some new issues, see "c1bdeeb9-0309-4dae-9d6c-56406796528a,febc005f,2013-03-15 05:15:00-07:00,60,0.3480,2013-03-26 18:15:21.173000-07:00,7738739" "d2e9128a-2800-4dac-b3bc-e88de2bb6e12,fef00032,2013-03-24 09:12:00-07:00,60,1.4280,2013-03-26 14:48:

Re: Using localOrShuffleGrouping--any suggested optimizations?

2015-08-17 Thread John Yost
Bumped up number of ackers to 100, which made a *huge* difference--4.3/4.4 million to 6.6 million tuples acked/minute! The capacity of my acker executors was down around 0.15, so I did not figure I needed to increase from 10 to 100, but wowsers, that one change made a major impact. Thanks again to

Re: Using localOrShuffleGrouping--any suggested optimizations?

2015-08-17 Thread John Yost
Hi Kobi, Cool, thanks for getting back to me so quickly! I did confirm that there's one instance of Bolt A (sender, 400 executors) and Bolt B (receiver, 100 executors) on each worker (100 workers in topology), so we should be good with local shuffling working. I only have 10 ackers, so I'll bump

Re: Using localOrShuffleGrouping--any suggested optimizations?

2015-08-17 Thread Kobi Salant
Hi John, You should make sure you have at least one instance of each bolt on each worker so local shuffling will work. Also, the number of ackers should be set according to the number of workers. Did you check the capacity of the bolts and ackers? Kobi On Mon, Aug 17, 2015 at 7:22 PM, John Yost wro
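The advice about having an instance of each bolt on each worker follows from how a local-or-shuffle grouping picks its target: if the receiving bolt has tasks in the sender's worker process, only those are used; otherwise it falls back to shuffling across all tasks. A minimal sketch of that decision, assuming a plain random choice in place of Storm's actual shuffle implementation; `choose_target` and the worker names are hypothetical.

```python
import random
from typing import Dict, List, Optional


def choose_target(sender_worker: str,
                  tasks_by_worker: Dict[str, List[int]],
                  rng: Optional[random.Random] = None) -> int:
    """Pick a receiving task id, preferring tasks in the sender's worker.

    Hypothetical model of the local-or-shuffle rule; not Storm code.
    """
    rng = rng or random.Random()
    local = tasks_by_worker.get(sender_worker, [])
    if local:
        # In-process delivery: no network transfer between workers.
        return rng.choice(local)
    # No local receiver, so behave like an ordinary shuffle over all tasks.
    all_tasks = [t for tasks in tasks_by_worker.values() for t in tasks]
    if not all_tasks:
        raise ValueError("no target tasks available")
    return rng.choice(all_tasks)


# Receiver tasks 1 and 2 live on worker "w1", task 3 on worker "w2".
placement = {"w1": [1, 2], "w2": [3]}
local_pick = choose_target("w1", placement, random.Random(0))
remote_pick = choose_target("w3", placement, random.Random(0))
```

In this model a sender on `w3` has no local receiver, so every tuple it emits crosses a worker boundary, which is why a worker without an instance of the receiving bolt loses the benefit of the grouping.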

Using localOrShuffleGrouping--any suggested optimizations?

2015-08-17 Thread John Yost
Hi Everyone, I updated my topology to use localOrShuffleGrouping for a Bolt that generates and emits 15-20 tuples for each incoming tuple. My throughput went from 1 million tuples acked/minute to 4.5 million, which is great, but I need to get to 7-8 million tuples acked/minute. Question--ar

Is the worker send thread a possible bottleneck?

2015-08-17 Thread Vincenzo Gulisano
Hi, something that is not clear to me (and I cannot find a clear answer to this) is whether the send thread of an executor A is able to directly move a tuple to the input queue of another executor B (within the same worker, of course) or whether all output tuples have to go through the shared transf

How do I get the capacity of a Spout?

2015-08-17 Thread Vincenzo Gulisano
Hi, I would like to know if there's a way (from the UI or the metrics logger) to get the capacity of a spout operator (currently the UI shows it only for bolts). Thanks! Vincenzo

Re: Trident DRPC not distributing task to all workers

2015-08-17 Thread satyavrat
Hi all, Please help. On Jul 27, 2015 2:25 PM, "satyavrat" wrote: > Hi, > > I am using DRPC for my log processing. > We have a very heavy task and are trying to do it with DRPC. > Input to DRPC is a file name, output is usually 1GB string data. > > Problem: DRPC is not assigning task to all workers