Serialisation across workers might be your problem. If you use the
"localOrShuffle" grouping and arrange for the number of spout and bolt
executors to be a multiple of the number of workers, this will minimise
serialisation across workers. If there is only one counting bolt in the
whole topology, every tuple has to be serialised and sent to the worker
hosting that single bolt. A better approach might be to run one counting
bolt per worker and aggregate their partial counts periodically, as in the
sketch below.
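
The wiring might look something like this. It is only a minimal sketch
against the 0.9-era backtype.storm API; ParseBolt, CountBolt, AggregateBolt
and buildKafkaSpout() are placeholder names standing in for your actual
classes and spout setup:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class CountingTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();

            // Parallelism hints are multiples of the worker count (12), so
            // every worker hosts an executor of each component and
            // localOrShuffleGrouping can keep tuples in-process instead of
            // serialising them across workers.
            builder.setSpout("kafka-spout", buildKafkaSpout(), 12);
            builder.setBolt("parse", new ParseBolt(), 12)
                   .localOrShuffleGrouping("kafka-spout");

            // One counting bolt per worker rather than a single bolt for the
            // whole topology; each keeps a partial count and periodically
            // emits it to a single low-volume aggregator.
            builder.setBolt("count", new CountBolt(), 12)
                   .localOrShuffleGrouping("parse");
            builder.setBolt("aggregate", new AggregateBolt(), 1)
                   .globalGrouping("count");

            Config conf = new Config();
            conf.setNumWorkers(12);
            StormSubmitter.submitTopology("counting-topology", conf,
                    builder.createTopology());
        }
    }

The periodic flush in the counting bolt could be driven by tick tuples
(set Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS in the bolt's
getComponentConfiguration()), so the aggregator only receives one small
tuple per counting executor per interval instead of the full stream.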

Regards
   Rob Turner.


On 24 June 2014 15:10, <jamesw...@yahoo.com.tw> wrote:

> Hi all,
>
> I face a critical performance problem with my Storm topology: I can only
> process 1000 tuples/sec from Kafka via KafkaSpout. I built my topology with
> standard Storm (not Trident), and my topology information is as follows:
> [Machines]
> I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE (Google
> Compute Engine)
> Number of workers: 12
> Number of executors: 51
> [Topology]
> Number of KafkaSpouts: 13 (fetching 13 topics from Kafka brokers)
> Number of bolts: 12 (5 of them are MySQL-dumper bolts)
>
> KafkaSpout(topic) emits to boltA and boltB
> boltA (parallelism=9): parses the Avro tuples from KafkaSpout
> boltB (parallelism=1): counts tuples only
>
> I found that boltA's capacity is sometimes 1 or above in the Storm UI, and
> my 5 MySQL-dumper bolts' execute latency is more than 300ms (the other
> bolts are under 10ms). In addition, the complete latency of these
> KafkaSpouts is more than 2000ms at the beginning, but it drops to 1000ms
> after a while.
>
> I found this topology can only process 1000 tuples/s or less, but my goal
> is 10000 tuples/s. Is anything wrong with my topology config? My topology
> only does simple things like counting and dumping to MySQL. Storm does not
> seem to deliver the performance it claims (a million tuples per second on a
> 10-node cluster). Can anyone give me some suggestions?
>
> Thanks a lot.
>
> Best regards,
> James




-- 
Cheers
   Rob.
