Thanks Joshua, It works as i expected now.
On Friday, September 4, 2015, Joshua Martell <[email protected]> wrote: > The shuffle() breaks the stream and disassociates the parallelism setting > for the stream below it. > > Use this: > > topology.newStream("msg",kafkaSpout) > .parallelismHint(5) // Spout parallelism > .shuffle() > .each(new Fields("str"),new JsonDecode(), new > Fields("user_id","user_name")) > .parallelismHint(10); // Bolt parallelism > > > Joshua > > On Thu, Sep 3, 2015 at 6:02 AM, trung kien <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: > >> Hi Tom, >> >> Thanks for your help. >> >> - Do you see messages in all kafka partitions? >> >> Yes, i do see the messages in all kafka partitions. >> >> - How large are the messages in kb >> >> It's small only about 120 bytes per message. >> >> - Do you need exactly-once processing? If yes use Trident if not, use >> vanilla storm ( >> http://stackoverflow.com/questions/15520993/storm-vs-trident-when-not-to-use-trident >> ) >> >> Unfortunately i need exactly one processing. >> >> Do you have any immediate thoughts abou this? >> >> Sent from my HTC >> >> ----- Reply message ----- >> From: "Ziemer, Tom" <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> To: "[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>" < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> Subject: How to have multiple storm workers consume a kafka topic in >> parallel >> Date: Thu, Sep 3, 2015 3:12 PM >> >> Hi Kien, >> >> I don’t see any immediate issue with the setup – some thoughts: >> >> - Do you see messages in all kafka partitions? >> >> - How large are the messages in kb? >> >> - Do you need exactly-once processing? If yes use Trident if >> not, use vanilla storm ( >> http://stackoverflow.com/questions/15520993/storm-vs-trident-when-not-to-use-trident >> ) >> >> Since you do not specify the partitions explicitly but use ZK instead, >> the spout should be able to pick it up from there. >> >> Regards, >> Tom >> >> From: trung kien [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>] >> Sent: Donnerstag, 3. September 2015 09:07 >> To: [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');> >> Subject: Re: How to have multiple storm workers consume a kafka topic in >> parallel >> >> Hi Tom, >> >> Yes, i have my topic partitioned. >> I created the topic with --partitions 10 >> >> Here is how i create my KafkaSpout: >> >> BrokerHosts zk = ZkHosts("zkserver"); >> TridentKafkaConfig spoutConfig = new TridentKafkaConfig(zk, "my_queue"); >> spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme()); >> OpaqueTridentKafkaSpout kafkaSpout= new >> OpaqueTridentKafkaSpout(spoutConf); >> >> And im using this config like following: >> >> TridentTopology topology = new TridentTopology(); >> >> topology.newStream("myStream",kafkaSpout).shuffle().each(new >> Fields("str"), new JsonDecode(), new >> Fields("user_id","action")).parallelismHint(10); >> >> With this setting it only can handle arround 20k mesaages per sec. >> >> However, i want a lot more ( ~ 100k per sec). >> >> On the storm UI i only see 1 executors for the spout. >> >> Is there any config i can turn for greater performance here? >> >> Sent from my HTC >> >> ----- Reply message ----- >> From: "Ziemer, Tom" <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');><mailto: >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>>> >> To: "[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');><mailto: >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>>" < >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');><mailto: >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>>> >> Subject: How to have multiple storm workers consume a kafka topic in >> parallel >> Date: Thu, Sep 3, 2015 12:59 PM >> >> Hi, >> >> is your kafka topic partitioned? >> >> See: >> http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions >> >> How is KafkaSpout configured? >> >> Regards, >> Tom >> >> From: trung kien [mailto:[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');><mailto: >> [email protected] <javascript:_e(%7B%7D,'cvml','[email protected]');>>] >> Sent: Mittwoch, 2. September 2015 09:05 >> To: [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');><mailto: >> [email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> Subject: How to have multiple storm workers consume a kafka topic in >> parallel >> >> Hi Storm Users, >> >> I am new with Storm and using Trident for my applications. >> >> My application needs to push large of message into Kafka (in Json >> format), do some calculations and save the result in Redis. >> >> It seems that storm always assign only 1 worker for consuming the Kafka >> topic (even I have .parallelismhint(5) and my Storm cluster have 10 workers) >> Is there any way to have more than one worker consume a Kafka queue in >> parallel? >> >> Here is my topology code: >> >> topology.newStream("msg",kafkaSpout) >> .shuffle() >> .each(new Fields("str"),new JsonDecode(), new >> Fields("user_id","user_name")) >> .parallelismHint(5); >> >> Could someone please help me on this? only one worker is causing high >> latency in my application. >> >> -- >> Thanks >> Kien >> > > -- Thanks Kien
