What Storm calls a Bolt is essentially a combination of [Transformation/Action and DStream RDD] in Spark. So to achieve higher parallelism for a specific Transformation/Action on a specific DStream RDD, simply repartition it to the required number of partitions, which directly relates to the number of threads (tasks) that process it in parallel.
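As a rough illustration, here is a minimal PySpark sketch of the three-stage topology from the question below. It assumes a socket text source in place of the Twitter feed, a toy word-list scorer in place of the real sentiment logic, and a print in place of the Spark SQL sink. The helper names (has_hashtag, score_sentiment, save_partition, build_pipeline, run_local), the host and port, the hashtag, and the partition count of 8 are illustrative assumptions, not something from this thread. The one point it is meant to show is that repartition() is called just before the expensive map, so only that stage runs with more parallel tasks.

```python
def has_hashtag(tweet, hashtags):
    """Stage 1 ("Bolt1"): True if the tweet mentions any wanted hashtag."""
    return any(tag in tweet for tag in hashtags)


def score_sentiment(tweet):
    """Stage 2 ("Bolt2"): toy stand-in for the expensive sentiment logic."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = tweet.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return (tweet, score)


def save_partition(rows):
    """Stage 3 ("Bolt3"): placeholder sink; a real job would write to a database."""
    for row in rows:
        print(row)


def build_pipeline(tweets, hashtags, sentiment_partitions):
    """Chain the three stages; each transformation returns a new DStream."""
    tagged = tweets.filter(lambda t: has_hashtag(t, hashtags))
    # Raise parallelism only for the slow stage: one task per partition.
    scored = tagged.repartition(sentiment_partitions).map(score_sentiment)
    scored.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))
    return scored


def run_local():
    """Hypothetical local driver; needs pyspark and a text server on port 9999."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[4]", "TopologySketch")
    ssc = StreamingContext(sc, 5)  # 5-second batches
    tweets = ssc.socketTextStream("localhost", 9999)
    build_pipeline(tweets, ["#spark"], sentiment_partitions=8)
    ssc.start()
    ssc.awaitTermination()
```

Nothing is "emitted" explicitly here: the DStream returned by one transformation is simply the input of the next, which plays the role of Storm's emit. Calling run_local() starts the job. Note that repartition() triggers a shuffle, so it only pays off when the stage that follows is expensive enough to outweigh that cost.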
From: anshu shukla [mailto:anshushuk...@gmail.com]
Sent: Wednesday, May 6, 2015 9:33 AM
To: ayan guha
Cc: user@spark.apache.org; d...@spark.apache.org
Subject: Re: Creating topology in spark streaming

But the main problem is how to increase the level of parallelism for any particular bolt's logic. Suppose I want this type of topology:
https://storm.apache.org/documentation/images/topology.png
How can we manage it?

On Wed, May 6, 2015 at 1:36 PM, ayan guha <guha.a...@gmail.com> wrote:

Every transformation on a DStream will create another DStream. You may want to take a look at foreachRDD. Also, kindly share your code so people can help better.

On 6 May 2015 17:54, "anshu shukla" <anshushuk...@gmail.com> wrote:

Please help, guys. Even after going through all the examples given, I have not understood how to pass DStreams from one bolt/logic to another (without writing them to HDFS etc.), just like the emit function in Storm. Suppose I have a topology with 3 bolts, say:

BOLT1 (parse the tweets and emit tweets using the given hashtags) =====> BOLT2 (complex logic for sentiment analysis over tweets) =====> BOLT3 (submit tweets to the SQL database using Spark SQL)

Now, since sentiment analysis will take most of the time, we have to increase its level of parallelism to tune latency. How do we increase the level of parallelism, since the logic of the topology is not clear?

--
Thanks & Regards,
Anshu Shukla
Indian Institute of Sciences

--
Thanks & Regards,
Anshu Shukla