Re: Spark streaming and rate limit

2014-06-19 Thread Flavio Pompermaier
Yes, I need to call the external service for every event, and the order does not matter. There's no time limit within which each event must be processed. I can neither tell the producer to slow down nor drop events. Of course I could put a message broker in between, like an AMQP or JMS broker, but I was

Re: Spark streaming and rate limit

2014-06-19 Thread Michael Cutler
Hello Flavio, It sounds to me like the best solution for you is to implement your own ReceiverInputDStream/Receiver component to feed Spark Streaming with DStreams. It is not as scary as it sounds; take a look at some of the examples, like TwitterInputDStream

Re: Spark streaming and rate limit

2014-06-19 Thread Flavio Pompermaier
Hi Michael, thanks for the tip, it's a really elegant solution. What I'm still missing here (maybe I should take a look at the code of TwitterInputDStream: https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala) is

Re: Spark streaming and rate limit

2014-06-19 Thread Michael Cutler
Hi Flavio, When your streaming job starts somewhere in the cluster, the Receiver will be started in its own thread/process. You can do whatever you like within the receiver, e.g. start and manage your own thread pool to fetch external data and feed Spark. If your Receiver dies Spark will
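To make the "manage your own thread pool inside the receiver" idea concrete, here is a minimal, framework-free sketch in Python. It is not Spark code: `BoundedReceiver`, `fetch`, `store`, `workers` and `max_pending` are hypothetical names chosen for illustration, and the `store` callback only plays roughly the role that `store` plays on Spark's Receiver class. The semaphore caps how many items are queued or running at once, so the internal work queue cannot grow without bound.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedReceiver:
    """Hypothetical, framework-free analogue of a custom receiver."""

    def __init__(self, fetch, store, workers=2, max_pending=4):
        self._fetch = fetch  # calls the external source for one item
        self._store = store  # hands a result downstream
        self._pool = ThreadPoolExecutor(max_workers=workers)
        # One slot per item that is either queued or currently running.
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, item):
        self._slots.acquire()  # blocks once max_pending items are in flight
        try:
            self._pool.submit(self._run, item)
        except BaseException:
            self._slots.release()
            raise

    def _run(self, item):
        try:
            self._store(self._fetch(item))
        finally:
            self._slots.release()  # free the slot even if fetch or store fails

    def stop(self):
        # Wait for all submitted work to finish before returning.
        self._pool.shutdown(wait=True)


collected = []
collected_lock = threading.Lock()


def store(result):
    with collected_lock:
        collected.append(result)


receiver = BoundedReceiver(fetch=lambda x: x * x, store=store)
for item in range(10):
    receiver.submit(item)
receiver.stop()
```

Results arrive in completion order rather than submission order, which matches the case where the order of events does not matter.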

Re: Spark streaming and rate limit

2014-06-19 Thread Flavio Pompermaier
OK, I'll start from that when I try to implement it. Thanks again for the great support! Best, Flavio On Thu, Jun 19, 2014 at 10:57 AM, Michael Cutler mich...@tumra.com wrote: Hi Flavio, When your streaming job starts somewhere in the cluster the Receiver will be started in its

Spark streaming and rate limit

2014-06-18 Thread Flavio Pompermaier
Hi all, in my use case I'd like to receive events and call an external service as they pass through. Is it possible to limit the number of concurrent calls to that service (to avoid a DoS) using Spark Streaming? If so, limiting the rate implies possible buffer growth... how can I control the
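Independently of Spark, the "limit the number of concurrent calls" part of the question can be sketched with a fixed-size thread pool. This is only an illustration in plain Python: `call_external_service` is a hypothetical stand-in for the real service, and `MAX_CONCURRENT_CALLS` is an assumed limit. The counters exist solely to show that the cap holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_CALLS = 3  # assumed limit; tune for the real service

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def call_external_service(event):
    """Hypothetical stand-in for the real external call."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # simulate network latency
    with _lock:
        _in_flight -= 1
    return event * 2


def process_events(events):
    # The pool size caps how many calls run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_CALLS) as pool:
        return list(pool.map(call_external_service, events))


results = process_events(range(20))
```

Note that the executor's internal work queue is unbounded, so this caps concurrency but not memory: events that arrive faster than they are processed still pile up, which is exactly the buffer growth the question asks about.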

Re: Spark streaming and rate limit

2014-06-18 Thread Soumya Simanta
You can add a backpressure-enabled component in front that feeds data into Spark. This component can control the input rate to Spark. On Jun 18, 2014, at 6:13 PM, Flavio Pompermaier pomperma...@okkam.it wrote: Hi to all, in my use case I'd like to receive events and call an external
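One common way to get backpressure in such a front component is a bounded buffer: when the buffer is full, the feeding side blocks until the consumer catches up. The sketch below is plain Python, not tied to Spark; `BUFFER_SIZE`, `produce`, `consume` and `run_pipeline` are hypothetical names used only for illustration.

```python
import queue
import threading

BUFFER_SIZE = 5   # assumed capacity of the buffer in front of the consumer
_DONE = object()  # sentinel marking the end of the stream


def produce(events, buffer):
    for event in events:
        buffer.put(event)  # blocks while the buffer is full (backpressure)
    buffer.put(_DONE)


def consume(buffer, handle):
    processed = []
    while True:
        event = buffer.get()
        if event is _DONE:
            return processed
        processed.append(handle(event))


def run_pipeline(events, handle):
    buffer = queue.Queue(maxsize=BUFFER_SIZE)
    feeder = threading.Thread(target=produce, args=(events, buffer))
    feeder.start()
    processed = consume(buffer, handle)
    feeder.join()
    return processed


processed = run_pipeline(range(50), lambda e: e + 1)
```

This only helps when the feeding side is allowed to block. If the upstream producer cannot be slowed down, the bounded buffer has to sit behind something that can absorb the excess, such as a message broker.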

Re: Spark streaming and rate limit

2014-06-18 Thread Flavio Pompermaier
Thanks for the quick reply, Soumya. Unfortunately I'm a newbie with Spark. What do you mean? Is there any reference on how to do that? On Thu, Jun 19, 2014 at 12:24 AM, Soumya Simanta soumya.sima...@gmail.com wrote: You can add a back pressured enabled component in front that feeds data into

Re: Spark streaming and rate limit

2014-06-18 Thread Soumya Simanta
Flavio, I'm new to Spark as well, but I've done stream processing using other frameworks. My comments below are not specific to Spark Streaming. Maybe someone who knows more can provide better insights. I read your post on my phone and I believe my answer doesn't completely address the issue you have