Hi,

On Tue, Sep 9, 2014 at 2:02 PM, Ron's Yahoo! <zlgonza...@yahoo.com> wrote:
> So I guess where I was coming from was the assumption that starting up a
> new job to be listening on a particular queue topic could be done
> asynchronously.

No, with the current state of Spark Streaming, all data sources and the
processing pipeline must be fixed when you start your StreamingContext. You
cannot add new data sources dynamically at the moment; see
http://apache-spark-user-list.1001560.n3.nabble.com/Multi-tenancy-for-Spark-Streaming-Applications-td13398.html

> For example, let’s say there’s a particular topic T1 in a Kafka queue.
> If I have a new set of requests coming from a particular client A, I was
> wondering if I could create a partition A.
> The streaming job is submitted to listen to T1.A and will write to a
> topic T2.A, which the REST endpoint would be listening on.

That doesn't seem like a good way to use Kafka. It may be possible, but I am
fairly sure you should create a new topic T_A instead of adding a partition A
to an existing topic. With some modifications to Spark Streaming's
KafkaReceiver you *might* be able to get it to work as you imagine, but I
don't think it was meant to be used that way.

Also, you will not get low latency: Spark Streaming processes data in batches
of a fixed interval length (say, 1 second), so in the worst case your query
will wait up to one full batch interval before processing even starts.

If I understand correctly what you are trying to do (which I am not sure
about), I would recommend choosing a somewhat different architecture, in
particular because you cannot add data sources dynamically.

Tobias
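
P.S. To make the "fixed at startup" point concrete, here is a minimal sketch
(assuming the current Spark 1.x KafkaUtils API; the ZooKeeper address, group
id, and app name are placeholders, and "T1" is the topic from your example):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object FixedSources {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("fixed-sources")
    // The batch interval is fixed here; it bounds your worst-case
    // wait before processing of newly arrived data starts.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Every input stream must be declared *before* ssc.start().
    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-group", Map("T1" -> 1))

    stream.map(_._2).print()

    ssc.start()
    // After start(), no further streams can be registered on this context;
    // handling a new client's topic means stopping and resubmitting the job.
    ssc.awaitTermination()
  }
}
```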