Hi,

On Tue, Sep 9, 2014 at 2:02 PM, Ron's Yahoo! <zlgonza...@yahoo.com> wrote:
> So I guess where I was coming from was the assumption that starting up a
> new job to be listening on a particular queue topic could be done
> asynchronously.

No, with the current state of Spark Streaming, all data sources and the
processing pipeline must be fixed when you start your StreamingContext. You
cannot add new data sources dynamically at the moment; see
http://apache-spark-user-list.1001560.n3.nabble.com/Multi-tenancy-for-Spark-Streaming-Applications-td13398.html

> For example, let’s say there’s a particular topic T1 in a Kafka queue.
> If I have a new set of requests coming from a particular client A, I was
> wondering if I could create a partition A.
> The streaming job is submitted to listen to T1.A and will write to a
> topic T2.A, which the REST endpoint would be listening on.

That doesn't seem like a good way to use Kafka. It may be possible, but I am
fairly sure you should create a new topic T_A instead of adding a partition A
to an existing topic. With some modifications to Spark Streaming's
KafkaReceiver you *might* be able to get it to work as you imagine, but I
don't think it was meant to be used that way.

Also, you will not get low latency: Spark Streaming processes data in batches
of a fixed interval length (say, 1 second), so in the worst case your query
will wait up to one full batch interval before processing even starts.

If I understand correctly what you are trying to do (which I am not sure
about), I would recommend choosing a somewhat different architecture, in
particular because you cannot add data sources dynamically.

Tobias
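
P.S. To make the "fixed at startup" point concrete, here is a minimal sketch
(assuming the current Spark 1.x KafkaUtils API; the ZooKeeper address, group
id, and app name are placeholders, and "T1" is the topic from your example):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object FixedSources {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("fixed-sources")
    // The batch interval is fixed here; it bounds your worst-case
    // wait before processing of newly arrived data starts.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Every input stream must be declared *before* ssc.start().
    val stream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-group", Map("T1" -> 1))

    stream.map(_._2).print()

    ssc.start()
    // After start(), no further streams can be registered on this context;
    // handling a new client's topic means stopping and resubmitting the job.
    ssc.awaitTermination()
  }
}
```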