Hi Marcin,

Thanks for the reply. I've just spent a couple of days looking into
prediction.io tutorials, so forgive me if I missed something. :)

IMO there are several pieces that need to work together to make
streaming work really well:
1. Online learning algorithms - MLlib provides two streaming algorithms
(linear regression and k-means). Arguably more algorithms would be
needed. This falls outside the remit of prediction.io, but it
could mean that prediction.io needs to integrate with another platform
that provides online learning. I've been looking at other online
learning projects but don't yet have a good grasp of the landscape in
this area.
2. Model training - it seems the current prediction.io framework could be
tweaked a little to make this work with the MLlib streaming algorithms.
Basically, move model training to the 'pio deploy' step, where the model is
trained on a DStream.
3. Data ingestion - There are probably going to be two modes for online
training: store no data, or store data under a retention policy. We
also need a real-time ingestion mechanism other than REST (e.g. Kafka).
4. Prediction - the existing prediction API is still relevant, but we should
also consider proactive predictions (such as flagging anomalies in data) and
a feedback mechanism. Perhaps we need a "data sink" concept that can
proactively generate notifications.
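
To make points 1 and 2 a bit more concrete, here is a minimal sketch in
plain Python of a model that is updated one mini-batch at a time, roughly
what a streaming trainer would do for each batch of a DStream. This is not
PredictionIO or MLlib code: the class, the function names, the step size and
the simulated stream are all made up for illustration.

```python
# Sketch only: online linear regression updated per mini-batch.
# Every name here is hypothetical; none of this is PredictionIO or MLlib API.
from typing import Iterable, List, Tuple

Example = Tuple[List[float], float]  # (features, label)


class OnlineLinearRegression:
    def __init__(self, num_features: int, step_size: float = 0.5) -> None:
        self.weights = [0.0] * num_features
        self.step_size = step_size

    def predict(self, features: List[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features))

    def train_on_batch(self, batch: Iterable[Example]) -> None:
        """One gradient step on the mean squared error of a single batch."""
        examples = list(batch)
        if not examples:
            return  # an empty micro-batch leaves the model unchanged
        grad = [0.0] * len(self.weights)
        for features, label in examples:
            error = self.predict(features) - label
            for i, x in enumerate(features):
                grad[i] += error * x
        for i in range(len(self.weights)):
            self.weights[i] -= self.step_size * grad[i] / len(examples)


def simulated_stream(num_batches: int) -> Iterable[List[Example]]:
    """Stand-in for a stream: each batch is drawn from y = 2*x + 1.

    The second feature is a constant 1.0 so its weight acts as the intercept.
    """
    for _ in range(num_batches):
        yield [([x / 10.0, 1.0], 2.0 * (x / 10.0) + 1.0) for x in range(10)]


model = OnlineLinearRegression(num_features=2)
for batch in simulated_stream(500):
    model.train_on_batch(batch)   # training never stops while data arrives
    # model.predict(...) could serve queries between batches
```

The point of the sketch is only that training and serving share one
long-running model object, which is why moving training into the deploy
step looks natural to me.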

Please let me know what you think.

Thanks! James

On Fri, Sep 30, 2016 at 4:01 AM, Marcin Ziemiński <[email protected]> wrote:
> Hi James,
>
> Incorporating Spark Streaming or Structured Streaming in PredictionIO will
> probably involve significant changes in the architecture. We are currently
> in the process of rethinking the design so that it can support different
> approaches to processing data. Future releases (after 0.10) should bring
> some changes, and I hope that introducing streaming will be one of them.
> Do you have any thoughts on how you imagine putting stream processing into
> PredictionIO and using it this way? Any input on this matter would be of
> great value.
>
> Thanks,
> Marcin
>
> On Fri, 30 Sep 2016 at 01:02, James Wu <[email protected]> wrote:
>>
>> Hi,
>>
>> Are there any plans for prediction.io to integrate with Spark
>> Streaming and support the streaming algorithms in MLlib?
>>
>> Thanks! James