Howdy Donald,

Indeed, this PR trades batch prediction scalability (via Spark) for
compatibility (with all models). I'm not convinced this is a good trade-off.

I also had a variation of the PR working that simply reuses the model's
SparkContext for queries, if it hasn't already been stopped. That's where I
found the underlying issue about nested model RDDs inside the queries RDD.

So if we decide to stay the course for scalability, how do we mitigate the
incompatibility for engines that use a custom PersistentModel containing
RDDs?

On Tue, Nov 21, 2017 at 11:28 AM, Donald Szeto <don...@apache.org> wrote:

> Hi Mars,
>
> Thanks for the PR! I am still reviewing the code change, but at a high
> level it will take away the ability to run "batchpredict" remotely on a
> Spark cluster + HDFS/S3 setup, and will require the extra steps of
> downloading input and uploading output files for such a setup. It is
> unlikely to scale to much larger datasets.
>
> That said, this is a very important and convenient feature. I'll help make
> it as good as possible.
>
> Regards,
> Donald
>
> On Mon, Nov 20, 2017 at 10:21 AM, Mars Hall <mars.h...@salesforce.com>
> wrote:
>
>> Hi PIO folks!
>>
>> Curious to hear from anyone using the new (as of PredictionIO 0.12.0)
>> batch predict command.
>>
>>    - Do you use `pio batchpredict`?
>>    - What is your use-case? (Which Engine template or algorithm?)
>>    - Does it work for your use-case? (Details please)
>>
>> Batch predict is currently broken for engines using a custom
>> PersistentModel. I've been working on a fix for this issue:
>>   https://github.com/apache/incubator-predictionio/pull/447
>>
>> Because that changeset alters the way the command works, I'd appreciate
>> feedback on that pull request. *Especially if you use `pio batchpredict`
>> today*, please try it out, and post feedback to that GitHub issue.
>>
>> Thanks for your attention 😄
>>
>> --
>> Mars Hall
>> Customer Facing Architect
>> Salesforce Platform / Heroku
>> San Francisco, California
>>
>
>


-- 
Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
