The answers to your question illustrate why, IMHO, it is bad to have Spark required for predictions.
Any of the MLlib ALS recommenders use Spark to predict, and so keep Spark running for as long as they are deployed. They can use one machine or the entire cluster. This is one case where using the cluster actually slows down predictions, since parts of the model may be spread across nodes. Spark is not designed to scale in this manner for real-time queries, but I believe those are your out-of-the-box options for the ALS recommenders.

To be both fast and scalable you would load the model entirely into memory on one machine for fast queries, then spread queries across many identical machines to scale with load. I don't think any templates do this; it requires a load balancer at the very least, not to mention custom deployment code that interferes with using the same machines for training. The UR loads the model into Elasticsearch, which serves independently scalable queries. I always advise keeping Spark out of serving for the reasons mentioned above.

From: Ulavapalle Meghamala <ulavapalle.megham...@myntra.com>
Date: August 7, 2018 at 9:27:46 AM
To: Pat Ferrel <p...@occamsmachete.com>
Cc: user@predictionio.apache.org, actionml-user <actionml-u...@googlegroups.com>
Subject: Re: PredictionIO spark deployment in Production

Thanks Pat for getting back.

Are there any PredictionIO models/templates that really use Spark in "pio deploy"? That is, not just creating a Spark context to launch the `pio deploy` driver and then dropping it, but keeping a Spark context running throughout the PredictionServer's life cycle? How does PredictionIO handle this case? Does it create a new Spark context every time a prediction has to be made?

Also, in production deployments (where Spark is not really used), how do you scale the PredictionServer? Do you just deploy the same model on multiple machines and have an LB/HAProxy to handle requests?
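The LB/HAProxy question above is exactly the pattern Pat describes: identical PredictionServers behind a balancer, each holding the whole model. A minimal round-robin sketch of that idea in Python (the hostnames are hypothetical, and port 8000 is PIO's default query port; in production a load balancer such as HAProxy would do this rotation, not application code):

```python
import itertools

# Hypothetical pool of identical PredictionServers, each with the full
# model loaded; 8000 is PredictionIO's default query port.
servers = [
    "http://pio-serve-1:8000",
    "http://pio-serve-2:8000",
    "http://pio-serve-3:8000",
]

# Round-robin over the pool: each query goes to the next replica,
# which is what an LB/HAProxy frontend does for you in production.
next_server = itertools.cycle(servers).__next__

def query_url():
    """Return the queries.json endpoint of the next replica in rotation."""
    return f"{next_server()}/queries.json"

print(query_url())  # http://pio-serve-1:8000/queries.json
print(query_url())  # http://pio-serve-2:8000/queries.json
```

Because every replica serves the identical model, any of them can answer any query, so adding machines scales throughput linearly without touching the training cluster.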
Thanks,
Megha

On Tue, Aug 7, 2018 at 9:35 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
> PIO is designed to use Spark in train and deploy, but the Universal
> Recommender removes the need for Spark to make predictions. This IMO is
> a key to using Spark well: remove it from serving results. PIO creates a
> Spark context to launch the `pio deploy` driver, but Spark is never used
> and the context is dropped.
>
> The UR also does not need to be re-deployed after each train. It hot-swaps
> the new model into use outside of Spark, so if you never shut down the
> PredictionServer you never need to re-deploy.
>
> The confusion comes from reading Apache PIO docs, which may not do things
> this way; don't read them for this. Each template defines its own
> requirements. To use the UR, stick with its documentation.
>
> That means Spark is used to "train" only and you never re-deploy. Deploy
> once; train periodically.
>
>
> From: Ulavapalle Meghamala <ulavapalle.megham...@myntra.com>
> Reply: user@predictionio.apache.org
> Date: August 7, 2018 at 4:13:39 AM
> To: user@predictionio.apache.org
> Subject: PredictionIO spark deployment in Production
>
> Hi,
>
> Are there any templates in PredictionIO where Spark is used even in
> "pio deploy"? How are you handling such cases? Will you create a Spark
> context every time you run a prediction?
>
> I have gone through the documentation here:
> http://actionml.com/docs/single_driver_machine. But it only talks about
> "pio train". Please guide me to any documentation that is available on
> "pio deploy" with Spark.
>
> Thanks,
> Megha
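Since serving happens entirely outside Spark in the setup Pat describes, a deployed PredictionServer answers plain REST queries: a client just POSTs JSON to its /queries.json endpoint. A minimal sketch of the request a client would build (the user id and result count are made-up illustrative values, and the default `pio deploy` port 8000 is assumed):

```python
import json

host = "http://localhost:8000"       # default PredictionServer port (assumed)
query = {"user": "u-123", "num": 4}  # UR-style query: which user, how many recs

url = f"{host}/queries.json"         # the PredictionServer query endpoint
body = json.dumps(query)             # JSON payload to POST

print(url)   # http://localhost:8000/queries.json
print(body)
```

No Spark context is involved in answering this request, which is why the server can be replicated freely and left running while training happens elsewhere.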