Yes, this is the recommended config (Postgres is not recommended, but more on 
that later). Spark is only needed during training, but the `pio train` process 
creates a driver and executors in Spark. The driver runs on the `pio train` 
machine, so you must install pio on it. You should have at least 2 Spark 
machines because the driver and executors need roughly the same memory; more 
executors will train faster.
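
For example, something like this on the driver machine passes the Spark master 
and memory settings through to spark-submit (the master URL and memory sizes 
are placeholders for your own cluster):

    # run from the UR directory on the "driver" machine
    pio train -- --master spark://your-spark-master:7077 \
      --driver-memory 16g --executor-memory 16g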

You will have to spread the pio “workflow” out over a permanent 
deploy+eventserver machine. I usually call this a combo PredictionServer and 
EventServer. These are 2 JVM processes that take events and respond to queries, 
so they must be available all the time. You will run `pio eventserver` and `pio 
deploy` on this machine. The Spark driver machine will run `pio train`. Since 
no state is stored in PIO itself this will work; the machines get their state 
from the DBs (HBase is recommended, plus Elasticsearch). Install pio and the UR 
in the same location on all machines because the path to the UR is used by PIO 
to give an id to the engine (not ideal, but oh well). 
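
On the permanent PS/ES machine the two long-running processes look roughly like 
this (7070 and 8000 are the default ports; the install path is a placeholder, 
and use nohup, a service manager, or whatever you prefer to keep them up):

    # on the PS/ES machine
    nohup pio eventserver &               # takes events, default port 7070
    cd /path/to/universal-recommender     # same path as on the driver machine
    nohup pio deploy &                    # answers queries, default port 8000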

Once set up:
Run `pio eventserver` on the permanent PS/ES machine and input your data into 
the EventServer.
Run `pio build` on the “driver” machine and `pio train` on the same machine. 
This builds the UR, puts metadata about the engine instance in PIO, and creates 
the Spark driver, which can use a separate machine (or 3) as Spark executors.
Then copy the UR directory to the PS/ES machine and do `pio deploy` from the 
copied directory.
Shut down the driver machine and the Spark executors. On AWS, “stopping” them 
means their config is saved and you only pay for EBS storage. You will start 
them again before the next train.
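
Roughly, the first full cycle looks like this (the app name, app id, file 
paths, Spark master URL, memory sizes, and EC2 instance ids are all 
placeholders for your own setup):

    # on the PS/ES machine: eventserver already running, send events to it
    pio app new my_ur_app                          # once; prints the app id and access key
    pio import --appid 1 --input my-events.json    # or POST events to port 7070

    # on the temporary "driver" machine, from the UR directory
    pio build
    pio train -- --master spark://your-spark-master:7077 \
      --driver-memory 16g --executor-memory 16g

    # first train only: copy the UR directory to the PS/ES machine, then there:
    cd /path/to/universal-recommender && pio deploy

    # stop the temporary Spark machines until the next train
    aws ec2 stop-instances --instance-ids i-0driver0 i-0executor0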

From then on there is no need to copy the UR directory; just spin up the driver 
and any other Spark machines, do `pio train`, and you are done. The new model is 
automatically hot-swapped in for the old one, with no downtime and no need to 
re-deploy.
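
So the periodic retrain reduces to something like this (same placeholder 
instance ids and master URL as above):

    aws ec2 start-instances --instance-ids i-0driver0 i-0executor0
    # on the driver machine, from the UR directory
    pio train -- --master spark://your-spark-master:7077 \
      --driver-memory 16g --executor-memory 16g
    # the deployed PredictionServer picks up the new model automatically
    aws ec2 stop-instances --instance-ids i-0driver0 i-0executor0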

This will only work in this order if you want to take advantage of a temporary 
Spark cluster. PIO is installed on the PS/ES machine and the “driver” machine 
in exactly the same way, connecting to the same stores.
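
In practice “exactly the same way” mostly means conf/pio-env.sh is identical on 
both machines and points at the shared stores, roughly along these lines (host 
names are placeholders and the exact keys depend on your PIO version):

    # conf/pio-env.sh on BOTH the PS/ES machine and the driver machine
    PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
    PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

    PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
    PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=your-es-host
    PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200

    PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
    PIO_STORAGE_SOURCES_HBASE_HOSTS=your-hbase-master

    PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
    PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models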

Hmm, I should write a How to for this...



On Sep 20, 2017, at 3:23 AM, Brian Chiu <br...@snaptee.co> wrote:

Hi,

I would like to be able to train and run the model on different machines.
The reason is, on my dataset, training takes around 16GB of memory and
deploying only needs 8GB.  In order to save money, it would be better
if only an 8GB memory machine is used in production, and only start a
16GB one perhaps weekly for training.  Is this possible with
predictionIO + universal recommender?

I have done some searching and found a related guide here:
https://github.com/actionml/docs.actionml.com/blob/master/pio_load_balancing.md
which copies the whole template directory and then runs pio deploy.  But
in their case an HBase and Elasticsearch cluster is used.  In my case
only a single machine is used, with Elasticsearch and PostgreSQL.  Will
this work?  (I am flexible about using PostgreSQL or localfs or HBase,
but I cannot afford a cluster)

Perhaps another solution is to make the 16GB machine a Spark slave,
start it before training starts, and have the 8GB machine connect to
it.  Then call pio train; pio deploy on the 8GB machine.  Finally
shut down the 16GB machine.  But I have no idea if it can work.  And if
yes, is there any documentation I can look into?

Any other method is welcome!  Zero downtime is preferred but not necessary.

Thanks in advance.


Best Regards,
Brian
