The key motivation behind this idea/request is to:

    Simplify baseline PredictionIO deployment, both conceptually & technically.

My vision with this thread is to:

    Enable single-process, single network-listener PredictionIO app deployment
    (i.e. the Queries & Events APIs served by the same process).
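
To make that concrete, here is a minimal sketch of what a unified server
could look like, assuming akka-http. The route paths match PIO's real
endpoints, but the server wiring & responses are purely illustrative, not
the actual engine or Eventserver internals:

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import akka.stream.ActorMaterializer

    object UnifiedServer extends App {
      implicit val system = ActorSystem("pio-unified")
      implicit val materializer = ActorMaterializer()

      // Queries API: normally served by the deployed engine on its own port.
      val queriesRoute = path("queries.json") {
        post { complete("""{"result": "prediction goes here"}""") }
      }

      // Events API: normally served by a separate Eventserver process.
      val eventsRoute = path("events.json") {
        post { complete("""{"eventId": "placeholder"}""") }
      }

      // One listener for both APIs, e.g. on Heroku's assigned $PORT.
      val port = sys.env.getOrElse("PORT", "8000").toInt
      Http().bindAndHandle(queriesRoute ~ eventsRoute, "0.0.0.0", port)
    }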


Attempting to address some previous questions & statements…


From Pat Ferrel on Tue, 11 Jul 2017 10:53:48 -0700 (PDT):
> how much of your problem is workflow vs installation vs bundling of APIs? Can 
> you explain it more?

I am focused on deploying PredictionIO on Heroku via this buildpack:
  https://github.com/heroku/predictionio-buildpack

Heroku is an app-centric platform, where each app gets a single routable 
network port. By default apps get a URL like:
  https://tdx-classi.herokuapp.com (an example PIO Classification engine)

Deploying a separate Eventserver app that must be configured to share storage 
config & backends leads to all kinds of complexity, especially when an 
unsuspecting developer deploys a new engine with a different storage config 
without realizing that the Eventserver is not simply shareable. Despite a lot 
of docs & discussion suggesting its shareability, there is precious little 
documentation that explains how the multi-backend Storage really works in PIO. 
(I didn't understand it until I read a bunch of the Storage source code.)
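
For reference, here is roughly what that multi-backend Storage wiring looks 
like in pio-env.sh (or as Heroku config vars): each repository points at a 
named source, and each source declares its backend type & connection. The 
values below are placeholders for a single-Postgres setup:

    # Repositories: where metadata, event data & model data live
    PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
    PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
    PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
    PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
    PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL

    # Sources: the named backends the repositories refer to
    PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
    PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
    PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
    PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio

Every app (engine or Eventserver) that wants to share events must carry a 
matching copy of this config, which is exactly the coupling that trips 
people up.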


From Kenneth Chan on Tue, 11 Jul 2017 12:49:58 -0700 (PDT):
> For example, one can modify the classification to train a classifier on the 
> same set of data used by recommendation.
…and later on Wed, 12 Jul 2017 13:44:01 -0700:
> My concern of embedding event server in engine is 
> - what problem are we solving by providing an illusion that events are only 
> limited for one engine?

This is a great ideal target, but the reality is that it takes significant 
design & engineering to reach that level of data shareability. I'm not 
suggesting that we do anything to undercut the possibilities of such a 
distributed architecture. I suggest that we streamline PIO for everyone who is 
not at that level of distributed architecture: make PIO not *require* it.

The best example I have is that you can run Spark in local mode, without 
worrying about any aspect of its ideal distributed purpose. (In fact 
PredictionIO is built on this feature of Spark!) I don't know the history 
there, but I would imagine Spark was not always so friendly for small or 
embedded tasks like this.
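
For anyone who hasn't seen it, this is all it takes (a minimal Spark sketch; 
"local[*]" runs everything in-process across the available cores):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalSparkExample extends App {
      // No cluster, no daemons: Spark runs inside this one JVM process.
      val conf = new SparkConf().setAppName("local-demo").setMaster("local[*]")
      val sc = new SparkContext(conf)

      // A trivial job to show the full API works without any setup.
      val sum = sc.parallelize(1 to 100).sum()
      println(s"sum = $sum")

      sc.stop()
    }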


A huge part of my reality is seeing how many newcomers fumble around and get 
frustrated. I'm looking at PredictionIO from a very Heroku-style perspective of 
"how do we help [new] developers be successful", which is probably going to 
seem like I want to take away capabilities. I just want to make the onramp more 
graceful!

*Mars

( <> .. <> )
