I'm bringing this thread back to life! There is another thread here this week: *How to training and deploy on different machine?*
In it, Pat replies:

> You will have to spread the pio "workflow" out over a permanent
> deploy+eventserver machine. I usually call this a combo PredictionServer
> and EventServer. These are 2 JVM processes that take events and respond
> to queries and so must be available all the time. You will run `pio
> eventserver` and `pio deploy` on this machine.

This is exactly what I'm talking about: two processes on a single machine to run a complete deployment. Doesn't it make sense to allow these APIs to coexist in a single JVM? Sure, in some cases you may want to scale out and tune two different JVMs for these two different use-cases, but for most of us, making the main runtime require only a single process/JVM would make PredictionIO much more friendly to operate.

A few more comments inline below…

On Wed, Jul 12, 2017 at 7:43 PM, Kenneth Chan <kenn...@apache.org> wrote:

> Mars, I totally understand and agree that we should make developers
> successful, but I would like to understand your problem more before
> jumping to conclusions.
>
> First, a complete PIO setup has the following:
> 1. PIO framework layer
> 2. PIO administration (e.g. PIO app)
> 3. PIO event server
> 4. one or more PIO engines
>
> The storage and setup config apply to 1 globally, and the rest (2, 3, 4)
> run on top of 1.
>
> My understanding is that the Buildpack takes engine code and then builds,
> releases, and deploys it, which can then serve queries.
>
> When a Heroku user uses the Buildpack:
>
> - Where is the event server in the picture?

The eventserver is considered optional. If a Heroku user wants to use the events API, then they must provision a second Heroku app for the eventserver:
https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver

> - How does the user set up the storage config for 1?

With the Heroku buildpack, PostgreSQL is the default for all storage sources, and it is automatically configured.
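To make that concrete, the auto-generated config amounts to a `pio-env.sh` along these lines, pointing every storage repository at a single PostgreSQL source. This is a sketch: the repository table names, the JDBC URL, and the credentials shown here are illustrative, not copied from the buildpack.

```shell
# Sketch of a pio-env.sh using PostgreSQL as the sole storage backend.
# All three repositories (metadata, event data, model data) map to one
# JDBC source named PGSQL; names and credentials are illustrative.
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL

PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://host:5432/dbname
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=secret
```

The "Eventserver is not simply shareable" problem falls out of this: an engine and an eventserver only see the same events if they resolve `EVENTDATA` to the same source and database.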
> - If I use the buildpack to deploy another engine, does it share 1 and 2
> above?

No. Every engine is another Heroku app. Every eventserver is another Heroku app. These can be configured to intentionally share databases/storage, such as for a specific engine+eventserver pair.

> On Wed, Jul 12, 2017 at 3:21 PM, Mars Hall <m...@heroku.com> wrote:
>
>> The key motivation behind this idea/request is to:
>>
>> Simplify baseline PredictionIO deployment, both conceptually &
>> technically.
>>
>> My vision with this thread is to:
>>
>> Enable single-process, single network-listener PredictionIO app
>> deployment (i.e. Queries & Events APIs in the same process).
>>
>> Attempting to address some previous questions & statements…
>>
>> From Pat Ferrel on Tue, 11 Jul 2017 10:53:48 -0700 (PDT):
>>
>> > how much of your problem is workflow vs installation vs bundling of
>> > APIs? Can you explain it more?
>>
>> I am focused on deploying PredictionIO on Heroku via this buildpack:
>> https://github.com/heroku/predictionio-buildpack
>>
>> Heroku is an app-centric platform, where each app gets a single routable
>> network port. By default, apps get a URL like
>> https://tdx-classi.herokuapp.com (an example PIO Classification engine).
>>
>> Deploying a separate Eventserver app that must be configured to share
>> storage config & backends leads to all kinds of complexity, especially
>> when a developer wants to deploy a new engine with a different storage
>> config and doesn't realize that the Eventserver is not simply shareable.
>> Despite a lot of docs & discussion suggesting its share-ability, there
>> is precious little documentation that explains how the multi-backend
>> Storage really works in PIO. (I didn't understand it until I read a
>> bunch of Storage source code.)
>>
>> From Kenneth Chan on Tue, 11 Jul 2017 12:49:58 -0700 (PDT):
>>
>> > For example, one can modify the classification to train a classifier
>> > on the same set of data used by recommendation.
>> …and later on Wed, 12 Jul 2017 13:44:01 -0700:
>>
>> > My concern of embedding the event server in the engine is:
>> > - what problem are we solving by providing an illusion that events
>> > are only limited to one engine?
>>
>> This is a great ideal target, but the reality is that it takes
>> significant design & engineering to reach that level of data
>> share-ability. I'm not suggesting that we do anything to undercut the
>> possibilities of such a distributed architecture. I suggest that we
>> streamline PIO for everyone who is not at that level of distributed
>> architecture. Make PIO not *require* it.
>>
>> The best example I have is that you can run Spark in local mode,
>> without worrying about any aspect of its ideal distributed purpose. (In
>> fact, PredictionIO is built on this feature of Spark!) I don't know the
>> history there, but I would imagine Spark was not always so friendly for
>> small or embedded tasks like this.
>>
>> A huge part of my reality is seeing how many newcomers fumble around
>> and get frustrated. I'm looking at PredictionIO from a very
>> Heroku-style perspective of "how do we help [new] developers be
>> successful", which is probably going to seem like I want to take away
>> capabilities. I just want to make the onramp more graceful!
>>
>> *Mars
>>
>> ( <> .. <> )

--
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
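As a footnote to the single-process idea: here is a toy sketch, written against plain JDK classes rather than anything from PredictionIO, of what "Events & Queries APIs in one JVM on one port" means mechanically. The paths, responses, and port are hypothetical; a real implementation would mount PIO's two existing HTTP services instead of these stub handlers.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Toy sketch: two API contexts ("/events" and "/queries") served by a
// single JVM process on a single port. NOT PredictionIO code; the paths
// and response bodies are hypothetical stand-ins.
public class SingleProcessSketch {
    static void reply(HttpExchange ex, String body) throws IOException {
        byte[] bytes = body.getBytes("UTF-8");
        ex.sendResponseHeaders(200, bytes.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8765), 0);
        // In a combined PIO runtime, these would be the eventserver and
        // engine query services mounted side by side.
        server.createContext("/events", ex -> reply(ex, "event accepted"));
        server.createContext("/queries", ex -> reply(ex, "prediction result"));
        server.start(); // serves on a background thread
        System.out.println("listening on :8765");
    }
}
```

One routable port per app is exactly the Heroku constraint described above, which is why a single listener would make the platform fit so much better.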