I'm bringing this thread back to life!

There is another thread here this week:
*How to training and deploy on different machine?*

In it, Pat replies:

> You will have to spread the pio “workflow” out over a permanent
> deploy+eventserver machine. I usually call this a combo PredictionServer
> and EventServer. These are 2 JVM processes that take events and respond to
> queries and so must be available all the time. You will run `pio
> eventserver` and `pio deploy` on this machine.
>

This is exactly what I'm talking about. Two processes on a single machine
to run a complete deployment. Doesn't it make sense to allow these APIs to
coexist in a single JVM?

Sure, in some cases you may want to scale out and tune two different JVMs
for these two different use-cases, but for most of us, making it so the
main runtime only requires a single process/JVM would make PredictionIO
much more friendly to operate.
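For concreteness, the two always-on processes Pat describes look roughly like this on a single machine (ports are the conventional defaults and purely illustrative):

```shell
# Process 1: the events API (eventserver JVM)
pio eventserver --port 7070 &

# Process 2: the queries API (engine/PredictionServer JVM)
pio deploy --port 8000
```

Collapsing these into one JVM with one listener is exactly the simplification being proposed.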

A few more comments inline below…


On Wed, Jul 12, 2017 at 7:43 PM, Kenneth Chan <kenn...@apache.org> wrote:

> Mars, I totally understand and agree we should make developers successful,
> but I would like to understand your problem more before jumping to
> conclusions.
>
> first, a complete PIO setup has the following:
> 1. PIO framework layer
> 2. PIO administration (e.g. PIO app)
> 3. PIO event server
> 4. one or more PIO engines
>
> the storage and setup config apply to 1 globally, and the rest (2, 3, 4)
> run on top of 1.
>
> my understanding is that the Buildpack would take engine code and then
> build, release and deploy it, which can then serve queries.
>
> when a Heroku user uses the buildpack,
> - Where is the event server in the picture?
>

The eventserver is considered optional. If a Heroku user wants to use the
events API, then they must provision a second Heroku app for the
eventserver:

https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver
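For reference, provisioning that second app looks roughly like this (app name is hypothetical; the linked CUSTOM.md is the authoritative guide):

```shell
# Create a separate Heroku app for the eventserver, built with the
# PredictionIO buildpack (app name is hypothetical)
heroku create my-pio-eventserver \
  --buildpack https://github.com/heroku/predictionio-buildpack

# Give it a Postgres database for event storage
heroku addons:create heroku-postgresql --app my-pio-eventserver
```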


> - How does the user set up the storage config for 1?
>

With the Heroku buildpack, PostgreSQL is the default for all storage
sources, and it is automatically configured.
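Concretely, "automatically configured" means the buildpack points every storage repository at the same Postgres source. A sketch of the equivalent settings, using PIO's standard `pio-env.sh` variable names (the JDBC URL is illustrative):

```shell
# All three repositories backed by one PGSQL source
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL

# The PGSQL source itself is a JDBC connection
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://...
```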


> - if I use the buildpack to deploy another engine, does it share 1 and 2
> above?
>

No. Every engine is another Heroku app. Every eventserver is another Heroku
app. These can be configured to intentionally share databases/storage, such
as for a specific engine+eventserver pair.
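That deliberate pairing is typically done by attaching one app's database add-on to the other app, e.g. (app names hypothetical):

```shell
# Attach the eventserver app's Postgres to the engine app, so both
# JVMs read and write the same storage backend
heroku addons:attach my-pio-eventserver::DATABASE --app my-pio-engine
```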



> On Wed, Jul 12, 2017 at 3:21 PM, Mars Hall <m...@heroku.com> wrote:
>
>> The key motivation behind this idea/request is to:
>>
>>     Simplify baseline PredictionIO deployment, both conceptually &
>> technically.
>>
>> My vision with this thread is to:
>>
>>     Enable single-process, single network-listener PredictionIO app
>> deployment
>>     (i.e. Queries & Events APIs in the same process.)
>>
>>
>> Attempting to address some previous questions & statements…
>>
>>
>> From Pat Ferrel on Tue, 11 Jul 2017 10:53:48 -0700 (PDT):
>> > how much of your problem is workflow vs installation vs bundling of
>> APIs? Can you explain it more?
>>
>> I am focused on deploying PredictionIO on Heroku via this buildpack:
>>   https://github.com/heroku/predictionio-buildpack
>>
>> Heroku is an app-centric platform, where each app gets a single routable
>> network port. By default apps get a URL like:
>>   https://tdx-classi.herokuapp.com (an example PIO Classification engine)
>>
>> Deploying a separate Eventserver app that must be configured to share
>> storage config & backends leads to all kinds of complexity, especially
>> when an unsuspecting developer wants to deploy a new engine with a
>> different storage config and doesn't realize that the Eventserver is not
>> simply shareable. Despite a lot of docs & discussion suggesting its
>> share-ability, there is precious little documentation that explains how
>> the multi-backend Storage really works in PIO. (I didn't understand it
>> until I read a bunch of Storage source code.)
>>
>>
>> From Kenneth Chan on Tue, 11 Jul 2017 12:49:58 -0700 (PDT):
>> > For example, one can modify the classification to train a classifier on
>> the same set of data used by recommendation.
>> …and later on Wed, 12 Jul 2017 13:44:01 -0700:
>> > My concern of embedding event server in engine is
>> > - what problem are we solving by providing an illusion that events are
>> only limited for one engine?
>>
>> This is a great ideal target, but the reality is that it takes some
>> significant design & engineering to reach that level of data share-ability.
>> I'm not suggesting that we do anything to undercut the possibilities of
>> such a distributed architecture. I suggest that we streamline PIO for
>> everyone that is not at that level of distributed architecture. Make PIO
>> not *require* it.
>>
>> The best example I have is that you can run Spark in local mode, without
>> worrying about any aspect of its ideal distributed purpose. (In fact
>> PredictionIO is built on this feature of Spark!) I don't know the history
>> there, but would imagine Spark was not always so friendly for small or
>> embedded tasks like this.
>>
>>
>> A huge part of my reality is seeing how many newcomers fumble around and
>> get frustrated. I'm looking at PredictionIO from a very Heroku-style
>> perspective of "how do we help [new] developers be successful", which is
>> probably going to seem like I want to take away capabilities. I just want
>> to make the onramp more graceful!
>>
>> *Mars
>>
>> ( <> .. <> )
>
>
>


-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California
