Re: Eventserver API in an Engine?

Kenneth Chan Wed, 12 Jul 2017 13:44:23 -0700

I don't think it's about turnkey or not. I think it's about whether PIO is
for a single-engine vertical only or for multiple engines sharing data.
For example, the UR accepts multiple events like "eventNames": ["buy", "view"].
One can create another classification engine that uses the same set of events.
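
A rough sketch of what I mean, in plain Python rather than real PIO code.
The event list, the field values and the two helper functions
(recommender_input, classifier_input) are all made up for illustration;
only the "eventNames" idea and the general event shape come from PIO.
Both "engines" read the same events and each shapes them for its own use:

```python
from collections import Counter, defaultdict

# Hypothetical events collected once for a single app. The field names loosely
# follow PIO's event JSON; the values are invented for this sketch.
EVENTS = [
    {"event": "view", "entityId": "u1", "targetEntityId": "i1"},
    {"event": "view", "entityId": "u1", "targetEntityId": "i2"},
    {"event": "buy", "entityId": "u1", "targetEntityId": "i1"},
    {"event": "view", "entityId": "u2", "targetEntityId": "i1"},
    {"event": "$set", "entityId": "i1", "targetEntityId": None},
]

EVENT_NAMES = ["buy", "view"]


def select_events(events, event_names):
    """Keep only the events whose name is in event_names."""
    names = set(event_names)
    return [e for e in events if e["event"] in names]


def recommender_input(events, event_names):
    """Hypothetical recommender view: (user, item) pairs per event name."""
    pairs = defaultdict(list)
    for e in select_events(events, event_names):
        pairs[e["event"]].append((e["entityId"], e["targetEntityId"]))
    return dict(pairs)


def classifier_input(events, event_names):
    """Hypothetical classifier view: per-user view count plus a 'has bought' label."""
    counts = defaultdict(Counter)
    for e in select_events(events, event_names):
        counts[e["entityId"]][e["event"]] += 1
    return {
        user: {"views": c["view"], "label": int(c["buy"] > 0)}
        for user, c in counts.items()
    }


rec = recommender_input(EVENTS, EVENT_NAMES)
clf = classifier_input(EVENTS, EVENT_NAMES)
```

The point is only that the events are stored once and each engine decides
how to read them; nothing here depends on which template produced them.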


I understand there is a difference in complexity between templates, but we
can't say PIO can't run multiple engines sharing data just because the
existing templates don't work together; the templates are meant to show how
to do things in different ways -  re:  "this is demonstrably untrue. Try it. Clustering
for some templates assumes textual data, others do not. This seems so far
from my experience that your statement is baffling. The PIO event stream
from one recommender to the next is not compatible either. The E-Com engine
requires $set events on items, the UR does not. So taking UR events into
the E-Com recommender would result in garbage output."

My concern with embedding the event server in the engine is:
- what problem are we solving by providing the illusion that events are
limited to only one engine?





On Wed, Jul 12, 2017 at 12:11 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> "I think to resolve Mars immediate need, we can implement embedded event
> server in a couple phases. Roughly it would be wiring the existing event
> server in (with some refactoring) and mark it experimental, then continue
> toward a clean, app-specific event server.”
>
> This sounds reasonable but Donald, you may have a better in-house
> understanding of Mars’s requirements. I would love to see a simple list.
>
> I also like the parallel “experimental” track which allows us to do
> refactoring with little disruption to existing users. This is the way
> Mahout went from a loose collection of Hadoop Mapreduce algorithms to a
> completely new codebase that is a platform neutral (but primarily Spark
> based) general optimized massively scalable linear algebra solving engine.
>
> I'll offer to donate what we are calling Harness (previously pio-kappa).
> It is prototype code but implements almost all of AML's design goals as
> mentioned in this and other threads. Its implementation is fully
> functional for a single Kappa-style engine and so Lambda support is only
> stubbed out. Integration of the PIO EventStore is not done—data storage is
> not abstracted yet. It implements app centric Templates but in a fully
> multi-tenant secure manner. The most solid part is the microservice based
> multi-tenant rest-server with accompanying Python CLI along with Java and
> Python SDKs. Not sure if it applies to the short term needs Mars has.
>
> If you read these docs do not come with PIO workflow preconceptions
> https://github.com/actionml/harness
>
>
> On Jul 12, 2017, at 9:53 AM, Donald Szeto <don...@apache.org> wrote:
>
> Many good discussions. Let me provide my input on these issues.
>
> Multiple installations of PredictionIO should use different database
> names. An analogy would be Wordpress installations that expect its own
> metadata database. I understand the downside to this is that some users
> only have access to one database. We can add database table prefixing
> support to alleviate this like most other projects do. I agree it is not
> very clear in the documentation that installations of PIO should not be
> backed by overlapping data stores.
>
> Regarding the discussion of data and engine, here's what it seems to me:
> two directions of data science development.
>
> One perspective is that data collection and processing is independent from
> data science development. Data are collected and organized ("apps" in PIO
> term). Developers go look at what's available, explore, and develop
> (engines).
>
> The other one is to provide turnkey solutions. Well crafted engines expect
> certain inputs and expose knobs for tuning.
>
> PIO supports both styles today. Apps provide the grouping of data, and
> engine is the abstraction to define the concern of data. These are well
> defined from day 1.
>
> Side track: a confusion I feel here is that templates have different
> degree of sophistication. The universal recommender is definitely much more
> sophisticated and turnkey than the skeleton template for example. We should
> label this in our template gallery.
>
> Going back to Mars suggestion. If the use case is such that the engine
> server also collects data used by only the engine, it feels like the right
> abstraction would be embedding a subset of event server that collects data
> going to a single app. Recall that app name is configured in engine.json.
>
> I think to resolve Mars immediate need, we can implement embedded event
> server in a couple phases. Roughly it would be wiring the existing event
> server in (with some refactoring) and mark it experimental, then continue
> toward a clean, app-specific event server.
>
> Let me know how these sound.
>
> On Tue, Jul 11, 2017 at 1:39 PM Kenneth Chan <kenn...@apache.org> wrote:
>
>> re:
>> "
>> when deploying multiple engines with different versions of PIO and
>> different storage configurations ....
>>
>> needing separate PIO installs regularly when testing the next release or
>> development builds of PIO and when evaluating engine templates or
>> algorithms that require new, different storage configs. Also, those in the
>> consulting world are frequently required to keep client data separated for
>> all kinds of privacy & legal reasons; with the storage corruption bug I
>> reported, one client's data could become visible to or intermingled with
>> another client's app.
>> "
>>
>> when installing multiple PIOs separately, could you set each PIO's database
>> config to use a different table name so they don't conflict?
>> or bring up another VM to isolate PIO?
>>
>> Donald, do you have a best practice or advice for users who want to install
>> multiple PIO versions and run them on the same machine?
>>
>>
>>
>> On Tue, Jul 11, 2017 at 12:49 PM, Kenneth Chan <kenn...@apache.org>
>> wrote:
>>
>>> I think we have the wrong impression that all templates are supposed
>>> to work together out of the box.
>>>
>>> The templates are meant to be examples and demonstrations - that's why
>>> they are called templates! They are never meant to fit into any user
>>> application right away. Each application is unique. A template
>>> only assumes a specific use case for demonstration purposes.
>>>
>>> Users can start with a template for a simple case, but they need to
>>> modify it for their final needs.
>>>
>>> For example, the PIO classification template is only meant for
>>> demonstrating simple classification. At the end, how to use classification
>>> is application specific. For example, one can modify the classification to
>>> train a classifier on the same set of data used by recommendation.
>>>
>>>
>>>
>>>
>>> On Tue, Jul 11, 2017 at 10:31 AM, Pat Ferrel <p...@occamsmachete.com>
>>> wrote:
>>>
>>>> Understood, you have immediate practical reasons for 1 integrated
>>>> deployment with the 2 endpoints. But Apache is a do-ology, meaning those
>>>> who do something win the argument as long as they have enough consensus. I
>>>> have enough experience with PIO that I have chosen to fix a lot of issues
>>>> with the prototype design, having already gone down the “quick hack” path
>>>> once. You may want to do something else if you have the resources.
>>>>
>>>> I fear that my deeper changes will not get enough consensus and we may
>>>> end up with a competing ML/AI server framework some day. That is another
>>>> ASF tendency. Innovations happen before going into ASF, often not under ASF
>>>> rules.
>>>>
>>>> In any case—how much of your problem is workflow vs installation vs
>>>> bundling of APIs? Can you explain it more?
>>>>
>>>>
>>>> On Jul 11, 2017, at 9:37 AM, Mars Hall <m...@heroku.com> wrote:
>>>>
>>>> > On Jul 10, 2017, at 18:03, Kenneth Chan <kenn...@apache.org> wrote:
>>>> >
>>>> > it's all same set of events collected for my application and i can
>>>> create multiple engine to use these data for different purpose.
>>>>
>>>>
>>>> Clear to me, ⬆️ this is the prevailing reasoning behind the
>>>> "separateness" of the Eventserver. I do not forsake this design goal, but
>>>> ask that we consider the usability & durability of PredictionIO when
>>>> deploying multiple engines with different versions of PIO and different
>>>> storage configurations. This will probably happen for anyone who uses
>>>> PredictionIO long-term in production, as their new projects come on-line
>>>> with newer & better versions & configurations.
>>>>
>>>> I encounter this situation of needing separate PIO installs regularly
>>>> when testing the next release or development builds of PIO and when
>>>> evaluating engine templates or algorithms that require new, different
>>>> storage configs. Also, those in the consulting world are frequently
>>>> required to keep client data separated for all kinds of privacy & legal
>>>> reasons; with the storage corruption bug I reported, one client's data
>>>> could become visible to or intermingled with another client's app.
>>>>
>>>> In starting this thread, I was hoping to find some traction with the
>>>> idea of making it possible to completely self-contain a PredictionIO app by
>>>> adding the Events API to the process started with `pio deploy`.
>>>>
>>>> Goal: Queries & Events APIs in the same process.
>>>>
>>>> When considering the architecture of apps, sharing a database between
>>>> two or more apps is considered a very naughty way to get around having
>>>> clear, clean, inter-process API's. My team at Salesforce/Heroku has been
>>>> struck by this exact issue with PredictionIO. So, I am seeking a way to fix
>>>> this without requiring a rewrite of PredictionIO. I am excited to hear
>>>> about the new architecture prototypes, yet our reality is that this is an
>>>> issue now.
>>>>
>>>> *Mars
>>>>
>>>> ( <> .. <> )
>>>>
>>>>
>>>>
>>>
>>
>
