Hi Pat,

Thank you so much for giving me such a clear idea; it really did help me a lot. This is the very first time I'm touching big data, and I hope it won't be that bad.
I will set it up as you recommended, and will come back to ask whenever there is something I need to know, which will be very often.

Thank you,
Vaghawan

On Fri, Mar 24, 2017 at 3:23 AM, Pat Ferrel <[email protected]> wrote:

> Think of the recommender as a single app. It scales to whatever your
> data size is via the services it is built on. We often see that using a
> recommender is people’s first experience with really big data. Other tools
> and services you use outside of it are fine because they do not deal with
> such large data. Recommenders force you to process every interaction that
> all your users have made over perhaps a year and do it often. There are few
> other apps that require this. Welcome to Big-Data.
>
> MySQL is fine to run your app as you no doubt know. The “model” built in a
> recommender is generally not human readable but in the case of the UR you
> can understand it with some experience. It lives in Elasticsearch while the
> user interactions live in HBase. The user events can be looked at but I'm not
> sure why you’d want to; they are condensed snippets of server logs.
>
> In any case it may help to think of the model in Elasticsearch as a
> product catalog. It will define what items can be recommended and have an
> entry for each item with Machine Learning calculated attributes attached
> that indicate the type of user that prefers each item. But the model also
> contains item properties/attributes that you may want to include for
> business rules.
>
> The Recommender is easily accessed from your app through the input and
> query API. You can change attributes of items by sending special input
> events. Queries are defined that match the type of things recommenders with
> business rules do, and the model can be seen through Elasticsearch APIs, but
> direct manipulation of it is discouraged since its
> meaning or format may change with any update.
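As a rough illustration of the input and query APIs described above, here is a minimal Python sketch that builds the JSON bodies for one purchase event and for a user-based or item-based query. The field names follow the PredictionIO event format and the UR query format as I understand them; the IDs, the event name "purchase", and the helper function names are assumptions made for illustration, and nothing is sent over the network.

```python
import json
from datetime import datetime, timezone


def purchase_event(user_id, product_id, when):
    """Build a PredictionIO-style event body for one purchase.

    The event name "purchase" is an assumption; it has to match the
    event names configured for the recommender.
    """
    return {
        "event": "purchase",
        "entityType": "user",
        "entityId": str(user_id),
        "targetEntityType": "item",
        "targetEntityId": str(product_id),
        # ISO 8601 timestamp, normalised to UTC
        "eventTime": when.astimezone(timezone.utc).isoformat(),
    }


def user_query(user_id, num=10):
    """Query body asking for recommendations for one user."""
    return {"user": str(user_id), "num": num}


def item_query(product_id, num=10):
    """Query body asking for items similar to one item."""
    return {"item": str(product_id), "num": num}


# Hypothetical sample values, standing in for one row of purchase history.
event = purchase_event(42, "sku-1001", datetime(2017, 3, 23, 7, 8, tzinfo=timezone.utc))
print(json.dumps(event, indent=2))
print(json.dumps(user_query(42, num=4)))
print(json.dumps(item_query("sku-1001", num=4)))
```

In a default install the event body would be POSTed to the event server (typically port 7070, at /events.json with the app's access key) and the query body to the deployed engine (typically port 8000, at /queries.json); treat those defaults as assumptions to check against your own setup.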
>
> Plan to use the PIO query API. It responds in real time, with latency
> on the order of 25ms, and handles multiple simultaneous connections/queries. There
> would be no reason to pull data out of the UR and put it in a database, or
> you would lose the ability to react to users’ real-time behavior, which is
> used to make recommendations. Stick to the input/query APIs and feed data
> into the UR in real time and you’ll get the most benefit.
>
>
> On Mar 23, 2017, at 12:25 PM, Vaghawan Ojha <[email protected]> wrote:
>
> Hi Pat,
>
> Thank you very much. Yes, I will be following the ActionML instructions since I'm
> going to use the UR. I think I should go with HBase rather than
> spend time setting up MySQL. Part of my need is that once we train on
> the dataset, the result should be easily available to the applications which
> are running on MySQL.
>
> I'm fairly new to the concept itself. So basically I would always have a
> large JSON file coming from the application which uses MySQL (this
> shouldn't be a problem). Then I would use PIO and the UR to do the hard work,
> and get back the result either through an API, which I think already works in
> PIO, or saved somewhere in a database like MySQL.
>
> Thanks
>
> On Fri, Mar 24, 2017 at 1:03 AM, Pat Ferrel <[email protected]> wrote:
>
>> The UR uses Elasticsearch for part of the Recommender algorithm, therefore
>> it must be configured as a storage backend. It is possible to use Postgres
>> or MySQL for the other stores but we have very little experience with this.
>> HBase is indefinitely scalable so we always use that. Single machine
>> deployments are rare with reasonably sized data so Elasticsearch + HBase
>> running separately or in clusters will always meet the data needs. The RDBs
>> will not, and anyway, like I said, you have to use Elasticsearch.
>>
>> Therefore for the UR follow the instructions on the ActionML site since they
>> are specific to the UR.
>> For other templates you may use other
>> configurations of PIO, but if you use the UR config you can use every
>> other template too.
>>
>>
>>
>> On Mar 23, 2017, at 9:07 AM, Vaghawan Ojha <[email protected]> wrote:
>>
>> Hi, Thank you!
>>
>> I have run into further confusion here. I installed PredictionIO
>> version 0.10.0 from here
>> http://predictionio.incubator.apache.org/install/install-sourcecode/
>> and have been fighting to
>> configure MySQL as the storage on my local Linux machine.
>>
>> But I see there is different installation documentation on the ActionML
>> website, and I'm not sure which one I should follow. Currently there is no
>> "pio-env.sh" file inside the conf folder, however there is a
>> pio-env.sh.template file. I commented out the pgsql section and uncommented the
>> mysql section with the username and password, but whenever I run sudo
>> PredictionIO-0.10.0-incubating/bin/pio eventserver there is an
>> error saying that authentication failed with pgsql, even though I don't want
>> to use pgsql.
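One thing worth checking in the template quoted just below, offered as a guess rather than a confirmed diagnosis: the three PIO_STORAGE_REPOSITORIES_*_SOURCE lines still name PGSQL even though the PGSQL source block is commented out and only MYSQL is defined. The Python sketch below parses a pio-env style file and reports every repository whose source has no active PIO_STORAGE_SOURCES_<NAME>_TYPE line. The sample text is abbreviated from the template in this message, and the function names are hypothetical helpers, not part of PIO.

```python
import re

# Abbreviated from the template quoted in this message.
SAMPLE = """\
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
#PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
"""


def parse_env(text):
    """Return active KEY=VALUE pairs, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def undefined_sources(settings):
    """Map each repository to its source when that source has no active TYPE line."""
    problems = {}
    for key, source in settings.items():
        match = re.fullmatch(r"PIO_STORAGE_REPOSITORIES_(\w+)_SOURCE", key)
        if match and f"PIO_STORAGE_SOURCES_{source}_TYPE" not in settings:
            problems[match.group(1)] = source
    return problems


print(undefined_sources(parse_env(SAMPLE)))
```

For the sample, all three repositories are flagged as pointing at PGSQL. If that is indeed the cause, setting the three _SOURCE values to MYSQL in the pio-env.sh that PIO actually reads, rather than only in the .template file, would be the first thing to try.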
>>
>> # Storage Repositories
>>
>> # Default is to use PostgreSQL
>> PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
>> PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL
>>
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
>> PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL
>>
>> PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
>> PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL
>>
>> # Storage Data Sources
>>
>> # PostgreSQL Default Settings
>> # Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
>> # Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
>> # PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
>> #PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
>> #PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
>> #PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
>> #PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
>>
>> # MySQL Example
>> PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
>> PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
>> PIO_STORAGE_SOURCES_MYSQL_USERNAME=root
>> PIO_STORAGE_SOURCES_MYSQL_PASSWORD=root
>>
>>
>> This is what the pio-env.sh.template looks like. And again, when I visited
>> the ActionML site, it says that I have to have Elasticsearch, but the
>> prediction.io site doesn't tell us the same. Which one should I follow,
>> and where would I find the current working version of the installation guide? I
>> actually want to use prediction.io in production shortly after I
>> implement it locally.
>>
>> Please help me, thank you very much for your help, I appreciate it so
>> much.
>> Vaghawan
>>
>>
>> On Thu, Mar 23, 2017 at 9:27 PM, Pat Ferrel <[email protected]>
>> wrote:
>>
>>> Since PIO has moved to Apache, the namespace of PIO code changed and so
>>> all templates need to be updated. None of the ones in
>>> https://github.com/PredictionIO/
>>> <https://github.com/PredictionIO/template-scala-parallel-universal-recommendation>
>>> will
>>> work with Apache PIO.
>>> For the upgraded UR see:
>>> https://github.com/actionml/universal-recommender
>>> Docs for the UR are here:
>>> http://actionml.com/docs/ur
>>>
>>> Also look on the Template gallery page here for a description of
>>> template status. Some have not been moved to the new namespace and
>>> converted to run with PIO but this is pretty easy to do yourself.
>>> http://predictionio.incubator.apache.org/gallery/template-gallery/
>>>
>>> user_id, product_id and purchase_date is all you need to use any
>>> recommender. If you plan to gather other events in the future, use the UR.
>>> As far as item or user based recommendations, the UR will give either based
>>> on the query with the same data and model, as some others will do. The UR
>>> allows you to mix both types in a single query, which may be useful with
>>> small amounts of individual user data.
>>>
>>> Also the accepted wisdom about this is to put item-based recs on item
>>> detail pages, and user-based recs elsewhere, when you don’t have an item to
>>> base recs on, or in another placement on any page.
>>>
>>> You can have many different placements of recs in any page by changing
>>> the queries. This is how Netflix gets rows and rows of specialized recs for
>>> different things all based on the same data. The UR queries are quite
>>> flexible.
>>>
>>>
>>> On Mar 23, 2017, at 7:08 AM, Vaghawan Ojha <[email protected]>
>>> wrote:
>>>
>>> Hi,
>>>
>>> I've been trying to deploy a recommendation system using
>>> https://github.com/PredictionIO/template-scala-parallel-universal-recommendation.
>>>
>>> I have users' purchase history, something like this:
>>> user_id, product_id and purchase_date, so I will be using user_id and
>>> product_id to determine the recommendation. I'm not sure if I would be able
>>> to customize the default event parameter.
>>>
>>> Do you have any suggestions as to which template would be more suitable
>>> for my problem?
>>> I don't have data like ratings or view state; I only have
>>> data about users and the products they purchased. I need something like item
>>> based similarity as well as user based item similarity.
>>>
>>> Any help would be great
>>>
>>> Thank you
>>> Vaghawan
