[jira] [Commented] (MAHOUT-1882) SequentialAccessSparseVector inerateNonZeros is incorrect.

2017-01-09 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812561#comment-15812561 ] Pat Ferrel commented on MAHOUT-1882: Can't see that I use this, at least not obviously unless

Re: Misconfigured date information, either your engine.json date settings or your query's dateRange is incorrect.

2017-01-09 Thread Pat Ferrel
one or the other. Below you are setting up a date attached to items and have called it “releaseDate” in one place but set it using $set events to be “release_date”, which is a different string. This is probably your error, though you didn’t send the error message. On Jan 9, 2017, at 8:56 AM, Pat Ferr
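
A minimal sketch of the kind of check that would catch the naming mismatch described above. The event dictionaries and the helper name date_field_mismatches are hypothetical and only mirror the general shape of $set events; this is not part of PredictionIO or the Universal Recommender.

```python
def date_field_mismatches(config_date_name, set_events):
    """Return entity ids of $set events that lack the configured date property."""
    missing = []
    for event in set_events:
        if event.get("event") != "$set":
            continue  # only property-setting events carry the date
        if config_date_name not in event.get("properties", {}):
            missing.append(event.get("entityId"))
    return missing


# Hypothetical events: the first uses "release_date", the second "releaseDate".
events = [
    {"event": "$set", "entityType": "item", "entityId": "i1",
     "properties": {"release_date": "2017-01-01T00:00:00Z"}},
    {"event": "$set", "entityType": "item", "entityId": "i2",
     "properties": {"releaseDate": "2017-01-02T00:00:00Z"}},
]

# Items whose $set event does not carry the name used in the configuration.
print(date_field_mismatches("releaseDate", events))
```

Running a check like this against a sample of the input before training would flag items whose date property name differs from the configured one.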

Re: Misconfigured date information, either your engine.json date settings or your query's dateRange is incorrect.

2017-01-09 Thread Pat Ferrel
No, that is talking about using a date range in the query. Answer this first: do you want a fixed date attached to items with the range in the query, or expire/available dates attached to items with the current date required to fall between them? Pick one or the other. Below you are setting up a

Re: PredictionIO :: [error] (*:update) sbt.ResolveException: unresolved dependency: io.prediction#pio-build;0.9.0: not found

2017-01-06 Thread Pat Ferrel
io services and started again, and while doing pio build, getting following error, Is it possible to receive a support call, to demo the code walk through to identify the subject clearly. Regards, Yasho. From: Pat Ferrel [mailto:p...@occamsmachete.com] Sent: 06 January 2017 01:43 To: user@p

Re: PredictionIO :: [error] (*:update) sbt.ResolveException: unresolved dependency: io.prediction#pio-build;0.9.0: not found

2017-01-05 Thread Pat Ferrel
the plugin is in project/pio-build.sbt. Remove it and once you can build it should be looking for the version in your build.sbt On Jan 5, 2017, at 10:56 AM, Pat Ferrel <p...@occamsmachete.com> wrote: I may have it. There is an SBT plugin that you are using that should be removed. The

Re: PredictionIO :: [error] (*:update) sbt.ResolveException: unresolved dependency: io.prediction#pio-build;0.9.0: not found

2017-01-05 Thread Pat Ferrel
ile2 path: vsadmin@predictionio:~/SPIO/apache-predictionio-0.10.0-incubating$ sudo nano build.sbt Regards, Yasho. From: Pat Ferrel [mailto:p...@occamsmachete.com] Sent: 04 January 2017 22:00 To: user@predictionio.incubator.apache.org Cc: Sankar M <san...@vishwak.com>; Sheik Dawood Jainull

Re: PIO Stops working suddenly after few days

2017-01-04 Thread Pat Ferrel
I’d guess that #2 may be caused by #1 since PIO is unable to look up metadata without ES, or they are both caused by some resource being used up. What template are you using? If ES is complaining I’d guess it’s the Universal Recommender? If so, for some reason the connection to ES at query time

Re: PredictionIO :: [error] (*:update) sbt.ResolveException: unresolved dependency: io.prediction#pio-build;0.9.0: not found

2017-01-03 Thread Pat Ferrel
Please send the text of the error you are seeing, screen shots are very hard to read. What does `pio version` report? On Jan 3, 2017, at 2:26 AM, Yasothai Rasappan wrote: Hi Rasna, Is it possible to have a call to discuss this issue? Regards, Yasho. From: Yasothai

State of PIO

2017-01-02 Thread Pat Ferrel
The activity level has been pretty low lately. I hope this is just timing of holidays, work, etc. I know I’ve been too busy at work to do much until now. If we are going to graduate from the incubator we’ll have to start regular releases with substantial progress in each. We have several PRs

Re: Decrease score of user viewed item instead of excluding

2016-12-25 Thread Pat Ferrel
No. I think this might make sense but my first question is are you using multiple indicators? If you have a lot of users that only get recommendations they already know about, you are not getting a lot of value from the recommender—adding more data might help. If you’ve done the above,

Re: beginner to ML

2016-12-25 Thread Pat Ferrel
As a beginner you have chosen a multi-layer problem, each layer of which would be a good beginner problem. I’ve found it useful to think of ML in 3 broad categories: classifiers, recommenders, and regressions. Even NLP is a classifier, taking some words and classifying each. As Gustav mentioned

Search on docs not working

2016-12-25 Thread Pat Ferrel
Not sure this is worth a Jira but search isn’t working on the PIO site.

Re: Data clean up

2016-12-22 Thread Pat Ferrel
Thanks for your time and help, much appreciated! Bruno 2016-12-18 18:58 GMT+01:00 Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>>: There was a bug in this feature in the Apache PIO version that has been fixed in the SNAPSHOT. We will do a source tag to fix it before

[jira] [Commented] (MAHOUT-1786) Make classes implements Serializable for Spark 1.5+

2016-12-19 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15761631#comment-15761631 ] Pat Ferrel commented on MAHOUT-1786: It sounds like we could remove Kryo altogether and improve

Re: Data clean up

2016-12-18 Thread Pat Ferrel
There was a bug in this feature in the Apache PIO version that has been fixed in the SNAPSHOT. We will do a source tag to fix it before the next release. The page you reference is being changed now to advise an Apache PIO install as the root source of the project going forward. Keep an eye

Re: A/B test on recommendation engine

2016-12-17 Thread Pat Ferrel
oad-balancer, so I would like to modify the Serving results. I'll try to figure it out, thanks! Best regards, Amy Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote on Sat, Dec 17, 2016 at 5:59 AM: Amy asked: "is there any way to modify the http response for query? (S

Git branching policy

2016-12-15 Thread Pat Ferrel
I have changes in the master that are needed for some users of Mahout. However the master is often chaotic due to being the branch that is the SNAPSHOT of all partial or not well tested changes. The key feature of the branching model described in the blog is that master is stable and contains

Re: Question about spark-itemsimilarity

2016-12-14 Thread Pat Ferrel
-occurrences - purchase history/clicks or downloads Best, Niklas 2016-12-01 18:47 GMT+01:00 Pat Ferrel <p...@occamsmachete.com>: > No you can’t, the value is ignored. The algorithm looks at occurrences, > cooccurrences, and cross-occurrences of several event types not values > atta

Re: Micro-releases

2016-12-14 Thread Pat Ferrel
10:25 AM Pat Ferrel <p...@occamsmachete.com> wrote: > But the downside is we never touch major, or haven’t in all the years of > PIO. So in effect all releases will be minor number changes and so far we > have only touched that 10 times in all the years of PIO. We could skip many >

Re: Customizing Recommender engine

2016-12-13 Thread Pat Ferrel
The UR has a new Apache PIO compatible repo here: https://github.com/actionml/universal-recommender.git git clone https://github.com/actionml/universal-recommender.git ur and proceed. The UR allows boosts or filters by properties. You are using a filter (bias -1), which does not work with

Re: reduce logging in prediction io

2016-12-13 Thread Pat Ferrel
modify the log4j settings in dist/conf/log4j.properties On Dec 13, 2016, at 2:23 AM, Rasna Tomar wrote: Hi Is there any configuration setting in Prediction IO to reduce logging while pio build, train and deploy. I would like to capture only error logs in pio.log file

Re: Virtual Kickoff Meeting

2016-12-11 Thread Pat Ferrel
about Dec 13 2-4PM or Dec 15 11-1pm or suggest another time On Dec 7, 2016, at 4:11 PM, Pat Ferrel <p...@occamsmachete.com> wrote: Argh, both my computers were down, did we have this? if not how about Friday or next week M or Tu? On Dec 5, 2016, at 9:06 AM, Donald Szeto <don...@apache.o

Re: Stateless Builds

2016-12-11 Thread Pat Ferrel
OK, so that was too much detail. My immediate question is how to train on one machine and deploy on several others—all referencing the same instance data (model)? Before it was by copying the manifest, now there is no manifest. On Dec 7, 2016, at 5:43 PM, Pat Ferrel <p...@occamsmachete.

Re: Stateless Builds

2016-12-07 Thread Pat Ferrel
My first question is how to train on an ephemeral machine to swap models into an already deployed prediction server, because this is what I do all the time. The only way to do this now is to train first on dummy data, then deploy and re-train as data comes in, but there are other issues and

Re: Virtual Kickoff Meeting

2016-12-07 Thread Pat Ferrel
Argh, both my computers were down, did we have this? if not how about Friday or next week M or Tu? On Dec 5, 2016, at 9:06 AM, Donald Szeto wrote: I can do Tuesday 11-12pm and 3-4pm PST. On Sun, Dec 4, 2016 at 10:34 AM Rafa Haro wrote: > I would join

Re: Tuning of Recommendation Engine

2016-12-07 Thread Pat Ferrel
s Gustavo On Thu, Dec 1, 2016 at 2:48 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: This is a very odd statement. How many tuning knobs do you have with MLlib’s ALS, 1, 2? There are a large number of tuning knobs for the UR to fit different

Re: Recommending for new products

2016-12-03 Thread Pat Ferrel
PredictionIO is not a recommender. It is a Framework you add templates to. The templates do all the machine learning and can be of many types. That said if you are using the Universal Recommender here: https://github.com/actionml/universal-recommender

Re: DNS Down?

2016-12-02 Thread Pat Ferrel
That domain will never be restored. The project is now an Apache project and so install.sh will be hosted on apache.org. Where did you find this link? It should be fixed. On Dec 2, 2016, at 2:20 AM, Munish Malhotra wrote: Hello, Is

Re: Tuning of Recommendation Engine

2016-12-01 Thread Pat Ferrel
anything you want or modify the source to tune anything your intuition says might work. On Dec 1, 2016, at 11:48 AM, Pat Ferrel <p...@occamsmachete.com> wrote: This is a very odd statement. How many tuning knobs do you have with MLlib’s ALS, 1, 2? There are a large number of tuning

Re: Tuning of Recommendation Engine

2016-12-01 Thread Pat Ferrel
according to score in Elasticsearch. I didn't understand why the 2nd and 3rd parameters are taken. Also, can anyone explain the correctness of the method, that is, why it works rather than how it works? Regards Harsh Mathur On Dec 1, 2016 11:01 PM, "Pat Ferrel" <p.

Virtual Kickoff Meeting

2016-12-01 Thread Pat Ferrel
I’d like to propose a Google Hangout for any interested parties to discuss the PIO roadmap. It may be a very chaotic meeting since it’s our first, so please visit this page and add your thoughts ahead of time or see what other people are thinking:

Re: Question about spark-itemsimilarity

2016-12-01 Thread Pat Ferrel
No you can’t, the value is ignored. The algorithm looks at occurrences, cooccurrences, and cross-occurrences of several event types not values attached to events. If you are trying to use rating info, this has been pretty much discarded as being not very useful. For instance you may like
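
To illustrate the point that only occurrences count and attached values are ignored, here is a small sketch that counts item cooccurrences from (user, item, value) triples. It is a simplified illustration, not the spark-itemsimilarity implementation; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(events):
    """Count item pairs that share a user; the value on each event is ignored."""
    items_by_user = defaultdict(set)
    for user, item, _value in events:
        items_by_user[user].add(item)  # presence only, no weighting

    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical events; the third field stands in for a rating.
events = [
    ("u1", "a", 5.0), ("u1", "b", 1.0),
    ("u2", "a", 2.0), ("u2", "b", 4.0), ("u2", "c", 3.0),
]
print(cooccurrence_counts(events))
```

Changing any of the rating values in the sample leaves the counts unchanged, which is the behavior described in the reply.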

Re: Incremental model training in real time event processing

2016-12-01 Thread Pat Ferrel
No, lambda has nothing to do with how much data is used in training. Lambda just says that there is a batch/background process that accounts for changes in data in non-real-time. It is theoretically possible but not supported yet. However you may be training too often... The Universal

Re: Micro-releases

2016-11-28 Thread Pat Ferrel
on we should update http://predictionio. > incubator.apache.org/community/contribute-code/ > > On Sat, Nov 26, 2016 at 10:00 AM, Pat Ferrel <p...@occamsmachete.com> > wrote: > >> This is a better description of how we should be managing code and git >> branches than

Re: Micro-releases

2016-11-26 Thread Pat Ferrel
ent projects. > On 25 Nov 2016, at 18:57, Pat Ferrel <p...@actionml.com> wrote: > > Our dev process includes edge/snapshot code being kept in the develop branch > until release time. I like this process because it allows us to keep master > clean and stable. So imagine tha

Micro-releases

2016-11-25 Thread Pat Ferrel
Our dev process includes edge/snapshot code being kept in the develop branch until release time. I like this process because it allows us to keep master clean and stable. So imagine that we have a major bug fix. To get this to users we could do a release but this can’t happen soon enough if a

Re: [jira] [Updated] (PIO-45) SelfCleaningDatasource erases all data

2016-11-24 Thread Pat Ferrel
rote: Sure, I can try to reproduce this / take a look tomorrow. Alex On Nov 21, 2016 12:05 PM, "Pat Ferrel" <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: Do you have time to look at this Alex? I may have made a mistake in merging this feature. At present an

Re: Introducing Myself

2016-11-24 Thread Pat Ferrel
Welcome! The biggest problem I see today is helping older users (since before Apache) get up to speed with a doc on the docs site. I recently tried to explain this in a Github issue and gave up; it was getting too long and confusing. If you’ve used PIO for some time you might be able to help

Re: [jira] [Updated] (PIO-45) SelfCleaningDatasource erases all data

2016-11-24 Thread Pat Ferrel
an try to reproduce this / take a look tomorrow. > > Alex > > On Nov 21, 2016 12:05 PM, "Pat Ferrel" <p...@occamsmachete.com> wrote: > >> Do you have time to look at this Alex? I may have made a mistake in >> merging this feature. At present any use of

Re: [jira] [Updated] (PIO-45) SelfCleaningDatasource erases all data

2016-11-21 Thread Pat Ferrel
to see if the problem is reproducible? Or tell me how to run those? It’s included in one of the example templates, right? On Nov 20, 2016, at 5:30 PM, Pat Ferrel (JIRA) <j...@apache.org> wrote: [ https://issues.apache.org/jira/browse/PIO-4

Re: UniversalRecommender performance bottleneck in SimilarityAnalysis.cooccurrencesIDSs

2016-11-21 Thread Pat Ferrel
strange behaviour. It seems coocurrencyIDS does not take into account Spark parallelism and ParOpts. Do you have any ideas how I can control parallelism in coocurrencyIDS? Right now it uses only 3 cores of 12. Sincerely, Igor Kasianov 2016-11-19 23:04 GMT+02:00 Pat Ferrel <p...@occamsm

[jira] [Created] (PIO-45) SelfCleaningDatasource erases all data

2016-11-20 Thread Pat Ferrel (JIRA)
Pat Ferrel created PIO-45: - Summary: SelfCleaningDatasource erases all data Key: PIO-45 URL: https://issues.apache.org/jira/browse/PIO-45 Project: PredictionIO Issue Type: Bug Affects Versions

Re: UniversalRecommender performance bottleneck in SimilarityAnalysis.cooccurrencesIDSs

2016-11-19 Thread Pat Ferrel
The current head of the template repo repartitions input based on Spark's default parallelism, which I set on the `pio train` CLI to 4 x #-of-cores. This speeds up the math drastically. There are still some things that look like bottlenecks but taking them out makes things slower. The labels you

using root LLR

2016-11-15 Thread Pat Ferrel
around 20-30 for raw LLR which corresponds to about 5 for root LLR. I often eyeball the lists of indicators for items that I understand to find a point where the list of indicators becomes about half noise, half useful indicators. On Sat, Jan 2, 2016 at 2:15 PM, Pat Ferrel <p...@occamsmachete
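
The relationship between raw and root LLR mentioned above can be sketched as follows. This uses the standard entropy formulation of the log-likelihood ratio for a 2x2 contingency table and takes the plain square root for the root score; signed variants of root LLR exist and are not shown. Function names are illustrative, not taken from any library.

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Raw log-likelihood ratio for a 2x2 table of cooccurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def root_llr(k11, k12, k21, k22):
    """Unsigned root LLR: the square root of the raw score."""
    return math.sqrt(llr(k11, k12, k21, k22))


# A raw score in the 20-30 range maps to a root score near 5.
print(llr(10, 0, 0, 10), root_llr(10, 0, 0, 10))
```

Because the root score is a square root, a raw threshold of 25 corresponds to a root threshold of exactly 5, which matches the rough equivalence quoted in the message.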

Re: Suggest users based on item

2016-11-08 Thread Pat Ferrel
The easiest way to do this is substitute the user-id with the item-id and vice versa. Do not change the names used in the input JSON or in the queries. Where the input is documented as (user-id, event-type, item-id) just swap the ids, the input may ask for user: user-id but send item-id
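
A minimal sketch of the id swap described above, assuming events are held as (user-id, event-type, item-id) tuples before being sent. The tuple layout and function name are hypothetical; the field names in the JSON that is actually sent stay unchanged, as the message says.

```python
def swap_roles(events):
    """Swap user and item ids so that recommendations come back as users per item."""
    return [(item_id, event_type, user_id)
            for user_id, event_type, item_id in events]


# Hypothetical input in (user-id, event-type, item-id) order.
events = [("u1", "view", "i9"), ("u2", "buy", "i9")]
print(swap_roles(events))
```

Applying the swap twice returns the original events, which makes it easy to keep a single source of truth and derive the swapped stream from it.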

Re: PredictionIO and Cassandra

2016-11-03 Thread Pat Ferrel
You’d have to implement some PredictionIO classes like PEventStore for Cassandra. Use the HBase specific classes as a template. We’ve considered adding C* but haven’t had the time. PM me if you need help. On Nov 2, 2016, at 7:10 PM, Ali, Syed wrote: I am looking to

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
across the cluster, the above may not be big issue. On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > Ok, will do. > > So the scanner does not indicate of itself that I’ve missed something in > handling the data. If not index, then made a fast looku

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
On Fri, Oct 28, 2016 at 10:22 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > So to clarify there are some values in hbase/conf/hbase-site.xml that are > needed by the calling code in the Spark driver and executors and so must be > passed using --files to spark-submit? If so I can d

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
ss.com >> >> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for any >> loss, damage or destruction of data or any other property which may arise >> from relying on this email's technical content is explicitly disclaimed. >> The author will i

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
y arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 28 October 2016 at 17:52, Pat Ferrel <p...@occamsmachete.com

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
ly set to 6 Did you pass hbase-site.xml using --files to Spark job ? Cheers On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > Using standalone Spark. I don’t recall seeing connection lost errors, but > there are lots of logs. I’ve set the scanne

Re: Scanner timeouts

2016-10-28 Thread Pat Ferrel
> loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 28 Octob

Scanner timeouts

2016-10-28 Thread Pat Ferrel
I’m getting data from HBase using a large Spark cluster with parallelism of near 400. The query fails quite often with the message below. Sometimes a retry will work and sometimes the ultimate failure results (below). If I reduce parallelism in Spark it slows other parts of the algorithm

Trouble with HBase and PEventStore.aggregateProperties

2016-10-28 Thread Pat Ferrel
The use of PEventStore.aggregateProperties as shown below causes errors when a large cluster is making the query. It seems to cause a full DB scan, which results in timeouts. This may be because nearly 400 (parallelism of the cluster) threads are making requests. But should this result in a

Re: The future of PIO

2016-10-25 Thread Pat Ferrel
Oops, should have read Simon’s email. Anyway a GD is created if we want to use that. On Oct 25, 2016, at 10:46 AM, Pat Ferrel <p...@occamsmachete.com> wrote: Now that we have a release what do people think about a joint idea sharing doc like this: https://docs.google.com/docu

The future of PIO

2016-10-25 Thread Pat Ferrel
Now that we have a release what do people think about a joint idea sharing doc like this: https://docs.google.com/document/d/1gxHeTHlAOfkuhO_g8DzbX4LOrTlLAtyWOIoi3qeqSc0/edit?usp=sharing Please

Re: Different environments

2016-10-21 Thread Pat Ferrel
The command line for any pio command that is launched on Spark can specify the master so you can train on one cluster and deploy on another. This is typical when using the ALS recommenders, which use a big cluster to train but deploy with `pio deploy -- --master local[2]` which would use a

Re: Compilations errors in Similar-Product engine template

2016-10-20 Thread Pat Ferrel
You are not using the right template code. It’s not io.prediction… anymore, it’s org.apache.predictionio… now. The templates donated to Apache linked to on the web site are not updated to the build method yet. There are PRs for many of the repos that have versions that will build fine. For

Re: clashing hbase queries

2016-10-19 Thread Pat Ferrel
created in a pathological manner. On Oct 15, 2016, at 3:14 PM, Pat Ferrel <p...@occamsmachete.com> wrote: I may have been on the wrong track with the 2 parallel task idea, which is a problem. The typical use with Spark is to get all data out of HBase and work on it as RDDs but getting

Re: Installing on Linux/Mac through tar

2016-10-19 Thread Pat Ferrel
All installations are the same now, download source and build with make-distribution. The tar contains source, not a built artifact. On Oct 19, 2016, at 3:28 AM, Saurav Sarkar wrote: Thanks Paul... @All: Any idea why the option of installation on Linux/MacOS was

Re: Universal Recommender date boosts

2016-10-19 Thread Pat Ferrel
Create categorical date ranges and attach them to items. They will be “0-10 days”, “11-20 days”, etc. Something that encodes how far back from today the data is. This will, of course, need to be updated periodically with a property change event. Then in your engine.json or query boost “0-10
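
One way to compute the categorical labels described above is sketched below. The 10-day bucket width follows the labels in the message; the function name and the choice to reject future dates are assumptions made for the example.

```python
from datetime import date


def age_bucket(item_date, today, width=10):
    """Map an item's date to a label such as '0-10 days' or '11-20 days'."""
    age = (today - item_date).days
    if age < 0:
        raise ValueError("item date is in the future")
    if age <= width:
        return f"0-{width} days"
    # Later buckets start one day after a multiple of the width: 11, 21, 31, ...
    start = ((age - 1) // width) * width + 1
    return f"{start}-{start + width - 1} days"


today = date(2016, 10, 19)
print(age_bucket(date(2016, 10, 8), today))
```

Since the label depends on the current date, it has to be recomputed and re-sent as a property change on a schedule, which is the periodic update the message refers to.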

Re: Issue while building engine

2016-10-19 Thread Pat Ferrel
PIO is a source release, you will have to build it locally. Artifacts have been put into the Maven world but there is no prebuilt version for download, only the various classes needed to build templates. Clearly the use of io.prediction means a *template’s* build.sbt is incorrect because

[jira] [Resolved] (MAHOUT-1853) Improvements to CCO (Correlated Cross-Occurrence)

2016-10-16 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel resolved MAHOUT-1853. Resolution: Fixed > Improvements to CCO (Correlated Cross-Occurre

[jira] [Resolved] (MAHOUT-1883) Create a type if IndexedDataset that filters unneeded data for CCO

2016-10-16 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel resolved MAHOUT-1883. Resolution: Fixed Hmm, I thought these were auto-resolved with a commit that contains the issue

clashing hbase queries

2016-10-13 Thread Pat Ferrel
The DAG for a template just happens to schedule 2 tasks that do something like this: val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties( appName = dsp.appName, entityType = "item")(sc) to execute in parallel The PEventStore calls from 2 separate closures start

Re: [VOTE] Release Apache PredictionIO 0.10.0 (incubating) RC5

2016-10-07 Thread Pat Ferrel
To clarify the only thing you will ever find in master is a set of commits tagged as an RC or release, never the intermediate changes as in typical master/snapshots. On Oct 7, 2016, at 8:59 AM, Pat Ferrel <p...@occamsmachete.com> wrote: What Donald said but also we use a slightly dif

Re: [VOTE] Release Apache PredictionIO 0.10.0 (incubating) RC5

2016-10-07 Thread Pat Ferrel
What Donald said but also we use a slightly different process than most Apache projects. Master is not a snapshot, we keep snapshots in the “develop” branch until RCs start and they are the only contents of master ever—that is RCs or releases. This process is not typical in Apache but I don’t

Re: [VOTE] Release Apache PredictionIO 0.10.0 (incubating) RC5

2016-10-06 Thread Pat Ferrel
I’ll second the ping and since I’m much less polite than Donald I’d like to add that there are PRs and branches to be merged from a number of people that want to contribute. A release would be a shot in the arm to our growing community. Lots of good work is waiting on the vote. +1

Re: Launching PredictionIO on aws

2016-10-03 Thread Pat Ferrel
There probably won’t be an Apache offering because the AWS Marketplace is a commercial thing but if you private email me I can tell you about offerings my company is working on. These will be available after the PredictionIO release, which is in process right now. On Oct 3, 2016, at 2:36 PM,

Re: spark-itemsimilarity slower than itemsimilarity

2016-10-03 Thread Pat Ferrel
Except for reading the input it now takes ~5 minutes to train. On Sep 30, 2016, at 5:12 PM, Pat Ferrel <p...@occamsmachete.com> wrote: Yeah, I bet Sebastian is right. I see no reason not to try running with --master local[4] or some number of cores on localhost. This will avo

[jira] [Updated] (MAHOUT-1883) Create a type if IndexedDataset that filters unneeded data for CCO

2016-10-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-1883: --- Issue Type: New Feature (was: Bug) > Create a type if IndexedDataset that filters unneeded d

[jira] [Updated] (MAHOUT-1883) Create a type if IndexedDataset that filters unneeded data for CCO

2016-10-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-1883: --- Sprint: Jan/Feb-2016 > Create a type if IndexedDataset that filters unneeded data for

[jira] [Updated] (MAHOUT-1883) Create a type if IndexedDataset that filters unneeded data for CCO

2016-10-01 Thread Pat Ferrel (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat Ferrel updated MAHOUT-1883: --- Description: The collaborative filtering CCO algo uses drms for each "indicator" type.

[jira] [Created] (MAHOUT-1883) Create a type if IndexedDataset that filters unneeded data for CCO

2016-10-01 Thread Pat Ferrel (JIRA)
Pat Ferrel created MAHOUT-1883: -- Summary: Create a type if IndexedDataset that filters unneeded data for CCO Key: MAHOUT-1883 URL: https://issues.apache.org/jira/browse/MAHOUT-1883 Project: Mahout

Re: [VOTE] Apache PredictionIO (incubating) 0.10.0 Release (RC5)

2016-10-01 Thread Pat Ferrel
+1 binding On Oct 1, 2016, at 10:20 AM, Suneel Marthi wrote: +1 binding On Sat, Oct 1, 2016 at 12:05 PM, Matthew Tovbin wrote: > +1 > > - Matthew > > On Oct 1, 2016 00:18, "Donald Szeto" wrote: > >> This is the vote for

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-30 Thread Pat Ferrel
Yeah, I bet Sebastian is right. I see no reason not to try running with --master local[4] or some number of cores on localhost. This will avoid all serialization. With times that low and small data there is no benefit to separate machines. We are using this with ~1TB of data. Using Mahout as a

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-28 Thread Pat Ferrel
. This brings the cost to a quite reasonable range. You are very unlikely to need machines that large anyway but you could afford it if you only pay for the time they are actually used. On Sep 26, 2016, at 12:30 AM, Arnau Sanchez <pyar...@gmail.com> wrote: On Sun, 25 Sep 2016 09:01:43 -07

Re: delay of engines

2016-09-28 Thread Pat Ferrel
still in a prototype stage. I am thinking about a fitting deployment option but am still unsure what is the fitting solution to move forward. Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote on Tue, Sep 27, 2016 at 19:36: 2 examples of real-time mo

Re: spark-itemsimilarity slower than itemsimilarity

2016-09-25 Thread Pat Ferrel
AWS EMR is usually not very well suited for Spark. Spark gets most of its speed from in-memory calculations. So to see speed gains you have to have enough memory. Also partitioning will help in many cases. If you read in data from a single file, that partitioning will usually follow the

Re: [VOTE]: Apache PredictionIO (incubating) 0.10.0 Release (RC4)

2016-09-24 Thread Pat Ferrel
+1 binding On Sep 23, 2016, at 12:28 PM, Alex Merritt wrote: +1 (Binding) Extracted, built, installed manually (with HBase / ElasticSearch / local fs configured in pio-env.sh), start script works, status works, sbt tests pass. Had low system entropy which caused fail on

Re: Remove engine registration

2016-09-21 Thread Pat Ferrel
nly as code being a dependency for > application related models/algorithms. So you would register an engine - as > a code once and run training for some domain specific data (app) and > algorithm parameters, what would result in a different identifier, that > would be later used f

Re: Batch import, Java

2016-09-20 Thread Pat Ferrel
bution. Is that possible? Thanks Gustavo On Fri, Sep 16, 2016 at 2:25 PM, Pat Ferrel <p...@occamsmachete.com> wrote: > Which brings up the next set of issues: What do we do for Salesforce owned > SDKs? Can the SDKs be donated? > > In any case I suggest we add to the “gallery” so it might

Re: [VOTE]: Apache PredictionIO (incubating) 0.10.0 Release

2016-09-20 Thread Pat Ferrel
ictionio/0.10.0- >>>> incubating-rc2/ >>>> or from the Maven staging repo here: >>>> https://repository.apache.org/content/repositories/ >>>> orgapachepredictionio-1003/ >>>> . >>>> >>>> Do we need to vote on the RC2 again? &

Re: Remove engine registration

2016-09-18 Thread Pat Ferrel
This sounds like a good case for Donald’s suggestion. What I was trying to add to the discussion is a way to make all commands rely on state in the metastore, rather than any file on any machine in a cluster or on ordering of execution or execution from a location in a directory structure.

Recommenders and MABs

2016-09-17 Thread Pat Ferrel
I’ve been thinking about how one would implement an application that only shows recommendations. This is partly because people want to build such things. There are many problems with this including cold start and overfit. However these problems also face MABs and are solved with sampling

Re: Remove engine registration

2016-09-17 Thread Pat Ferrel
o/some-engine.json --instanceId some-REST-compatible-resource-id` ? Currently PIO also has a concept of engineInstanceId, which is output of train. I think you are referring to different thing, right? Kenneth On Fri, Sep 16, 2016 at 12:58 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occam

Re: Remove engine registration

2016-09-16 Thread Pat Ferrel
This is a great discussion topic and a great idea. However the cons must also be addressed. We will need to do this before multi-tenant deploys can happen and the benefits are just as large as removing `pio build`. It would be great to get rid of manifest.json and put all metadata in the store

Re: Batch import, Java

2016-09-16 Thread Pat Ferrel
f849d3 <https://github.com/PredictionIO/PredictionIO-Java-SDK/commit/6691144ebf1382aa1d060770a4fb7c0268f849d3> On Fri, Sep 9, 2016 at 7:59 AM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: The page is now live On Sep 8, 2016, at 11:

Re: [VOTE]: Apache PredictionIO (incubating) 0.10.0 Release

2016-09-15 Thread Pat Ferrel
robberp...@outlook.com> wrote: >>> >>> +1 >>> >>>> On Sep 15, 2016, at 01:13, Matthew Tovbin <tovb...@apache.org> wrote: >>>> >>>> +1 >>>> >>>> On Wed, Sep 14, 2016 at 10:12 AM, Pat Ferrel <

Re: Multitenancy on the Universal Product Recommender

2016-09-09 Thread Pat Ferrel
pports multi-tenancy with lightweight Actors one per tenant. On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <p...@occamsmachete.com <mailto:p...@occamsmachete.com>> wrote: I’m the maintainer of the Universal Recommender. We have OSS support at https://groups.google.com/forum/#!forum/act

Re: pio build & train - server available?

2016-09-09 Thread Pat Ferrel
Think of PIO as 2 or more servers: 1) EventServer, which may run continuously, taking events from outside and queries during pio build or from the second server 2) PredictionServer, there is one of these per engine. It serves query results. Some templates allow these to run continuously (like

Re: Batch import, Java

2016-09-09 Thread Pat Ferrel
The page is now live On Sep 8, 2016, at 11:49 AM, Gustavo Frederico wrote: The page at http://predictionio.incubator.apache.org/datacollection/batchimport/ displays "(coming soon)" for

Re: Multitenancy on the Universal Product Recommender

2016-09-08 Thread Pat Ferrel
I’m the maintainer of the Universal Recommender. We have OSS support at https://groups.google.com/forum/#!forum/actionml-user Do you wish to take advantage of the same user being in multiple datasets/tenants? The answer below is assuming

NOTICE TO ALL USERS

2016-09-08 Thread Pat Ferrel
There have been multiple reports of memory problems, performance problems (that may be memory related), and OOM errors. I have been telling everyone to use: pio train -- --driver-memory 14g --executor-memory 14g --master spark://master-address:7077 To increase the memory for Spark and
