tensor factorization FR

2016-06-20 Thread Roberto Pagliari
There are a number of research papers about tensor factorization and its use in machine learning. Is tensor factorization on the roadmap?

RBM in mllib

2016-06-14 Thread Roberto Pagliari
Is RBM being developed? This issue is marked as resolved, but it is not: https://issues.apache.org/jira/browse/SPARK-4251

access to nonnegative flag with ALS trainImplicit

2016-04-28 Thread Roberto Pagliari
I'm using ALS with mllib 1.5.2 in Scala. I do not have access to the nonnegative flag in trainImplicit. Which API is it available from?

Re: ALS setIntermediateRDDStorageLevel

2016-03-22 Thread Roberto Pagliari
I have and it's under class ALS private
On 22/03/2016 10:58, "Sean Owen" <so...@cloudera.com> wrote:
> No, it's been there since 1.1 and still is there:
> setIntermediateRDDStorageLevel. Double-check your code.
>
> On Mon, Mar 21, 2016 at 10:09 PM, Roberto Pagliari

Re: cluster randomly re-starting jobs

2016-03-21 Thread Roberto Pagliari
Yes, you are right. The job failed and it was re-attempting. Thank you,
From: Daniel Siegmann <daniel.siegm...@teamaol.com>
Date: Monday, 21 March 2016 21:33
To: Ted Yu <yuzhih...@gmail.com>
Cc: Roberto

ALS setIntermediateRDDStorageLevel

2016-03-21 Thread Roberto Pagliari
According to this thread, http://apache-spark-user-list.1001560.n3.nabble.com/MLLib-ALS-question-td15420.html, there should be a function to set the intermediate storage level in ALS. However, I'm getting "method not found" with Spark 1.6. Is it still available? If so, can I get to see a minimal

cluster randomly re-starting jobs

2016-03-21 Thread Roberto Pagliari
I noticed that sometimes the spark cluster seems to restart the job completely. In the Ambari UI (where I can check jobs/stages) everything that was done up to a certain point is removed, and the job is restarted. Does anyone know what the issue could be? Thank you,

ALS update without re-computing everything

2016-03-11 Thread Roberto Pagliari
In the current implementation of ALS with implicit feedback, when new data come in, it is not possible to update the user/product matrices without re-computing everything. Is this feature planned, or is there a known workaround? Thank you,

ALS trainImplicit performance

2016-02-25 Thread Roberto Pagliari
Does anyone know the maximum number of ratings ALS has been successfully tested with? For example, is 1 billion ratings (nonzero entries) too much for it to work properly? Thank you,

caching ratings with ALS implicit

2016-02-15 Thread Roberto Pagliari
Something not clear from the documentation is whether the ratings RDD needs to be cached before calling ALS trainImplicit. Would there be any performance gain?

Re: recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
> so you do need to aggregate.
>
> On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
> <roberto.pagli...@asos.com> wrote:
>> What happens when duplicate user/ratings are fed into ALS (the implicit
>> version, specifically)? Are duplicates ignored?
>>
>> I

recommendations with duplicate ratings

2016-02-15 Thread Roberto Pagliari
What happens when duplicate user/ratings are fed into ALS (the implicit version, specifically)? Are duplicates ignored? I'm asking because that would save me a distinct. Thank you,
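The reply in this thread says duplicates do need to be aggregated before training. A minimal sketch of that step in plain Python follows. It assumes duplicate (user, item) entries should be summed, which is only one possible policy (keeping the maximum or the most recent value may suit other data), and `aggregate_ratings` is a hypothetical helper, not part of Spark.

```python
from collections import defaultdict


def aggregate_ratings(ratings):
    """Collapse duplicate (user, item) entries by summing their values.

    Summing is an assumption made for this sketch; it is not something
    ALS does on its own.
    """
    totals = defaultdict(float)
    for user, item, value in ratings:
        totals[(user, item)] += value
    # Sort by (user, item) so the result is deterministic.
    return [(user, item, value) for (user, item), value in sorted(totals.items())]


# Example: the pair (1, 1) appears twice and is merged into one entry.
ratings = [(1, 1, 1.0), (1, 1, 2.0), (2, 1, 2.0)]
aggregated = aggregate_ratings(ratings)
```

On an RDD the same idea is usually expressed by mapping each rating to `((user, item), value)` and then calling `reduceByKey`.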

Re: ALS rating caching

2016-02-09 Thread Roberto Pagliari
RDD storage level (for user and item factors) using finalRDDStorageLevel. The old MLLIB API now calls the new ALS so the same semantics apply. So it should not be necessary to cache the raw input RDD.
On Tue, 9 Feb 2016 at 01:48 Roberto Pagliari <roberto.pagli...@asos.com>

ALS rating caching

2016-02-08 Thread Roberto Pagliari
When using ALS from mllib, would it be better/recommended to cache the ratings RDD? I'm asking because when predicting products for users (for example) it is recommended to cache product/user matrices. Thank you,

recommendProductsForUser for a subset of users

2016-02-02 Thread Roberto Pagliari
When using ALS, is it possible to use recommendProductsForUser for a subset of users? Currently, productFeatures and userFeatures are val. Is there a workaround for it? Using recommendForUser repeatedly would not work in my case, since it would be too slow with many users. Thank you,
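One possible workaround, sketched here in plain Python, is to score only the chosen users against the product factors. This is not a Spark API: `recommend_for_users` is a hypothetical helper, the dictionaries stand in for factor vectors taken from the model's userFeatures and productFeatures, and the sketch assumes both sets of factors fit in memory on one machine.

```python
import heapq


def recommend_for_users(user_ids, user_features, product_features, k):
    """Return the top-k (product, score) pairs for each requested user.

    user_features and product_features map an id to its factor vector.
    The score is the dot product of the two vectors.
    """
    recommendations = {}
    for user in user_ids:
        user_vector = user_features[user]
        scored = (
            (sum(a * b for a, b in zip(user_vector, product_vector)), product)
            for product, product_vector in product_features.items()
        )
        # nlargest keeps only k items instead of sorting every product.
        top = heapq.nlargest(k, scored)
        recommendations[user] = [(product, score) for score, product in top]
    return recommendations


# Illustrative factors (made-up numbers, rank 2).
user_features = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
product_features = {10: [2.0, 0.0], 20: [0.0, 3.0], 30: [1.0, 1.0]}

# Only users 1 and 3 are scored; user 2 is skipped entirely.
subset_recs = recommend_for_users([1, 3], user_features, product_features, k=2)
```

For a large catalogue the same scoring can be done as a matrix product over the selected user rows, which is typically much faster than the pure-Python loop shown here.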

is recommendProductsForUsers available in ALS?

2016-01-18 Thread Roberto Pagliari
With Spark 1.5, the following code:
    from pyspark import SparkContext, SparkConf
    from pyspark.mllib.recommendation import ALS, Rating
    r1 = (1, 1, 1.0)
    r2 = (1, 2, 2.0)
    r3 = (2, 1, 2.0)
    ratings = sc.parallelize([r1, r2, r3])
    model = ALS.trainImplicit(ratings, 1, seed=10)

Re: frequent itemsets

2016-01-02 Thread Roberto Pagliari
that very well. Thank you,
From: Yanbo Liang <yblia...@gmail.com>
Date: Saturday, 2 January 2016 09:03
To: Roberto Pagliari <roberto.pagli...@asos.com>
Cc: "user@spark.apache.org"

Re: frequent itemsets

2016-01-02 Thread Roberto Pagliari
m2linc...@outlook.com>
Date: Saturday, 2 January 2016 14:48
To: Roberto Pagliari <roberto.pagli...@asos.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>

frequent itemsets

2016-01-01 Thread Roberto Pagliari
When using the frequent itemsets APIs, I'm running into a stack overflow exception whenever there are too many combinations to deal with and/or too many transactions and/or too many items. Does anyone know how many transactions/items these APIs can deal with? Thank you,

argparse with pyspark

2015-12-21 Thread Roberto Pagliari
Is argparse compatible with pyspark? If so, how do I provide parameters from the command line? It does not seem to work the usual way. Thank you,
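A minimal sketch of argparse in a driver script follows. The option names `--input` and `--rank` and the helper `parse_args` are hypothetical, chosen only for illustration. The explicit argument list in the example call stands in for what the script would normally read from the command line.

```python
import argparse


def parse_args(argv=None):
    """Parse job arguments; argv=None means "read from sys.argv"."""
    parser = argparse.ArgumentParser(description="Example PySpark job arguments")
    parser.add_argument("--input", required=True, help="path to the input data")
    parser.add_argument("--rank", type=int, default=10, help="number of latent factors")
    return parser.parse_args(argv)


# Example call with an explicit argument list.
# In a real driver script this would be parse_args() with no arguments.
args = parse_args(["--input", "ratings.txt", "--rank", "20"])
```

With spark-submit, application arguments go after the script name, for example `spark-submit my_job.py --input ratings.txt --rank 20` (where `my_job.py` is a placeholder). Anything placed before the script name is interpreted as an option to spark-submit itself, which is a common reason the arguments appear not to reach the script.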

ALS predictAll does not generate all the user/item ratings

2015-12-18 Thread Roberto Pagliari
I created the following data, data.file:
    1 1
    1 2
    1 3
    2 4
    3 5
    4 6
    5 7
    6 1
    7 2
    8 8
The following code:
    def parse_line(line):
        tokens = line.split(' ')
        return (int(tokens[0]), int(tokens[1])), 1.0
    lines = sc.textFile('./data.file')
    linesTest = sc.textFile('./data.file')

number of blocks in ALS/recommendation API

2015-12-17 Thread Roberto Pagliari
What is the meaning of the 'blocks' input argument in the mllib ALS implementation, and how does that relate to the number of executors and/or size of the input data? Thank you,

ALS mllib.recommendation vs ml.recommendation

2015-12-14 Thread Roberto Pagliari
Currently, there are two implementations of ALS available: ml.recommendation.ALS and

ALS with repeated entries

2015-12-09 Thread Roberto Pagliari
What happens with ALS when the same pair of user/item appears more than once with either the same ratings or different ratings?

Re: Python API Documentation Mismatch

2015-12-04 Thread Roberto Pagliari
eun...@hotmail.com>
Cc: Roberto Pagliari <roberto.pagli...@asos.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Python API

Python API Documentation Mismatch

2015-12-03 Thread Roberto Pagliari
Hello, I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation are, in fact, available. For example, the code below

Jupyter configuration

2015-12-02 Thread Roberto Pagliari
Does anyone have a pointer to Jupyter configuration with pyspark? The current material on the IPython notebook is out of date, and Jupyter ignores ipython profiles. Thank you,