Development methodology

2014-03-01 Thread Debasish Das
Hi, We have a mirror repo of spark at our internal stash. We are adding changes to a fork of the mirror so that down the line we can push the contributions back to Spark git. I am not sure what's the exact development methodology we should follow as things are a bit complicated due to

Re: Development methodology

2014-03-01 Thread Debasish Das
what you mean by enterprise stash. But PR is a concept unique to Github. There is no PR model in normal git or the git ASF maintains. On Sat, Mar 1, 2014 at 11:28 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, We have a mirror repo of spark at our internal stash. We are adding

Re: Development methodology

2014-03-02 Thread Debasish Das
has to come through github ? I could merge for example @dbtsai github lbfgs branch to my branch at stash... Thanks. Deb On Sat, Mar 1, 2014 at 12:43 PM, Debasish Das debasish.da...@gmail.com wrote: Stash is an enterprise git from atlassian.. I got it...Basically the PRs are managed by github

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-03 Thread Debasish Das
infrastructure around this? Thanks. Sincerely, DB Tsai Machine Learning Engineer Alpine Data Labs -- Web: http://alpinenow.com/ On Sun, Mar 2, 2014 at 10:23 AM, Debasish Das debasish.da...@gmail.com wrote: Hi DB, 1. Could you point to the BFGS

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread Debasish Das
Hi David, Few questions on breeze solvers: 1. I feel the right place of adding useful things from RISO LBFGS (based on Professor Nocedal's fortran code) will be breeze. It will involve stress testing breeze LBFGS on large sparse datasets and contributing fixes to existing breeze LBFGS with the

ALS solve.solvePositive

2014-03-06 Thread Debasish Das
Hi, I am running ALS on a sparse problem (10M x 1M) and I am getting the following error: org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading minor of order i of A is not positive definite. at org.jblas.SimpleBlas.posv(SimpleBlas.java:373) at

QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
definite. Therefore, we chose QR decomposition to solve the linear system. --sebastian On 03/06/2014 03:44 PM, Debasish Das wrote: Hi, I am running ALS on a sparse problem (10M x 1M) and I am getting the following error: org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading

Re: QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
something there.) Even though your data is huge, if it was generated by some synthetic process, maybe it is very low rank? QR decomposition is pretty good here, yes. -- Sean Owen | Director, Data Science | London On Thu, Mar 6, 2014 at 3:05 PM, Debasish Das debasish.da...@gmail.com

Re: ALS solve.solvePositive

2014-03-07 Thread Debasish Das
Hi Xiangrui, I used lambda = 0.1...It is possible that 2 users ranked in movies in a very similar way... I agree that increasing lambda will solve the problem but you agree this is not a solution...lambda should be tuned based on sparsity / other criteria and not to make a linearly dependent

Re: ALS solve.solvePositive

2014-03-19 Thread Debasish Das
. Thanks. Deb On Wed, Mar 19, 2014 at 10:11 AM, Xiangrui Meng men...@gmail.com wrote: Another question: do you have negative or out-of-range user or product ids or? -Xiangrui On Tue, Mar 11, 2014 at 8:00 PM, Debasish Das debasish.da...@gmail.com wrote: Nope..I did not test implicit feedback

Re: new Catalyst/SQL component merged into master

2014-03-21 Thread Debasish Das
Awesome news ! It will be great if there are any examples or usecases to look at ? We are looking into shark/ooyala job server to give in memory sql analytics, model serving/scoring features for dashboard apps... Does this feature has different usecases than shark or more cleaner as hive

Re: ALS memory limits

2014-03-26 Thread Debasish Das
. On Wed, Mar 26, 2014 at 6:06 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, For our usecases we are looking into 20 x 1M matrices which comes in the similar ranges as outlined by the paper over here: http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html

Re: Any suggestion about JIRA 1006 MLlib ALS gets stack overflow with too many iterations?

2014-03-27 Thread Debasish Das
Hi Matei, I am hitting similar problems with 10 ALS iterations...I am running with 24 gb executor memory on 10 nodes for 20M x 3 M matrix with rank =50 The first iteration of flatMaps run fine which means that the memory requirements are good per iteration... If I do check-pointing on RDD, most

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-31 Thread Debasish Das
:01 PM, Debasish Das debasish.da...@gmail.com wrote: Hi David, I have started to experiment with BFGS solvers for Spark GLM over large scale data... I am also looking to add a good QP solver in breeze that can be used in Spark ALS for constraint solves...More details on that soon

Re: Recent heartbeats

2014-04-05 Thread Debasish Das
Thanks Patrick...I searched in the archives and found the answer...tuning the akka and gc params On Fri, Apr 4, 2014 at 10:35 PM, Patrick Wendell pwend...@gmail.com wrote: I answered this over on the user list... On Fri, Apr 4, 2014 at 6:13 PM, Debasish Das debasish.da...@gmail.com

Master compilation

2014-04-05 Thread Debasish Das
I am synced with apache/spark master but getting error in spark/sql compilation... Is the master broken ? [info] Compiling 34 Scala sources to /home/debasish/spark_deploy/sql/core/target/scala-2.10/classes... [error]

Re: Master compilation

2014-04-05 Thread Debasish Das
open a hot-fix PR after looking for other stuff like this that might have snuck in. -- Sean Owen | Director, Data Science | London On Sat, Apr 5, 2014 at 10:04 PM, Debasish Das debasish.da...@gmail.com wrote: I am synced with apache/spark master but getting error in spark/sql compilation

Re: Master compilation

2014-04-05 Thread Debasish Das
not sure why it didn't fail our build... On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das debasish.da...@gmail.com wrote: I verified this is happening for both CDH4.5 and 1.0.4...My deploy environment is Java 6...so Java 7 compilation is not going to help... Is this the PR which caused

Re: Master compilation

2014-04-05 Thread Debasish Das
to submit a hot fix for this issue specifically please do. I'm not sure why it didn't fail our build... On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das debasish.da...@gmail.com wrote: I verified this is happening for both CDH4.5 and 1.0.4...My deploy environment is Java 6...so Java 7

Re: Master compilation

2014-04-06 Thread Debasish Das
it with java 6 just fine. something compiled with java7 with -target 1.7 will not run on java 6 On Sat, Apr 5, 2014 at 9:10 PM, Debasish Das debasish.da...@gmail.com wrote: With jdk7 I could compile it fine: java version 1.7.0_51 Java(TM) SE Runtime Environment (build 1.7.0_51-b13

Re: Master compilation

2014-04-06 Thread Debasish Das
in Breeze of just the same form we just saw. It's worth opening an issue since, indeed, I would expect exactly the compile error you see with Java 6. But it should not stop you from building Spark. On Sun, Apr 6, 2014 at 5:00 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Koert, How

Re: Any suggestion about JIRA 1006 MLlib ALS gets stack overflow with too many iterations?

2014-04-06 Thread Debasish Das
at 3:20 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Matei, I am hitting similar problems with 10 ALS iterations...I am running with 24 gb executor memory on 10 nodes for 20M x 3 M matrix with rank =50 The first iteration of flatMaps run fine which means that the memory requirements

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
you can reproduce the error on a public data set, e.g., movielens? Thanks! Best, Xiangrui On Sat, Apr 5, 2014 at 10:53 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I deployed apache/spark master today and recently there were many ALS related checkins and enhancements.. I am

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
, Apr 5, 2014 at 10:53 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I deployed apache/spark master today and recently there were many ALS related checkins and enhancements.. I am running ALS with explicit feedback and I remember most enhancements were related to implicit

Re: ALS array index out of bound with 50 factors

2014-04-07 Thread Debasish Das
commit? 2) Do you have negative or out-of-integer-range user or product ids? Try to print out the max/min value of user/product ids. Best, Xiangrui On Sun, Apr 6, 2014 at 11:01 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, With 4 ALS iterations it runs fine...If I run 10

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread Debasish Das
I got your checkin...I need to run logistic regression SGD vs BFGS for my current usecases but your next checkin will update the logistic regression with LBFGS right ? Are you adding it to regression package as well ? Thanks. Deb On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai dbt...@stanford.edu

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread Debasish Das
By the way...what's the idea...the labeled data set is a RDD which is cached on all nodes.. The bfgs solver is maintained on the master or each worker is supposed to maintain its own bfgs... On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das debasish.da...@gmail.com wrote: I got your checkin...I

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread Debasish Das
to L-BFGS. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das debasish.da...@gmail.com wrote: Hi DB, Are we going to clean up the function

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-04-08 Thread Debasish Das
--- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Tue, Apr 8, 2014 at 4:05 PM, Debasish Das debasish.da...@gmail.com wrote: Yup that's what I expected...L-BFGS solver is in the master and gradient computation per RDD is done on each

Publish spark jars to artifactory

2014-04-19 Thread Debasish Das
Hi, I saw in the code that spark jars are published on sonatype but I was wondering if you guys have published spark jars to artifactory as...Cloudera uses artifactory... Somehow I can publish maven projects to artifactory but after following the sbt link:

mllib vector templates

2014-05-05 Thread Debasish Das
Hi, Why mllib vector is using double as default ? /** * Represents a numeric vector, whose index type is Int and value type is Double. */ trait Vector extends Serializable { /** * Size of the vector. */ def size: Int /** * Converts the instance to a double array.

Re: mllib vector templates

2014-05-05 Thread Debasish Das
. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:41 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, Why mllib vector is using double as default

Re: mllib vector templates

2014-05-05 Thread Debasish Das
://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com wrote: Is this a breeze issue or breeze can take templates on float / double ? If breeze can take templates then it is a minor fix for Vectors.scala

LabeledPoint toString to dump LibSvm if SparseVector

2014-05-10 Thread Debasish Das
Hi, I need to change the toString on LabeledPoint to libsvm format so that I can dump RDD[LabeledPoint] as a format that could be read by sparse glmnet-R and other packages to benchmark mllib classification accuracy... Basically I have to change the toString of LabeledPoint and toString of

Re: mllib vector templates

2014-05-11 Thread Debasish Das
, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com wrote: Is this a breeze issue or breeze can

Re: mllib vector templates

2014-05-12 Thread Debasish Das
) in the matrix template. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com wrote

Re: [VOTE] Release Apache Spark 1.0.0 (RC11)

2014-06-04 Thread Debasish Das
Hi Patrick, We maintain internal Spark mirror in sync with Spark github master... What's the way to get the 1.0.0 stable release from github to deploy on our production cluster ? Is there a tag for 1.0.0 that I should use to deploy ? Thanks. Deb On Wed, Jun 4, 2014 at 10:49 AM, Patrick

Constraint Solver for Spark

2014-06-05 Thread Debasish Das
Hi, We are adding a constrained ALS solver in Spark to solve matrix factorization use-cases which needs additional constraints (bounds, equality, inequality, quadratic constraints) We are using a native version of a primal dual SOCP solver due to its small memory footprint and sparse ccs matrix

Re: Constraint Solver for Spark

2014-06-05 Thread Debasish Das
. You can define an interface for the subproblem solvers and maintain the IPM solver at your own code base, if the only information you need is Y^T Y and Y^T b. Btw, just curious, what is the use case for quadratic constraints? Best, Xiangrui On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das debasish.da

Re: Constraint Solver for Spark

2014-06-06 Thread Debasish Das
include it in the classpath. Creating two separate files still seems unnecessary to me. Could you create a JIRA and we can move our discussion there? Thanks! Best, Xiangrui On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, For orthogonality

Re: Constraint Solver for Spark

2014-06-10 Thread Debasish Das
= Solve.solvePositive(fullXtX, userXy(index)).data } } On Tue, Jun 10, 2014 at 8:56 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am a bit confused with the code here: // Solve the least-squares problem for each user and return the new feature vectors Array.range(0

Re: Artificial Neural Network in Spark?

2014-06-27 Thread Debasish Das
Look into Powered by Spark page...I found a project there which used autoencoder functions...It's not updated for a long time now ! On Thu, Jun 26, 2014 at 10:51 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi Bert, It would be extremely interesting. Do you plan to implement

Re: Spark Matrix Factorization

2014-06-27 Thread Debasish Das
Hi, In my experiments with Jellyfish I did not see any substantial RMSE loss over DSGD for Netflix dataset... So we decided to stick with ALS and implemented a family of Quadratic Minimization solvers that stays in the ALS realm but can solve interesting constraints(positivity, bounds, L1,

Linear CG solver

2014-06-27 Thread Debasish Das
Hi, I am looking for an efficient linear CG to be put inside the Quadratic Minimization algorithms we added for Spark mllib. With a good linear CG, we should be able to solve kernel SVMs with this solver in mllib... I use direct solves right now using cholesky decomposition which has higher

Re: Linear CG solver

2014-06-27 Thread Debasish Das
targeted for norm-constrained solutions of the CG problem. On Fri, Jun 27, 2014 at 5:54 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am looking for an efficient linear CG to be put inside the Quadratic Minimization algorithms we added for Spark mllib. With a good linear CG

Re: Spark Matrix Factorization

2014-06-28 Thread Debasish Das
, Jun 27, 2014 at 12:47 PM, Debasish Das [via Apache Spark Developers List] ml-node+s1001551n7098...@n3.nabble.com wrote: Hi, In my experiments with Jellyfish I did not see any substantial RMSE loss over DSGD for Netflix dataset... So we decided to stick with ALS and implemented a family

Re: Linear CG solver

2014-06-28 Thread Debasish Das
for bound-constrained CG, though bounded LBFGS is more common. I think code for Nystrom approximations or kernel mappings would be more useful. On Fri, Jun 27, 2014 at 5:42 PM, Debasish Das debasish.da...@gmail.com wrote: Thanks David...Let me try it...I am keen to see the results first

Re: Linear CG solver

2014-06-28 Thread Debasish Das
think the numeric support is lacking in Java land. On Sat, Jun 28, 2014 at 1:47 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am coming up with an iterative solver for Equality and bound constrained quadratic minimization... I have the cholesky versions running

Re: Artificial Neural Network in Spark?

2014-06-30 Thread Debasish Das
3 outputs) functions. If there is interest, I would be happy to release the code. What would be the best way to do this? Is there some kind of review process? Best regards, Bert -Original Message- From: Debasish Das [mailto:debasish.da...@gmail.com] Sent: 27 June 2014 14:02

Re: Constraint Solver for Spark

2014-07-02 Thread Debasish Das
the subproblems no longer decoupled, that would certainly affects scalability. -Xiangrui On Wed, Jun 11, 2014 at 2:20 AM, Debasish Das debasish.da...@gmail.com wrote: I got it...ALS formulation is solving the matrix completion problem To convert the problem to matrix factorization or take

Re: Constraint Solver for Spark

2014-07-03 Thread Debasish Das
or the objective function is complex but splittable. Neither applies to this case. Best, Xiangrui On Tue, Jul 1, 2014 at 11:05 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, Could you please point to the IPM solver that you have positive results with ? I was planning

Re: PLSA

2014-07-03 Thread Debasish Das
Hi Denis, Are you using matrix factorization to generate the latent factors ? Thanks. Deb On Thu, Jul 3, 2014 at 8:49 AM, Denis Turdakov turda...@ispras.ru wrote: Hello guys, We made pull request with PLSA and its modifications: - https://github.com/apache/spark/pull/1269 - JIRA issue

Re: PLSA

2014-07-04 Thread Debasish Das
Thanks for the pointer... Looks like you are using EM algorithm for factorization which looks similar to multiplicative update rules Do you think using mllib ALS implicit feedback, you can scale the problem further ? We can handle L1, L2, equality and positivity constraints in ALS now...As long

Re: Constraint Solver for Spark

2014-07-04 Thread Debasish Das
I looked further and realized that ECOS used a mex file while PDCO is using pure Matlab code. So the out-of-box runtime comparison is not fair. I am trying to generate PDCO C port. Like ECOS, PDCO also makes use of sparse support from Tim Davis. Thanks. Deb

OWLQN

2014-07-18 Thread Debasish Das
Hi, I thought OWLQN is already merged to mllib optimization but I don't see it in the master yet... Are there any issues in merging it in ? I see there are some merge conflicts right now... https://github.com/apache/spark/pull/840/ Thanks. Deb

Master compilation with sbt

2014-07-19 Thread Debasish Das
Hi, Is sbt still used for master compilation ? I could compile for 2.3.0-cdh5.0.2 using maven following the instructions from the website: http://spark.apache.org/docs/latest/building-with-maven.html But when I am trying to use sbt for local testing and then I am getting some weird errors...Is

Re: Master compilation with sbt

2014-07-20 Thread Debasish Das
On Sat, Jul 19, 2014 at 12:50 PM, Mark Hamstra m...@clearstorydata.com wrote: project mllib . . . clean . . . compile . . . test ...all works fine for me @2a732110d46712c535b75dd4f5a73761b6463aa8 On Sat, Jul 19, 2014 at 11:10 AM, Debasish Das

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-05 Thread Debasish Das
, there might be bugs in it... Any suggestions will be appreciated Thanks. Deb On Sat, Aug 2, 2014 at 11:12 AM, Xiangrui Meng men...@gmail.com wrote: Yes, that should work. spark-mllib-1.1.0 should be compatible with spark-core-1.0.1. On Sat, Aug 2, 2014 at 10:54 AM, Debasish Das debasish.da

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-06 Thread Debasish Das
) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) On Tue, Aug 5, 2014 at 5:59 PM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, I used your idea and kept a cherry picked version

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-06 Thread Debasish Das
they differ in the final recommendation? It would be great if you can test prec@k or ndcg@k metrics. Best, Xiangrui On Wed, Aug 6, 2014 at 8:28 AM, Debasish Das debasish.da...@gmail.com wrote: Hi Xiangrui, Maintaining another file will be a pain later so I deployed spark 1.0.1 without

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-06 Thread Debasish Das
...@dbtsai.com wrote: One related question, is mllib jar independent from hadoop version (doesn't use hadoop api directly)? Can I use mllib jar compiled for one version of hadoop and use it in another version of hadoop? Sent from my Google Nexus 5 On Aug 6, 2014 8:29 AM, Debasish Das debasish.da

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-08 Thread Debasish Das
Hi Patrick, I am testing the 1.1 branch but I see lot of protobuf warnings while building the jars: [warn] Class com.google.protobuf.Parser not found - continuing with a stub. [warn] Class com.google.protobuf.Parser not found - continuing with a stub. [warn] Class com.google.protobuf.Parser

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-09 Thread Debasish Das
with Java 1.7_55 but the cluster JRE is at 1.7_45. Thanks. Deb On Wed, Aug 6, 2014 at 12:01 PM, Debasish Das debasish.da...@gmail.com wrote: I did not play with Hadoop settings...everything is compiled with 2.3.0CDH5.0.2 for me... I did try to bump the version number of HBase from 0.94 to 0.96

Re: Using mllib-1.1.0-SNAPSHOT on Spark 1.0.1

2014-08-12 Thread Debasish Das
I figured out the issue...the driver memory was at 512 MB and for our datasets, the following code needed more memory... // Materialize usersOut and productsOut. usersOut.count() productsOut.count() Thanks. Deb On Sat, Aug 9, 2014 at 6:12 PM, Debasish Das debasish.da...@gmail.com wrote

Kryo serialization issues

2014-08-14 Thread Debasish Das
Hi, Is there a JIRA for this bug ? I have seen it multiple times during our ALS runs now...some runs don't show while some runs fail due to the error msg https://github.com/GrahamDennis/spark-kryo-serialisation/blob/master/README.md One way to circumvent this is to not use kryo but then I am

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-15 Thread Debasish Das
5:48 PM, Reynold Xin r...@databricks.com wrote: Here: https://github.com/apache/spark/pull/1948 On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da...@gmail.com wrote: Is there a fix that I can test ? I have the flows setup for both standalone and YARN runs... Thanks. Deb

Spark on YARN webui

2014-08-18 Thread Debasish Das
Hi, We are running the snapshots (new spark features) on YARN and I was wondering if the webui is available on YARN mode... The deployment document does not mention webui on YARN mode... Is it available ? Thanks. Deb

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-19 Thread Debasish Das
? @dbtsai did your assembly on YARN ran fine or you are still noticing these exceptions ? Thanks. Deb On Thu, Aug 14, 2014 at 5:48 PM, Reynold Xin r...@databricks.com wrote: Here: https://github.com/apache/spark/pull/1948 On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da

Lost executor on YARN ALS iterations

2014-08-19 Thread Debasish Das
Hi, During the 4th ALS iteration, I am noticing that one of the executor gets disconnected: 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding SendingConnectionManagerId not found 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5 disconnected, so removing it

Re: Lost executor on YARN ALS iterations

2014-08-20 Thread Debasish Das
be the same issue as described in https://issues.apache.org/jira/browse/SPARK-2121 . We know that the container got killed by YARN because it used much more memory than it requested. But we haven't figured out the root cause yet. +Sandy Best, Xiangrui On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das

Akka usage in Spark

2014-08-20 Thread Debasish Das
Hi, There have been some recent changes in the way akka is used in spark and I feel they are major changes... Is there a design document / JIRA / experiment on large datasets that highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great to understand where akka is used in the

Re: Lost executor on YARN ALS iterations

2014-08-21 Thread Debasish Das
configuration, yarn.nodemanager.vmem-check-enabled is set to false. -Sandy On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das debasish.da...@gmail.com wrote: I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is definitely a YARN related problem... At least for me right now only

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
know that the container got killed by YARN because it used much more memory than it requested. But we haven't figured out the root cause yet. +Sandy Best, Xiangrui On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, During the 4th ALS iteration, I am

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
. -Sandy On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das debasish.da...@gmail.com wrote: Hi Sandy, Any resolution for YARN failures ? It's a blocker for running spark on top of YARN. Thanks. Deb On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote: Hi Deb, I think this may

Re: Lost executor on YARN ALS iterations

2014-09-09 Thread Debasish Das
executors (unless ALS is using a bunch of off-heap memory?). You mentioned earlier in this thread that the property wasn't showing up in the Environment tab. Are you sure it's making it in? -Sandy On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das debasish.da...@gmail.com wrote: Hmm...I did try

Re: I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

2014-09-19 Thread Debasish Das
Hi Xiangrui, Could you please point to some reference for calculating prec@k and ndcg@k ? prec is precision I suppose but ndcg I have no idea about... Thanks. Deb On Mon, Aug 25, 2014 at 12:28 PM, Xiangrui Meng men...@gmail.com wrote: The evaluation metrics are definitely useful. How do

Re: I want to contribute MLlib two quality measures(ARHR and HR) for top N recommendation system. Is this meaningful?

2014-09-19 Thread Debasish Das
Thanks Christoph. Are these numbers for mllib als implicit and explicit feedback on movielens/netflix datasets documented on JIRA ? On Sep 19, 2014 1:16 PM, Christoph Sawade christoph.saw...@googlemail.com wrote: Hey Deb, NDCG is the Normalized Discounted Cumulative Gain [1]. Another
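For readers who, like the poster, have not met the metric before: a minimal Python sketch of NDCG@k follows. The function names and the toy relevance lists are illustrative assumptions and are not MLlib APIs. It uses the common log2 discount and, as a simplification, takes the ideal ordering from the recommended list itself.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items.

    relevances[i] is the graded relevance of the item at rank i (0-based).
    """
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the same items sorted best-first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists the most relevant items first scores 1.0; pushing a relevant item further down lowers the score because of the logarithmic discount.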

Re: Hyper Parameter Optimization Algorithms

2014-09-29 Thread Debasish Das
You should look into Evan Sparks' talk from Spark Summit 2014 http://spark-summit.org/2014/talk/model-search-at-scale I am not sure if some of it is already open sourced through MLBase... On Mon, Sep 29, 2014 at 7:45 PM, Lochana Menikarachchi locha...@gmail.com wrote: Hi, Is there anyone

Cluster tests failing

2014-09-30 Thread Debasish Das
Hi, Inside mllib I am running tests using: mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn install The local tests run fine but cluster tests are failing.. LBFGSClusterSuite: - task size should be small *** FAILED *** org.apache.spark.SparkException: Job aborted due to stage

Re: Cluster tests failing

2014-09-30 Thread Debasish Das
I have done mvn clean several times... Consistently all the mllib tests that are using LocalClusterSparkContext.scala, they fail !

Local tests logging to log4j

2014-10-07 Thread Debasish Das
Hi, I have added some changes to ALS tests and I am re-running tests as: mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn -DwildcardSuites=org.apache.spark.mllib.recommendation.ALSSuite test I have some INFO logs in the code which I want to see on my console. They work fine if I add

Re: Local tests logging to log4j

2014-10-07 Thread Debasish Das
=ERROR log4j.logger.org.apache.zookeeper=WARN log4j.logger.org.eclipse.jetty=WARN log4j.logger.org.I0Itec.zkclient=WARN On Tue, Oct 7, 2014 at 7:42 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I have added some changes to ALS tests and I am re-running tests as: mvn

Issues with ALS positive definite

2014-10-15 Thread Debasish Das
Hi, If I take the Movielens data and run the default ALS with regularization as 0.0, I am hitting exception from LAPACK that the gram matrix is not positive definite. This is on the master branch. This is how I run it : ./bin/spark-submit --total-executor-cores 1 --master spark://
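One way such a LAPACK error can arise is sketched below in plain Python. This is an illustration under stated assumptions, not the MLlib code path: `gram` and `cholesky` are hypothetical helpers, and the single-row factor matrix stands in for a user with fewer ratings than the rank. With zero regularization the Gram matrix is only positive semidefinite, so a Cholesky factorization cannot complete; adding lambda to the diagonal makes it positive definite.

```python
import math

def gram(rows, lam):
    """Return Y^T Y + lam * I for a list of factor rows (hypothetical helper)."""
    n = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Textbook Cholesky; raises if a pivot is not strictly positive."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError(
                        f"leading minor of order {i + 1} is not positive definite")
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l

# One observed rating but rank 2: Y^T Y = [[1, 2], [2, 4]] is singular.
rows = [[1.0, 2.0]]
regularized = cholesky(gram(rows, 0.1))   # succeeds
```

Calling `cholesky(gram(rows, 0.0))` raises, while the regularized call succeeds. Whether this is the cause of the failure reported in the thread cannot be determined from the snippet.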

Re: Issues with ALS positive definite

2014-10-15 Thread Debasish Das
, 2014 at 5:01 PM, Liquan Pei liquan...@gmail.com wrote: Hi Debasish, I think ||r - wi'hj||^{2} is semi-positive definite. Thanks, Liquan On Wed, Oct 15, 2014 at 4:57 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, If I take the Movielens data and run the default ALS with regularization

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
in a different implementation and it has worked fine. Now I have to go hunt for how the QR decomposition is exposed in BLAS... Looks like its GEQRF which JBLAS helpfully exposes. Debasish you could try it for fun at least. On Oct 15, 2014 8:06 PM, Debasish Das debasish.da...@gmail.com wrote: But do

Re: Issues with ALS positive definite

2014-10-16 Thread Debasish Das
Just checked, QR is exposed by netlib: import org.netlib.lapack.Dgeqrf For the equality and bound version, I will use QR...it will be faster than the LU that I am using through jblas.solveSymmetric... On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das debasish.da...@gmail.com wrote: @xiangrui

NNLS bug

2014-10-17 Thread Debasish Das
Hi, I am validating the proximal algorithm for positive and bound constrained ALS and I came across the bug detailed in the JIRA while running ALS with NNLS: https://issues.apache.org/jira/browse/SPARK-3987 ADMM based proximal algorithm came up with correct result... Thanks. Deb

Re: Oryx + Spark mllib

2014-10-19 Thread Debasish Das
wrote: Oryx 2 seems to be geared for Spark https://github.com/OryxProject/oryx 2014-10-18 11:46 GMT-04:00 Debasish Das debasish.da...@gmail.com: Hi, Is someone working on a project on integrating Oryx model serving layer with Spark ? Models will be built using either

matrix factorization cross validation

2014-10-29 Thread Debasish Das
Hi, In the current factorization flow, we cross validate on the test dataset using the RMSE number but there are some other measures which are worth looking into. If we consider the problem as a regression problem and the ratings 1-5 are considered as 5 classes, it is possible to generate a

Re: matrix factorization cross validation

2014-10-29 Thread Debasish Das
, Debasish Das debasish.da...@gmail.com wrote: Hi, In the current factorization flow, we cross validate on the test dataset using the RMSE number but there are some other measures which are worth looking into. If we consider the problem as a regression problem and the ratings 1-5

Re: matrix factorization cross validation

2014-10-29 Thread Debasish Das
to examples.MovielensALS. ROC should be good to add as well. -Xiangrui On Wed, Oct 29, 2014 at 11:23 AM, Debasish Das debasish.da...@gmail.com wrote: Hi, In the current factorization flow, we cross validate on the test dataset using the RMSE number but there are some other measures which are worth

Re: matrix factorization cross validation

2014-10-30 Thread Debasish Das
wonder if it is possible to extend the DIMSUM idea to computing top K matrix multiply between the user and item factor matrices, as opposed to all-pairs similarity of one matrix? On Thu, Oct 30, 2014 at 5:28 AM, Debasish Das debasish.da...@gmail.com wrote: Is there an example of how to use

Re: matrix factorization cross validation

2014-10-30 Thread Debasish Das
any of the topic modeling algorithms as well... Is there a better place for it other than mllib examples ? On Thu, Oct 30, 2014 at 8:13 AM, Debasish Das debasish.da...@gmail.com wrote: I thought topK will save us...for each user we have 1xrank...now our movie factor is a RDD...we pick topK movie

Re: matrix factorization cross validation

2014-11-03 Thread Debasish Das
:24 PM, Sean Owen so...@cloudera.com wrote: MAP is effectively an average over all k from 1 to min(# recommendations, # items rated) Getting first recommendations right is more important than the last. On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das debasish.da...@gmail.com wrote
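The description of MAP quoted above can be made concrete with a small Python sketch. The function names and item ids are illustrative assumptions, not MLlib APIs, and the recommended list is assumed to contain no duplicates. Average precision sums precision@k at every rank k that holds a relevant item and divides by min(# recommendations, # relevant items); MAP is the mean of this value over users.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the first k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def average_precision(recommended, relevant):
    """Average precision for one user; early hits contribute more."""
    relevant = set(relevant)
    denom = min(len(recommended), len(relevant))
    if denom == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            score += hits / rank   # precision at this rank
    return score / denom
```

Placing the single relevant item first gives 1.0, while placing it second gives 0.5, which reflects the point that getting the first recommendations right matters more than the last.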

MatrixFactorizationModel predict(Int, Int) API

2014-11-03 Thread Debasish Das
Hi, I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but the code fails on userFeatures.lookup(user).head In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been called and in all the test-cases that API has been used... I can perhaps refactor my code to

Re: [VOTE] Designating maintainers for some Spark components

2014-11-06 Thread Debasish Das
+1 The app to track PRs based on component is a great idea... On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara sean.mcnam...@webtrends.com wrote: +1 Sean On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, I wanted to share a discussion we've been having on

Re: MatrixFactorizationModel predict(Int, Int) API

2014-11-06 Thread Debasish Das
userFeatures.lookup(user).head to work ? On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng men...@gmail.com wrote: Was user presented in training? We can put a check there and return NaN if the user is not included in the model. -Xiangrui On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das debasish.da

Re: MatrixFactorizationModel predict(Int, Int) API

2014-11-06 Thread Debasish Das
if the user is not included in the model. -Xiangrui On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but the code fails on userFeatures.lookup(user).head In computeRmse

Re: MatrixFactorizationModel predict(Int, Int) API

2014-11-10 Thread Debasish Das
/SPARK-3066 The easiest case is when one side is small. If both sides are large, this is a super-expensive operation. We can do block-wise cross product and then find top-k for each user. Best, Xiangrui On Thu, Nov 6, 2014 at 4:51 PM, Debasish Das debasish.da...@gmail.com wrote
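The block-wise idea in the reply above can be sketched on a single machine in Python. This is an assumption-laden illustration, not Spark code: `top_k_per_user`, the dict-based factor layout, and the toy ids are all hypothetical. Products arrive in blocks, each user is scored against one block at a time, and only the running top k per user is kept between blocks.

```python
import heapq

def dot(u, v):
    """Inner product of two equal-length factor vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_per_user(user_factors, product_blocks, k):
    """user_factors: {user_id: [float]}; product_blocks: list of {product_id: [float]}.

    Returns {user_id: [product_id, ...]} ordered by descending predicted score.
    """
    best = {user: [] for user in user_factors}
    for block in product_blocks:
        for user, u in user_factors.items():
            scored = [(dot(u, p), pid) for pid, p in block.items()]
            # Merge this block's candidates with the running top k.
            best[user] = heapq.nlargest(k, best[user] + scored)
    return {user: [pid for _, pid in pairs] for user, pairs in best.items()}

users = {1: [1.0, 0.0]}
blocks = [{10: [0.5, 9.0]}, {11: [2.0, 0.0], 12: [1.0, 1.0]}]
top2 = top_k_per_user(users, blocks, 2)
```

Memory per user stays bounded by k plus the block size, which is the reason for processing blocks instead of materializing every user-product score at once.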

TimSort in 1.2

2014-11-13 Thread Debasish Das
Hi, I am noticing the first step for Spark jobs does a TimSort in 1.2 branch...and there is some time spent doing the TimSort...Is this assigning the RDD blocks to different nodes based on a sort order ? Could someone please point to a JIRA about this change so that I can read more about it ?
