Hi,
We have a mirror repo of spark at our internal stash.
We are adding changes to a fork of the mirror so that down the line we can
push the contributions back to Spark git.
I am not sure what the exact development methodology we should follow is
as things are a bit complicated due to
what you mean by enterprise stash.
But a PR is a concept unique to Github. There is no PR model in plain git or
in the git the ASF maintains.
On Sat, Mar 1, 2014 at 11:28 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
We have a mirror repo of spark at our internal stash.
We are adding
has to come through github ?
I could merge, for example, @dbtsai's github lbfgs branch to my branch at
stash...
Thanks.
Deb
On Sat, Mar 1, 2014 at 12:43 PM, Debasish Das debasish.da...@gmail.comwrote:
Stash is an enterprise git from atlassian..
I got it...Basically the PRs are managed by github
infrastructure
around this?
Thanks.
Sincerely,
DB Tsai
Machine Learning Engineer
Alpine Data Labs
--
Web: http://alpinenow.com/
On Sun, Mar 2, 2014 at 10:23 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi DB,
1. Could you point to the BFGS
Hi David,
Few questions on breeze solvers:
1. I feel the right place to add useful things from RISO LBFGS (based on
Professor Nocedal's fortran code) is breeze. It will involve stress
testing breeze LBFGS on large sparse datasets and contributing fixes to
existing breeze LBFGS with the
Hi,
I am running ALS on a sparse problem (10M x 1M) and I am getting the
following error:
org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading minor
of order i of A is not positive definite.
at org.jblas.SimpleBlas.posv(SimpleBlas.java:373)
at
definite. Therefore, we chose QR decomposition to solve the linear system.
--sebastian
On 03/06/2014 03:44 PM, Debasish Das wrote:
Hi,
I am running ALS on a sparse problem (10M x 1M) and I am getting the
following error:
org.jblas.exceptions.LapackArgumentException: LAPACK DPOSV: Leading
something there.)
Even though your data is huge, if it was generated by some synthetic
process, maybe it is very low rank?
QR decomposition is pretty good here, yes.
--
Sean Owen | Director, Data Science | London
On Thu, Mar 6, 2014 at 3:05 PM, Debasish Das debasish.da...@gmail.com
Hi Xiangrui,
I used lambda = 0.1...It is possible that 2 users ranked movies in a
very similar way...
I agree that increasing lambda will solve the problem, but you'd agree this
is not a solution...lambda should be tuned based on sparsity / other
criteria and not to make a linearly dependent
.
Thanks.
Deb
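For intuition on the point above, here is a toy sketch (plain Python, hypothetical 2x2 Gram matrix) of why even a small lambda rescues a rank-deficient normal equation:

```python
# Toy Gram matrix from two linearly dependent user rows: det = 0,
# so a Cholesky-based solve (LAPACK DPOSV) fails on it.
g = [[2.0, 4.0], [4.0, 8.0]]
lam = 0.1

# ALS regularization effectively solves (G + lambda * I) x = b instead.
reg = [[g[0][0] + lam, g[0][1]],
       [g[1][0],       g[1][1] + lam]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

print(det2(g))    # 0.0 -> not positive definite
print(det2(reg))  # ~1.01 -> strictly positive, Cholesky succeeds
```

The shift by lambda * I pushes every eigenvalue up by lambda, which is exactly why the exception disappears once lambda > 0, whether or not lambda was tuned for accuracy.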
On Wed, Mar 19, 2014 at 10:11 AM, Xiangrui Meng men...@gmail.com wrote:
Another question: do you have negative or out-of-range user or product
ids or? -Xiangrui
On Tue, Mar 11, 2014 at 8:00 PM, Debasish Das debasish.da...@gmail.com
wrote:
Nope..I did not test implicit feedback
Awesome news !
It will be great if there are any examples or usecases to look at ?
We are looking into shark/ooyala job server to give in memory sql
analytics, model serving/scoring features for dashboard apps...
Does this feature have different usecases than shark, or is it cleaner as hive
.
On Wed, Mar 26, 2014 at 6:06 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
For our usecases we are looking into 20 x 1M matrices, which fall in
similar ranges to those outlined by the paper over here:
http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html
Hi Matei,
I am hitting similar problems with 10 ALS iterations...I am running with 24
gb executor memory on 10 nodes for a 20M x 3M matrix with rank = 50
The first iteration of flatMaps runs fine, which means that the memory
requirements are good per iteration...
If I do check-pointing on RDD, most
:01 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi David,
I have started to experiment with BFGS solvers for Spark GLM over large
scale data...
I am also looking to add a good QP solver in breeze that can be used in
Spark ALS for constraint solves...More details on that soon
Thanks Patrick...I searched in the archives and found the answer...tuning
the akka and gc params
On Fri, Apr 4, 2014 at 10:35 PM, Patrick Wendell pwend...@gmail.com wrote:
I answered this over on the user list...
On Fri, Apr 4, 2014 at 6:13 PM, Debasish Das debasish.da...@gmail.com
I am synced with apache/spark master but getting error in spark/sql
compilation...
Is the master broken ?
[info] Compiling 34 Scala sources to
/home/debasish/spark_deploy/sql/core/target/scala-2.10/classes...
[error]
open a hot-fix PR after looking for other stuff like this that
might have snuck in.
--
Sean Owen | Director, Data Science | London
On Sat, Apr 5, 2014 at 10:04 PM, Debasish Das debasish.da...@gmail.com
wrote:
I am synced with apache/spark master but getting error in spark/sql
compilation
not sure why it didn't fail our build...
On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das debasish.da...@gmail.com
wrote:
I verified this is happening for both CDH4.5 and 1.0.4...My deploy
environment is Java 6...so Java 7 compilation is not going to help...
Is this the PR which caused
to submit a hot fix for this issue specifically please do.
I'm
not sure why it didn't fail our build...
On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das debasish.da...@gmail.com
wrote:
I verified this is happening for both CDH4.5 and 1.0.4...My deploy
environment is Java 6...so Java 7
it with java 6 just fine.
something compiled with java7 with -target 1.7 will not run on java 6
On Sat, Apr 5, 2014 at 9:10 PM, Debasish Das debasish.da...@gmail.com
wrote:
With jdk7 I could compile it fine:
java version 1.7.0_51
Java(TM) SE Runtime Environment (build 1.7.0_51-b13
in Breeze of just the same form
we just saw. It's worth opening an issue since, indeed, I would expect
exactly the compile error you see with Java 6.
But it should not stop you from building Spark.
On Sun, Apr 6, 2014 at 5:00 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Koert,
How
at 3:20 PM, Debasish Das debasish.da...@gmail.comwrote:
Hi Matei,
I am hitting similar problems with 10 ALS iterations...I am running with
24 gb executor memory on 10 nodes for a 20M x 3M matrix with rank = 50
The first iteration of flatMaps runs fine, which means that the memory
requirements
you can
reproduce the error on a public data set, e.g., movielens? Thanks!
Best,
Xiangrui
On Sat, Apr 5, 2014 at 10:53 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I deployed apache/spark master today and recently there were many ALS
related checkins and enhancements..
I am
, Apr 5, 2014 at 10:53 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I deployed apache/spark master today and recently there were many ALS
related checkins and enhancements..
I am running ALS with explicit feedback and I remember most
enhancements
were related to implicit
commit?
2) Do you have negative or out-of-integer-range user or product ids?
Try to print out the max/min value of user/product ids.
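That check is cheap; a toy sketch in plain Python (lists of hypothetical triples standing in for the ratings RDD; in Spark it would be a map over the Rating objects followed by min/max):

```python
# Hypothetical (user, product, rating) triples; a negative or
# out-of-range id is the red flag for ALS indexing.
ratings = [(0, 5, 4.0), (12, 3, 1.0), (7, -2, 5.0)]

users = [u for u, _, _ in ratings]
products = [p for _, p, _ in ratings]
print("user ids:", min(users), "..", max(users))           # 0 .. 12
print("product ids:", min(products), "..", max(products))  # -2 .. 5
```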
Best,
Xiangrui
On Sun, Apr 6, 2014 at 11:01 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Xiangrui,
With 4 ALS iterations it runs fine...If I run 10
I got your checkin...I need to run logistic regression SGD vs BFGS for my
current usecases but your next checkin will update the logistic regression
with LBFGS, right ? Are you adding it to the regression package as well ?
Thanks.
Deb
On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai dbt...@stanford.edu
By the way...what's the idea...the labeled data set is an RDD which is
cached on all nodes..
Is the bfgs solver maintained on the master, or is each worker supposed to
maintain its own bfgs...
On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das debasish.da...@gmail.comwrote:
I got your checkin...I
to L-BFGS.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Tue, Apr 8, 2014 at 9:42 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi DB,
Are we going to clean up the function
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Tue, Apr 8, 2014 at 4:05 PM, Debasish Das debasish.da...@gmail.com
wrote:
Yup that's what I expected...L-BFGS solver is in the master and gradient
computation per RDD is done on each
Hi,
I saw in the code that spark jars are published on sonatype but I was
wondering if you guys have published spark jars to artifactory
as...Cloudera uses artifactory...
Somehow I can publish maven projects to artifactory but after following the
sbt link:
Hi,
Why is mllib vector using double as default ?
/**
 * Represents a numeric vector, whose index type is Int and value type is
 * Double.
 */
trait Vector extends Serializable {

  /**
   * Size of the vector.
   */
  def size: Int

  /**
   * Converts the instance to a double array.
.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, May 5, 2014 at 2:41 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
Why mllib vector is using double as default
://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, May 5, 2014 at 2:56 PM, Debasish Das debasish.da...@gmail.com
wrote:
Is this a breeze issue, or can breeze take templates on float / double ?
If breeze can take templates then it is a minor fix for Vectors.scala
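The storage argument behind the question is easy to see with the stdlib array module (illustrative only; breeze/mllib use JVM arrays, where Double is likewise 8 bytes and Float 4):

```python
from array import array

# Per-element storage: double precision is 8 bytes, single is 4,
# so templating vectors on Float would halve memory for big models.
print(array('d').itemsize)  # 8
print(array('f').itemsize)  # 4
```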
Hi,
I need to change the toString on LabeledPoint to libsvm format so that I
can dump RDD[LabeledPoint] in a format that could be read by sparse
glmnet-R and other packages to benchmark mllib classification accuracy...
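A writer for that format can be sketched in a few lines (plain Python; the Scala toString change would mirror this; note libsvm indices are 1-based):

```python
def to_libsvm(label, indices, values):
    # libsvm lines are "label idx:val idx:val ...", with 1-based,
    # ascending indices for the non-zero features.
    feats = " ".join(f"{i + 1}:{v}" for i, v in zip(indices, values))
    return f"{label} {feats}"

print(to_libsvm(1.0, [0, 4], [0.5, 2.0]))  # 1.0 1:0.5 5:2.0
```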
Basically I have to change the toString of LabeledPoint and toString of
,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, May 5, 2014 at 2:56 PM, Debasish Das
debasish.da...@gmail.com
wrote:
Is this a breeze issue or breeze can
) in the
matrix
template.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Mon, May 5, 2014 at 2:56 PM, Debasish Das
debasish.da...@gmail.com
wrote
Hi Patrick,
We maintain internal Spark mirror in sync with Spark github master...
What's the way to get the 1.0.0 stable release from github to deploy on our
production cluster ? Is there a tag for 1.0.0 that I should use to deploy ?
Thanks.
Deb
On Wed, Jun 4, 2014 at 10:49 AM, Patrick
Hi,
We are adding a constrained ALS solver in Spark to solve matrix
factorization use-cases which need additional constraints (bounds,
equality, inequality, quadratic constraints)
We are using a native version of a primal dual SOCP solver due to its small
memory footprint and sparse ccs matrix
. You can define an interface for the subproblem solvers and
maintain the IPM solver at your own code base, if the only information
you need is Y^T Y and Y^T b.
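The two quantities Xiangrui mentions are just the normal-equation pieces of each least-squares subproblem; a minimal sketch (plain Python, toy 3x2 factor matrix Y and ratings vector b):

```python
def gram(Y):
    # Y^T Y for Y given as a list of rows.
    cols = len(Y[0])
    return [[sum(row[i] * row[j] for row in Y) for j in range(cols)]
            for i in range(cols)]

def atb(Y, b):
    # Y^T b.
    cols = len(Y[0])
    return [sum(row[i] * bi for row, bi in zip(Y, b)) for i in range(cols)]

Y = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
b = [1.0, 2.0, 3.0]
print(gram(Y))    # [[2.0, 1.0], [1.0, 5.0]]
print(atb(Y, b))  # [3.0, 8.0]
```

An interface that hands the subproblem solver only these two objects keeps the IPM code base fully decoupled from ALS itself.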
Btw, just curious, what is the use case for quadratic constraints?
Best,
Xiangrui
On Thu, Jun 5, 2014 at 3:38 PM, Debasish Das debasish.da
include it in the classpath. Creating two separate files still seems
unnecessary to me. Could you create a JIRA and we can move our
discussion there? Thanks!
Best,
Xiangrui
On Thu, Jun 5, 2014 at 7:20 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Xiangrui,
For orthogonality
= Solve.solvePositive(fullXtX, userXy
(index)).data
}
}
On Tue, Jun 10, 2014 at 8:56 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am a bit confused with the code here:
// Solve the least-squares problem for each user and return the new
feature vectors
Array.range(0
Look into the Powered by Spark page...I found a project there which used
autoencoder functions...It hasn't been updated for a long time now !
On Thu, Jun 26, 2014 at 10:51 PM, Ulanov, Alexander alexander.ula...@hp.com
wrote:
Hi Bert,
It would be extremely interesting. Do you plan to implement
Hi,
In my experiments with Jellyfish I did not see any substantial RMSE loss
over DSGD for Netflix dataset...
So we decided to stick with ALS and implemented a family of Quadratic
Minimization solvers that stays in the ALS realm but can solve interesting
constraints (positivity, bounds, L1,
Hi,
I am looking for an efficient linear CG to be put inside the Quadratic
Minimization algorithms we added for Spark mllib.
With a good linear CG, we should be able to solve kernel SVMs with this
solver in mllib...
I use direct solves right now using cholesky decomposition which has higher
targeted for
norm-constrained solutions of the CG problem.
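For reference, an unpreconditioned, unconstrained textbook linear CG fits in a few lines; this is only a sketch of the plain method on a toy system, not the norm-constrained variant discussed above:

```python
def cg(matvec, b, iters=50, tol=1e-10):
    # Textbook conjugate gradient for A x = b, A symmetric positive definite.
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# Toy SPD system: [[4, 1], [1, 3]] x = [1, 2]  ->  x = (1/11, 7/11)
A = [[4.0, 1.0], [1.0, 3.0]]
matvec = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
print(cg(matvec, [1.0, 2.0]))
```

Because CG only touches A through matvec, the same skeleton works whether the quadratic's Hessian is formed explicitly (as in the direct Cholesky solves) or applied implicitly.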
On Fri, Jun 27, 2014 at 5:54 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am looking for an efficient linear CG to be put inside the Quadratic
Minimization algorithms we added for Spark mllib.
With a good linear CG
, Jun 27, 2014 at 12:47 PM, Debasish Das [via Apache Spark Developers
List] ml-node+s1001551n7098...@n3.nabble.com wrote:
Hi,
In my experiments with Jellyfish I did not see any substantial RMSE loss
over DSGD for Netflix dataset...
So we decided to stick with ALS and implemented a family
for
bound-constrained CG, though bounded LBFGS is more common. I think code
for Nystrom approximations or kernel mappings would be more useful.
On Fri, Jun 27, 2014 at 5:42 PM, Debasish Das debasish.da...@gmail.com
wrote:
Thanks David...Let me try it...I am keen to see the results first
think the numeric
support is lacking in Java land.
On Sat, Jun 28, 2014 at 1:47 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am coming up with an iterative solver for Equality and bound
constrained
quadratic minimization...
I have the cholesky versions running
3 outputs) functions.
If there is interest, I would be happy to release the code. What would be
the best way to do this? Is there some kind of review process?
Best regards,
Bert
-Original Message-
From: Debasish Das [mailto:debasish.da...@gmail.com]
Sent: 27 June 2014 14:02
the
subproblems no longer decoupled, that would certainly affects
scalability. -Xiangrui
On Wed, Jun 11, 2014 at 2:20 AM, Debasish Das debasish.da...@gmail.com
wrote:
I got it...ALS formulation is solving the matrix completion problem
To convert the problem to matrix factorization or take
or the objective function is
complex but splittable. Neither applies to this case.
Best,
Xiangrui
On Tue, Jul 1, 2014 at 11:05 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Xiangrui,
Could you please point to the IPM solver that you have positive results
with ? I was planning
Hi Denis,
Are you using matrix factorization to generate the latent factors ?
Thanks.
Deb
On Thu, Jul 3, 2014 at 8:49 AM, Denis Turdakov turda...@ispras.ru wrote:
Hello guys,
We made pull request with PLSA and its modifications:
- https://github.com/apache/spark/pull/1269
- JIRA issue
Thanks for the pointer...
Looks like you are using the EM algorithm for factorization, which looks
similar to multiplicative update rules
Do you think that using mllib ALS implicit feedback you can scale the
problem further ?
We can handle L1, L2, equality and positivity constraints in ALS now...As
long
I looked further and realized that ECOS used a mex file while PDCO is using
pure Matlab code. So the out-of-box runtime comparison is not fair.
I am trying to generate a PDCO C port. Like ECOS, PDCO also makes use of
sparse support from Tim Davis.
Thanks.
Deb
Hi,
I thought OWLQN was already merged into mllib optimization but I don't see
it in the master yet...
Are there any issues in merging it in ? I see there are some merge
conflicts right now...
https://github.com/apache/spark/pull/840/
Thanks.
Deb
Hi,
Is sbt still used for master compilation ? I could compile for
2.3.0-cdh5.0.2 using maven following the instructions from the website:
http://spark.apache.org/docs/latest/building-with-maven.html
But when I try to use sbt for local testing I am getting
some weird errors...Is
On Sat, Jul 19, 2014 at 12:50 PM, Mark Hamstra m...@clearstorydata.com
wrote:
project mllib
.
.
.
clean
.
.
.
compile
.
.
.
test
...all works fine for me @2a732110d46712c535b75dd4f5a73761b6463aa8
On Sat, Jul 19, 2014 at 11:10 AM, Debasish Das
, there might be bugs in it...
Any suggestions will be appreciated
Thanks.
Deb
On Sat, Aug 2, 2014 at 11:12 AM, Xiangrui Meng men...@gmail.com wrote:
Yes, that should work. spark-mllib-1.1.0 should be compatible with
spark-core-1.0.1.
On Sat, Aug 2, 2014 at 10:54 AM, Debasish Das debasish.da
)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
On Tue, Aug 5, 2014 at 5:59 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Xiangrui,
I used your idea and kept a cherry picked version
they differ in the final recommendation? It would be great if you can
test prec@k or ndcg@k metrics.
Best,
Xiangrui
On Wed, Aug 6, 2014 at 8:28 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Xiangrui,
Maintaining another file will be a pain later so I deployed spark 1.0.1
without
...@dbtsai.com wrote:
One related question: is the mllib jar independent from the hadoop version
(doesn't use the hadoop api directly)? Can I use an mllib jar compiled for
one version of hadoop and use it in another version of hadoop?
Sent from my Google Nexus 5
On Aug 6, 2014 8:29 AM, Debasish Das debasish.da
Hi Patrick,
I am testing the 1.1 branch but I see a lot of protobuf warnings while
building the jars:
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser
with Java 1.7_55 but
the cluster JRE is at 1.7_45.
Thanks.
Deb
On Wed, Aug 6, 2014 at 12:01 PM, Debasish Das debasish.da...@gmail.com
wrote:
I did not play with Hadoop settings...everything is compiled with
2.3.0CDH5.0.2 for me...
I did try to bump the version number of HBase from 0.94 to 0.96
I figured out the issue...the driver memory was at 512 MB and for our
datasets, the following code needed more memory...
// Materialize usersOut and productsOut.
usersOut.count()
productsOut.count()
Thanks.
Deb
On Sat, Aug 9, 2014 at 6:12 PM, Debasish Das debasish.da...@gmail.com
wrote
Hi,
Is there a JIRA for this bug ?
I have seen it multiple times during our ALS runs now...some runs don't
show it while some runs fail due to the error msg
https://github.com/GrahamDennis/spark-kryo-serialisation/blob/master/README.md
One way to circumvent this is to not use kryo but then I am
5:48 PM, Reynold Xin r...@databricks.com wrote:
Here: https://github.com/apache/spark/pull/1948
On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da...@gmail.com
wrote:
Is there a fix that I can test ? I have the flows setup for both
standalone and YARN runs...
Thanks.
Deb
Hi,
We are running the snapshots (new spark features) on YARN and I was
wondering if the webui is available in YARN mode...
The deployment document does not mention the webui in YARN mode...
Is it available ?
Thanks.
Deb
?
@dbtsai did your assembly on YARN run fine, or are you still noticing these
exceptions ?
Thanks.
Deb
On Thu, Aug 14, 2014 at 5:48 PM, Reynold Xin r...@databricks.com wrote:
Here: https://github.com/apache/spark/pull/1948
On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da
Hi,
During the 4th ALS iteration, I am noticing that one of the executor gets
disconnected:
14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
SendingConnectionManagerId not found
14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
disconnected, so removing it
be the same issue as described in
https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
container got killed by YARN because it used much more memory than it
requested. But we haven't figured out the root cause yet.
+Sandy
Best,
Xiangrui
On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das
Hi,
There have been some recent changes in the way akka is used in spark and I
feel they are major changes...
Is there a design document / JIRA / experiment on large datasets that
highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great
to understand where akka is used in the
configuration, yarn.nodemanager.vmem-check-enabled is set to false.
-Sandy
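For reference, Sandy's workaround corresponds to the following property in yarn-site.xml (this disables YARN's virtual-memory check rather than fixing the over-use itself):

```xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```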
On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das debasish.da...@gmail.com
wrote:
I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is
definitely a YARN related problem...
At least for me right now only
know that the
container got killed by YARN because it used much more memory than it
requested. But we haven't figured out the root cause yet.
+Sandy
Best,
Xiangrui
On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
During the 4th ALS iteration, I am
.
-Sandy
On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Sandy,
Any resolution for YARN failures ? It's a blocker for running spark on
top of YARN.
Thanks.
Deb
On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng men...@gmail.com wrote:
Hi Deb,
I think this may
executors (unless ALS is using a bunch of off-heap memory?). You mentioned
earlier in this thread that the property wasn't showing up in the
Environment tab. Are you sure it's making it in?
-Sandy
On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hmm...I did try
Hi Xiangrui,
Could you please point to some reference for calculating prec@k and ndcg@k ?
prec is precision I suppose but ndcg I have no idea about...
Thanks.
Deb
On Mon, Aug 25, 2014 at 12:28 PM, Xiangrui Meng men...@gmail.com wrote:
The evaluation metrics are definitely useful. How do
Thanks Christoph.
Are these numbers for mllib als implicit and explicit feedback on
movielens/netflix datasets documented on JIRA ?
On Sep 19, 2014 1:16 PM, Christoph Sawade
christoph.saw...@googlemail.com wrote:
Hey Deb,
NDCG is the Normalized Discounted Cumulative Gain [1]. Another
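For concreteness, prec@k and NDCG@k can be sketched in plain Python (toy lists; the graded relevance values are hypothetical ratings):

```python
import math

def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommendations that are relevant.
    return sum(1 for item in recommended[:k] if item in relevant) / k

def ndcg_at_k(recommended, relevance, k):
    # relevance: dict item -> graded relevance (e.g. rating).
    # DCG discounts each position logarithmically; NDCG divides by the
    # DCG of the ideal (relevance-sorted) ranking.
    dcg = sum(relevance.get(item, 0.0) / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

recs = ["a", "b", "c"]
print(precision_at_k(recs, {"a", "c"}, 3))       # 2/3
print(ndcg_at_k(recs, {"a": 3.0, "c": 2.0}, 3))  # < 1: "c" ranked below "b"
```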
You should look into Evan Sparks' talk from Spark Summit 2014
http://spark-summit.org/2014/talk/model-search-at-scale
I am not sure if some of it is already open sourced through MLBase...
On Mon, Sep 29, 2014 at 7:45 PM, Lochana Menikarachchi locha...@gmail.com
wrote:
Hi,
Is there anyone
Hi,
Inside mllib I am running tests using:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn install
The local tests run fine but the cluster tests are failing..
LBFGSClusterSuite:
- task size should be small *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage
I have done mvn clean several times...
Consistently, all the mllib tests that use
LocalClusterSparkContext.scala fail !
Hi,
I have added some changes to ALS tests and I am re-running tests as:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn
-DwildcardSuites=org.apache.spark.mllib.recommendation.ALSSuite test
I have some INFO logs in the code which I want to see on my console. They
work fine if I add
=ERROR
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.I0Itec.zkclient=WARN
On Tue, Oct 7, 2014 at 7:42 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I have added some changes to ALS tests and I am re-running tests as:
mvn
Hi,
If I take the Movielens data and run the default ALS with regularization as
0.0, I am hitting an exception from LAPACK that the gram matrix is not
positive definite. This is on the master branch.
This is how I run it :
./bin/spark-submit --total-executor-cores 1 --master spark://
, 2014 at 5:01 PM, Liquan Pei liquan...@gmail.com wrote:
Hi Debaish,
I think ||r - wi'hj||^{2} is semi-positive definite.
Thanks,
Liquan
On Wed, Oct 15, 2014 at 4:57 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
If I take the Movielens data and run the default ALS with regularization
in a different implementation and it
has worked fine.
Now I have to go hunt for how the QR decomposition is exposed in BLAS...
Looks like it's GEQRF, which JBLAS helpfully exposes. Debasish, you could try
it for fun at least.
On Oct 15, 2014 8:06 PM, Debasish Das debasish.da...@gmail.com wrote:
But do
Just checked, QR is exposed by netlib: import org.netlib.lapack.Dgeqrf
For the equality and bound version, I will use QR...it will be faster than
the LU that I am using through jblas.solveSymmetric...
On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das debasish.da...@gmail.com
wrote:
@xiangrui
Hi,
I am validating the proximal algorithm for positive and bound constrained
ALS and I came across the bug detailed in the JIRA while running ALS with
NNLS:
https://issues.apache.org/jira/browse/SPARK-3987
ADMM based proximal algorithm came up with correct result...
Thanks.
Deb
wrote:
Oryx 2 seems to be geared for Spark
https://github.com/OryxProject/oryx
2014-10-18 11:46 GMT-04:00 Debasish Das debasish.da...@gmail.com:
Hi,
Is someone working on a project on integrating Oryx model serving
layer
with Spark ? Models will be built using either
Hi,
In the current factorization flow, we cross validate on the test dataset
using the RMSE number but there are some other measures which are worth
looking into.
If we consider the problem as a regression problem and the ratings 1-5 are
considered as 5 classes, it is possible to generate a
, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
In the current factorization flow, we cross validate on the test dataset
using the RMSE number but there are some other measures which are worth
looking into.
If we consider the problem as a regression problem and the ratings 1-5
to examples.MovielensALS. ROC
should be good to add as well. -Xiangrui
On Wed, Oct 29, 2014 at 11:23 AM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
In the current factorization flow, we cross validate on the test dataset
using the RMSE number but there are some other measures which are worth
wonder if it is possible to extend the DIMSUM idea to computing top K
matrix multiply between the user and item factor matrices, as opposed to
all-pairs similarity of one matrix?
On Thu, Oct 30, 2014 at 5:28 AM, Debasish Das debasish.da...@gmail.com
wrote:
Is there an example of how to use
any of the topic modeling
algorithms as well...
Is there a better place for it other than mllib examples ?
On Thu, Oct 30, 2014 at 8:13 AM, Debasish Das debasish.da...@gmail.com
wrote:
I thought topK will save us...for each user we have 1xrank...now our movie
factor is a RDD...we pick topK movie
:24 PM, Sean Owen so...@cloudera.com wrote:
MAP is effectively an average over all k from 1 to min(#
recommendations, # items rated) Getting first recommendations right is
more important than the last.
On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das
debasish.da...@gmail.com
wrote
Hi,
I am testing MatrixFactorizationModel.predict(user: Int, product: Int) but
the code fails on userFeatures.lookup(user).head
In computeRmse MatrixFactorizationModel.predict(RDD[(Int, Int)]) has been
called and in all the test-cases that API has been used...
I can perhaps refactor my code to
+1
The app to track PRs based on component is a great idea...
On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara sean.mcnam...@webtrends.com
wrote:
+1
Sean
On Nov 5, 2014, at 6:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
I wanted to share a discussion we've been having on
userFeatures.lookup(user).head to
work ?
On Mon, Nov 3, 2014 at 9:24 PM, Xiangrui Meng men...@gmail.com wrote:
Was user presented in training? We can put a check there and return
NaN if the user is not included in the model. -Xiangrui
On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das debasish.da
if the user is not included in the model. -Xiangrui
On Mon, Nov 3, 2014 at 5:25 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi,
I am testing MatrixFactorizationModel.predict(user: Int, product: Int)
but
the code fails on userFeatures.lookup(user).head
In computeRmse
/SPARK-3066
The easiest case is when one side is small. If both sides are large,
this is a super-expensive operation. We can do block-wise cross
product and then find top-k for each user.
Best,
Xiangrui
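The block-wise scheme Xiangrui describes can be sketched with per-user bounded heaps (plain Python dicts standing in for the factor RDD blocks; all names here are hypothetical):

```python
import heapq

def topk_per_user(user_factors, item_blocks, k):
    # user_factors: {user: vector}; item_blocks: iterable of {item: vector}
    # blocks. A size-k min-heap per user means no user ever needs scores
    # for all items in memory at once.
    heaps = {u: [] for u in user_factors}
    for block in item_blocks:
        for u, uvec in user_factors.items():
            for item, ivec in block.items():
                score = sum(a * b for a, b in zip(uvec, ivec))
                if len(heaps[u]) < k:
                    heapq.heappush(heaps[u], (score, item))
                else:
                    heapq.heappushpop(heaps[u], (score, item))
    return {u: sorted(h, reverse=True) for u, h in heaps.items()}

users = {0: [1.0, 0.0]}
blocks = [{"a": [0.9, 0.1], "b": [0.2, 0.8]}, {"c": [0.7, 0.3]}]
print(topk_per_user(users, blocks, 2))  # {0: [(0.9, 'a'), (0.7, 'c')]}
```

In Spark the outer loop over blocks would become a cartesian of user-factor and item-factor block RDDs followed by a reduceByKey that merges the per-user heaps.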
On Thu, Nov 6, 2014 at 4:51 PM, Debasish Das debasish.da...@gmail.com
wrote
Hi,
I am noticing that the first step for Spark jobs does a TimSort in the 1.2
branch...and there is some time spent doing the TimSort...Is this assigning
the RDD blocks to different nodes based on a sort order ?
Could someone please point to a JIRA about this change so that I can read
more about it ?