[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-10 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115745818 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -154,22 +159,23 @@ class LinearSVCSuite extends

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-10 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115741206 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -154,22 +159,23 @@ class LinearSVCSuite extends

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

2017-05-09 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/17862#discussion_r115659479 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala --- @@ -154,22 +159,23 @@ class LinearSVCSuite extends

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

2017-05-06 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh can we smooth the hinge-loss using soft-max (variant of ReLU) and then use LBFGS ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-12-27 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574 test

[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package

2016-12-25 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/14473 ADMM is already available as a breeze solver (BFGS, OWLQN, NonlinearMinimizer) which is integrated with ml/mllib...It will be great if you can look into it and let me know if you need pointers

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-12-25 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574 Can we close it ? Looks like SPARK-18235 opened up recommendForAll

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-12-25 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574 test

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-08-06 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574 I will take a pass at the PR as well..

[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...

2016-08-06 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574 @MLnick I recently visited IBM STC but unfortunately missed you at the meeting...we discussed the ML/MLlib changes for matrix factorization...

[GitHub] spark issue #458: [SPARK-1543][MLlib] Add ADMM for solving Lasso (and elasti...

2016-06-29 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/458 ADMM is already implemented as part of Breeze proximal NonlinearMinimizer where the ADMM solver stays in master and gradient calculator is used in similar manner as how Breeze LBFGS/OWLQN has been

[GitHub] spark issue #1110: [SPARK-2174][MLLIB] treeReduce and treeAggregate

2016-06-05 Thread debasish83
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/1110 @mengxr say I have 20 nodes and 16 cores on each node, do you recommend running treeReduce with 320 partitions and OpenBLAS with numThreads=1 on each partition for SeqOp OR treeReduce with 20

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-12-05 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-162240882 @srowen actually I am not sure if MAP calculation got added in ML pipeline or not...I will look into it and if someone else already added it, I will close the PR

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-08-31 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-136503426 @rezazadeh got busy with spark streaming version of KNN :-) I will open up 2 PRs over the weekend as we discussed.

[GitHub] spark pull request: [MLLIB][WIP] SPARK-4638: Kernels feature for M...

2015-07-11 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5503#issuecomment-120658511 @dbtsai @mandar2812 I found the abstraction for kernel as explained in my PR https://github.com/apache/spark/pull/6213 more generic in practical use-cases compared

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-06-06 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-109654217 Internally we are using this code for euclidean/rbf driving PIC for example...but sure we can focus on cosine first...

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-30 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-107113056 @rezazadeh sure I will do that...Could you add a JIRA for 3 (Kernel Clustering / PIC) so that we can add RBFKernel flow and implement PIC with vector - matrix

[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2015-05-24 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3536#issuecomment-105026856 Let's continue the validation discussion on https://github.com/apache/spark/pull/6213. The PR introduces batch gemm based similarity computati

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-104970859 Refactoring MatrixFactorizationModel.recommendForAll to a common place like Vectors/Matrices will help users who have dense data with modest columns (~1000-10K, most

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-104968079 Runtime comparisons are posted on SPARK-4823 on MovieLens1m dataset, 8 core, 4 GB executor memory from my laptop. Stage 24 - 35 is the row similarity flow

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-104936678 Internally vector flow in IndexedRowMatrix has helped us to do additional optimization through user defined kernels and cut the computation which won't happen

[GitHub] spark pull request: [WIP][MLLIB][SPARK-4675][SPARK-4823]RowSimilar...

2015-05-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-104934928 @mengxr I generalized MatrixFactorizationModel.recommendAll and use it for similarUsers and similarProducts and use dgemm...In IndexedRowMatrix I only exposed

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-20 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-103925316 For gemv it is not clear how to re-use the scratch space for result vector...if we can't reuse the result vector over multiple calls to kernel.compute we won't

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-19 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-103771669 Actually both for Euclidean and RBF it is possible as ||x - y||^2 can be decomposed as ||x||^2 + ||y||^2 - 2*dot(x,y) where dot(x,y) can be computed through dgemv
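The identity behind this comment is ||x - y||^2 = ||x||^2 + ||y||^2 - 2*dot(x,y), so the distances from one query vector to every row of a matrix reduce to precomputed norms plus one matrix-vector product. Below is a minimal pure-Python sketch of that idea, not the code from the PR; the function names `squared_distances` and `rbf_kernel` and the `gamma` parameter are hypothetical, and the inner loop only stands in for what a BLAS dgemv call would compute.

```python
import math

def squared_distances(rows, query):
    """Squared Euclidean distance from `query` to every row.

    Uses ||r - q||^2 = ||r||^2 + ||q||^2 - 2*dot(r, q). The dot products
    over all rows together form one matrix-vector product (the dgemv step).
    """
    q_norm = sum(v * v for v in query)
    result = []
    for row in rows:
        r_norm = sum(v * v for v in row)
        dot = sum(a * b for a, b in zip(row, query))
        # Clamp at zero: rounding can make the expression slightly negative.
        result.append(max(r_norm + q_norm - 2.0 * dot, 0.0))
    return result

def rbf_kernel(rows, query, gamma):
    """RBF kernel exp(-gamma * ||r - q||^2) built on the same decomposition."""
    return [math.exp(-gamma * d) for d in squared_distances(rows, query)]
```

In a real implementation the row norms would be computed once and cached, so each new query costs a single matrix-vector product.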

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-19 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-103615439 I am thinking more. Maybe EuclideanKernel can be decomposed using Matrix x Vector as well

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-19 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-103614290 SparseMatrix x SparseVector got merged to Master today https://github.com/apache/spark/pull/6209. I will update the PR and separate the code path for

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-17 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-102843964 For CosineKernel and ProductKernel, we should be able to have a separate code path with BLAS-2 once SparseMatrix x SparseVector merges and BLAS-3 once SparseMatrix x

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-17 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6213#issuecomment-102841783 @mengxr the failures are related to yarn suite which does not look related to my changes...tests I added ran fine... [info] *** 1 TEST FAILED *** [error

[GitHub] spark pull request: [SPARK-7681][MLlib] Add SparseVector support f...

2015-05-17 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/6209#issuecomment-102840355 Are there runtime comparisons posted with vector*vector operations for these changes BLAS-1 vs BLAS-2 ? SparseMatrix * SparseVector compared to Array[SparseVector] x

[GitHub] spark pull request: [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity

2015-05-16 Thread debasish83
GitHub user debasish83 opened a pull request: https://github.com/apache/spark/pull/6213 [MLLIB][SPARK-4675, SPARK-4823] RowSimilarity @mengxr @srowen For RowMatrix with 100K columns, colSimilarity with bruteforce/dimsum sampling is used. This PR adds rowSimilarity to

[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2015-05-05 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3536#issuecomment-99098372 @MLnick yes that's what I did...I have to convince users why use factor vectors :-) For user->item recommendation, convincing is easy by showing the

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-04 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-98827606 @mengxr if you could please point to the ML pipeline module where I should add it, I can do the change...

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-04 Thread debasish83
GitHub user debasish83 reopened a pull request: https://github.com/apache/spark/pull/5869 [SPARK-4231][MLLIB][Examples] MAP calculation added to examples.MovieLensALS MAP calculation driver to MovieLensALS was not part of SPARK-3066 merge. Added the driver in this PR

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-04 Thread debasish83
Github user debasish83 closed the pull request at: https://github.com/apache/spark/pull/5869

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-03 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-98504679 RMSE is similar in my old runs..so the ALS core did not change...the MAP driver code is also same since I just migrated it from my PR. TUSCA09LMLVT00C:spark

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-03 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-98504419 Implicit lambda should not affect the explicit results...I will take a closer look into the recommendForAll and compare with my old version..

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-03 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-98502996 Stats from my old run: ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --class org.apache.spark.examples.mllib.MovieLensALS --jars ~/.m2

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-03 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5869#issuecomment-98491289 @srowen ideally we should move both the utilities to compute rmse and MAP on a MatrixFactorizationModel to a common place from examples since they are the APIs that

[GitHub] spark pull request: [MLLIB][SPARK-4675] Find similar products and ...

2015-05-02 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3536#issuecomment-98425139 @MLnick @srowen I did an experiment where I computed brute force topK similar items using cosine distance and compared the intersection with item factor based brute
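The experiment described here compares two brute-force top-K neighbor lists by their intersection. A minimal sketch of that comparison follows, assuming items are stored as a dict from item id to a dense vector; the names `cosine`, `top_k_similar`, and `overlap` are hypothetical and this is not the code used in the PR.

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def top_k_similar(items, query_id, k):
    """Brute-force top-k neighbors of `query_id` by cosine similarity."""
    query = items[query_id]
    scored = [(cosine(query, vec), other)
              for other, vec in items.items() if other != query_id]
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scored[:k]]

def overlap(list_a, list_b):
    """Fraction of `list_a` that also appears in `list_b`."""
    if not list_a:
        return 0.0
    return len(set(list_a) & set(list_b)) / len(list_a)
```

Running `top_k_similar` once on raw rating vectors and once on item factor vectors, then calling `overlap` on the two lists, gives the kind of intersection measure the comment refers to.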

[GitHub] spark pull request: [SPARK-4231][MLLIB][Examples] MAP calculation ...

2015-05-02 Thread debasish83
GitHub user debasish83 opened a pull request: https://github.com/apache/spark/pull/5869 [SPARK-4231][MLLIB][Examples] MAP calculation added to examples.MovieLensALS MAP calculation driver to MovieLensALS was not part of SPARK-3066 merge. Added the driver in this PR

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-98190811 I mean for svm the formulation is over all rows right...the smooth max will be done on every row and label...max(0, 1 - y_i a_i*x)...so only change will be a diff

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-98189044 nope...logistic is feature space...svm is data space...the gradient calculation / BFGS CostFun will change

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-98188550 @dlwh we should simply use your smooth max and make max(0, 1 - ya'x) differentiable for the first version...that needs no change to breeze...and then if needed w

[GitHub] spark pull request: [SPARK-3066][MLLIB] Support recommendAll in ma...

2015-05-01 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/5829#discussion_r29494261 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -137,20 +141,113 @@ class

[GitHub] spark pull request: [SPARK-3066][MLLIB] Support recommendAll in ma...

2015-05-01 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/5829#discussion_r29493623 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -137,20 +141,113 @@ class

[GitHub] spark pull request: [MLLIB] SPARK-4231: Add RankingMetrics to exam...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-98073840 Changed the title to add driver for recommendAll API once SPARK-3066 merges to master...

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-98073720 this is linear svm strictly in primal form...there are ways to fix it through going to dual space but that needs a linear / nonlinear kernel generation which might be

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-05-01 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-98073658 @loachli hinge loss in linear svm is max(0, 1 - y*a'x) right ? Just replace max with a smooth max and you should be able to smooth hinge gradient and then it c
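One common way to make max(0, 1 - y*a'x) differentiable is to replace max(0, z) with the softplus function (1/t)*log(1 + exp(t*z)), which approaches max(0, z) as t grows. The sketch below uses that substitution as an assumption; the thread does not say which smooth max was meant, and the names `softplus`, `sigmoid`, `smooth_hinge`, and the sharpness parameter `t` are hypothetical.

```python
import math

def softplus(z, t):
    """Smooth approximation of max(0, z): (1/t) * log(1 + exp(t*z)), computed stably."""
    u = t * z
    return (max(u, 0.0) + math.log1p(math.exp(-abs(u)))) / t

def sigmoid(u):
    """Numerically stable logistic function; it is the derivative of softplus in u."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def smooth_hinge(w, a, y, t=10.0):
    """Smoothed hinge loss and gradient for one example.

    w: weight vector, a: feature vector, y: label in {-1, +1}.
    Returns (loss, gradient with respect to w).
    """
    margin = 1.0 - y * sum(ai * wi for ai, wi in zip(a, w))
    loss = softplus(margin, t)
    scale = -y * sigmoid(t * margin)  # chain rule through the margin
    grad = [scale * ai for ai in a]
    return loss, grad
```

Because the loss and gradient are now defined and continuous everywhere, they can be summed over all rows and handed to a quasi-Newton solver such as L-BFGS as an ordinary cost function.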

[GitHub] spark pull request: [SPARK-3066][MLLIB] Support recommendAll in ma...

2015-05-01 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/5829#discussion_r29492840 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala --- @@ -39,7 +39,7 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag

[GitHub] spark pull request: [SPARK-3066][MLLIB] Support recommendAll in ma...

2015-05-01 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/5829#discussion_r29492705 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/rdd/MLPairRDDFunctions.scala --- @@ -39,7 +39,7 @@ class MLPairRDDFunctions[K: ClassTag, V: ClassTag

[GitHub] spark pull request: [SPARK-3066][MLLIB] Support recommendAll in ma...

2015-04-30 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5829#issuecomment-98058780 @mengxr looks good to me...I will fix SPARK-4321 based on this merge...I need blockify for rowSimilarities (tall skinny sparse matrices for row similarities)...should

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-30 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-98021307 @mengxr please go ahead...

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-26 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-96403986 was very busy last few weeks...will update it in next few days...

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-04-11 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-91869124 ohh sorry I don't know about requester pays...let me look into it --- If your project is set up for it, you can reply to this email and have your reply appe

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-04-10 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-91710827 @jkbradley let me know if you need vzcloud access and I can create a few nodes for you...ec2 might be easier for others to access as well...

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-04-10 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-91710700 @jkbradley we still could not access the wikipedia dataset on ec2...will it be possible for you to upload the 1 Billion token dataset on EC2 ? I wanted to do a sparse

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-08 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90950074 if you look into breeze.optimize.proximal.Proximal, I added a library of projection/proximal operators...in my experiments looks like projection based algorithms (SPG

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-08 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90950364 Application is topic modeling using Sparsity constraints like L1 and probability simplex and supporting bounds in ALS

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-08 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90942562 @tmyklebu do you have the original NNLS paper in english ? Breeze also has a linear CG...I am thinking if it is possible to merge simple projections like positivity

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-07 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-90753271 Sure...Let me do that and point you to the repo...most likely it will be a breeze based branch and I will copy the mllib implementation over there...

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-04-07 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-90753041 @mengxr @josephk In my internal testing, I am finding the sparse formulations useful for extracting genre/topic information out of netflix/movielens dataset...The

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-05 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-89729777 agreed with the implicit MAP calculation...For netflix dataset, I got 0.014...Maybe I need to use a better regularization...was that 0.05-0.1 number from using

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-04 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-89729377 I meant MAP...what's the MAP on netflix dataset you have seen before and with what lambda ? I am running MAP experiments with various factorization formula

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-04 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-89706247 @coderxiang @mengxr If I have a dataset with implicit (click or 0) then MAP is not that well defined right since in label set everything is 1.0 and so there is no
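For reference, mean average precision with binary relevance can be sketched as below. With implicit data every label is 1.0, so only membership in the label set matters, which is the situation this comment raises. This is a textbook-style definition under stated assumptions and is not necessarily the exact formula used by the RankingMetrics code under review; the names `average_precision` and `mean_average_precision` are hypothetical.

```python
def average_precision(predicted, relevant):
    """Average precision of one ranked list against a set of relevant items.

    predicted: item ids in ranked order (best first, assumed free of duplicates).
    relevant:  ids the user actually interacted with (binary relevance).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)

def mean_average_precision(pairs):
    """Mean of average_precision over (predicted, relevant) pairs, one per user."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(average_precision(p, r) for p, r in pairs) / len(pairs)
```

Dividing by the number of relevant items is one convention; dividing by the number of hits within the ranked list is another, and the two give different values when the list is truncated.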

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-04 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-89697236 @srowen For netflix dataset what's the MAP you have seen before...I started experiments on Netflix dataset...lambda is 0.065 for netflix as well right ? For Movi

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-04 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27769592 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -167,23 +169,66 @@ object MovieLensALS

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-04 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-89594722 @mengxr any insight on it ? the runtime issue is only in first iteration and I think you can point out if there is any obvious issue in the way I call the solver

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-02 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27712646 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -138,14 +141,122 @@ class

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-88347022 @mengxr could you please do another pass...I might have missed the JavaRDD compatibility issue but fixed the rest of your comments...

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-88346990 I reran the map computation on MovieLens with varying ranks: Example run: ./bin/spark-submit --master spark://TUSCA09LMLVT00C.local:7077 --class

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-88292172 If we move computeRankingMetrics and computeRMSE to a better place, I can guard it through tests...

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-88291470 @mengxr I also added 2 test-cases for batch predict APIs. These features are useful if users are interested in computing MAP measures...Let me know if I move the

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27535273 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -103,13 +109,106 @@ class

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27533769 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -103,13 +109,106 @@ class

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27529681 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -103,13 +109,106 @@ class

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27529308 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -35,33 +41,33 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27529347 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -35,33 +41,33 @@ import

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27529231 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -17,14 +17,20 @@ package

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27529218 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -17,14 +17,20 @@ package

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528991 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528959 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528238 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528198 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528120 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27528071 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27525568 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -171,18 +175,62 @@ object MovieLensALS

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-03-31 Thread debasish83
Github user debasish83 commented on a diff in the pull request: https://github.com/apache/spark/pull/3098#discussion_r27525485 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MovieLensALS.scala --- @@ -74,6 +75,9 @@ object MovieLensALS { opt[Unit

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-28 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-87342283 What are MiMa tests? I am a bit confused about them...

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-03-28 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-87276063 Updated the PR with Breeze 0.11.2. Except for the first iteration, the rest are on par: Breeze NNLS: TUSCA09LMLVT00C:spark-brznnls v606014$ grep

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-27 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-87165211 I integrated with Breeze 0.11.2. The only visible difference is in the first iteration. Breeze QuadraticMinimizer: TUSCA09LMLVT00C:spark-qp-als v606014$ grep

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-27 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-86950106 @mengxr any updates on this? Breeze 0.11.2 is now integrated with Spark... I can clean up the PR for review.

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-03-27 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-86949884 @mengxr any updates on this? Breeze 0.11.2 is now integrated with Spark.

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-24 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-85814758 @mengxr I discussed with David and the only reason I can think of is that inside the solvers I am using DenseMatrix and DenseVector in place of primitive arrays for

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-85351062 All the runtime enhancements are being added to Breeze in this PR: https://github.com/scalanlp/breeze/pull/386 Please let me know if there is any additional feedback

[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-03-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-85348266 All the runtime enhancements are being added to Breeze in this PR: https://github.com/scalanlp/breeze/pull/386 Please let me know if there is any additional feedback

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-23 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-85161041 @mengxr I added the optimization for the lower triangular matrix and now they are very close... Let me know what you think and whether there are any other tricks you would

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-22 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-84827225 I looked more into it and I will open up an API in Breeze QuadraticMinimizer where, in place of a DenseMatrix gram, an upper triangular gram can be sent, but the inner

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-22 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-84766375 Also for the ml.QuadraticSolver vs ml.CholeskySolver first iteration runtime difference I am considering opening up an API in Breeze QuadraticMinimizer which only

[GitHub] spark pull request: [ML][MLLIB] SPARK-2426: Integrate Breeze Quadr...

2015-03-22 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3221#issuecomment-84765491 I added all the test cases for ml.QuadraticSolver... The driver is through MovieLensALS right now, where --userConstraint and --productConstraint can be specified... For

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

2015-03-22 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/1290#issuecomment-84708094 @witgo there are a lot of useful building blocks in your RBM PR... Are you planning to consolidate them in this PR?
