[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2015-01-26 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-71531266 @dbtsai I did batching for artificial neural networks and the performance improved ~5x https://github.com/apache/spark/pull/1290#issuecomment-70313952 --- If your proje

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2015-01-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-69125554 @avulanov I've thought about that. However, @mengxr told me that they had an intern try this type of experiment last year, and they didn't see significant perfor

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2015-01-07 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-69123928 @dbtsai BTW, have you thought about batch processing of input vectors, i.e. stacking N vectors into a matrix and performing the optimization with this matrix instead of a single vector? Wit
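
The batching idea raised in this comment can be illustrated with a small sketch. It is not code from either pull request; it assumes a plain linear layer with a hypothetical weight matrix `W` and made-up inputs, and only shows that stacking N input vectors as the columns of a matrix yields the same margins as N separate matrix-vector products. With a BLAS-backed library, the batched form would let one matrix-matrix call replace many matrix-vector calls.

```python
# Hypothetical illustration: W, xs and the helper names are not from the PR.

def matvec(W, x):
    """Multiply matrix W (rows x cols) by a single vector x (length cols)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def stack_columns(vectors):
    """Stack N equal-length vectors into a matrix with one vector per column."""
    return [list(row) for row in zip(*vectors)]

def matmul(W, X):
    """Multiply W (rows x cols) by X (cols x n); each column of X is one input."""
    n = len(X[0])
    return [[sum(row[k] * X[k][j] for k in range(len(row))) for j in range(n)]
            for row in W]

W = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]      # 3 outputs, 2 features
xs = [[1.0, 0.0], [2.0, 1.0], [-1.0, 4.0]]     # N = 3 input vectors

# One matrix-vector product per input vector.
one_by_one = [matvec(W, x) for x in xs]

# One matrix-matrix product over the whole batch; column j belongs to xs[j].
batched = matmul(W, stack_columns(xs))
batched_columns = [list(col) for col in zip(*batched)]
```

Both paths produce the same numbers; whether the batched form is actually faster depends on the linear algebra backend and the batch size, which is what the participants in this thread were measuring.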

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2015-01-05 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-68741897 @dbtsai Just back from vacation too :) I used my old implementation of the matrix form of back propagation and made sure that it properly uses stride of mat

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-23 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-68029618 @avulanov That's a very encouraging benchmark result you saw in a real-world cluster setup. Since I've been on vacation recently, I haven't actually deployed the new code and benchmarked in

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-23 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-68002991 New results of experiments with optimized ANN and MLOR are below. I used the same cluster of 6 machines with 12 workers total, the mnist8m dataset for training and the standard

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-22 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67872100 @dbtsai I ran a local experiment on MNIST and your new implementation seems to be more than 2x faster than the previous one! I am going to perform bigger experiments. In t

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67720689 @avulanov The new branch is not finished yet. You need to rebase https://github.com/dbtsai/spark/tree/dbtsai-mlor onto master, and just replace the gradient function.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67719973 @dbtsai `GeneralizedLinearAlgorithm` throws the exception `org.apache.spark.SparkException: Input validation failed.`. Moreover, there is no regression with LBFGS. Probably

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67718128 Yes, `foreachActive` is the new API in Spark 1.2.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67718021 @dbtsai Thank you! Should I use the latest Spark with this Gradient?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67716565 @avulanov PS, you can just replace the gradient function without making any other changes. Let me know how much performance gain you see; I'm very interested in this. Thanks.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67694284 @avulanov I haven't checked your implementation yet, but I'm ready to have the optimized MLOR for you to test. Can you try the `LogisticGradient` in https://github.com/AlpineN

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-16 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67257821 @dbtsai Hi! Did you have a chance to check our implementation and send me the optimized one?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66513731 @avulanov I remember CJ Lin said he posted the 600GB dataset on his website.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-10 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66490270 @dbtsai Thank you, I look forward to your code so that I can run benchmarks. Thanks again for the video! I enjoyed it, especially the Q&A after the talk. At 51:23 Prof CJ Lin

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-09 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66336110 @avulanov 1. I did the same optimization for MLlib in [my recent PRs](https://github.com/apache/spark/commits/master?author=dbtsai). * Accessing the va

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-09 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66320878 @jkbradley Thank you! They took some time. - I totally agree with you, I need to perform tests on the original test set. It contains fewer attributes, i.e. 778 vs

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-09 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66318526 @dbtsai 1) Could you elaborate on what kind of optimizations you did? They could probably be applied more broadly in MLlib, which would be beneficial. 2) Do you know the re

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-08 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66208868 @avulanov Nice tests! A few comments: * Computing accuracy: It would be good to test on the original MNIST test set, rather than a subset of the training set. The

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66192930 @avulanov I did a couple of performance-tuning changes in the MLOR gradient calculation in my company's proprietary implementation, which made it 4x faster than the open source one in

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-05 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-65879536 @dbtsai Here are the results of my tests: - Settings: - Spark: latest Spark merged with https://github.com/dbtsai/spark/tree/dbtsai-mlor (manual merge) and

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-65340600 @avulanov Sure, it's interesting to see the comparison. Let me know the result once you have it. I'm going to get it merged in 1.3, so it will be easier to use in the futu

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-02 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-65339343 @dbtsai I've tried your implementation with `LBFGS` optimizer and it seems to have similar performance in terms of running time and accuracy to SGD that you have right n

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-11-20 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-63906768 No, in the algorithm I already model the problem as shown in http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/24, so there will always be only (num_features + 1)(num_classes
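
The comment above is cut off by the archive, so the exact parameter count it states cannot be recovered from this listing. As a rough sketch of the general idea only, and assuming the common pivot-class formulation of multinomial logistic regression (class 0 has its weights fixed at zero, so only the remaining classes carry feature weights and an intercept), the parameter layout and class probabilities could look like the following. The function names and the flat weight layout are hypothetical and are not taken from the pull request.

```python
import math

def num_parameters(num_features, num_classes):
    """Parameter count under the ASSUMED pivot-class formulation:
    one weight per feature plus an intercept for every non-pivot class."""
    return (num_features + 1) * (num_classes - 1)

def class_probabilities(weights, features, num_classes):
    """Return P(y = k | x) for k = 0 .. num_classes - 1.

    `weights` is a flat list laid out as [w_1..., b_1, w_2..., b_2, ...]
    for the non-pivot classes 1 .. num_classes - 1 (an assumed layout).
    """
    d = len(features)
    assert len(weights) == num_parameters(d, num_classes)
    margins = [0.0]  # the pivot class 0 has its margin fixed at zero
    for k in range(num_classes - 1):
        block = weights[k * (d + 1):(k + 1) * (d + 1)]
        dot = sum(w * x for w, x in zip(block[:d], features))
        margins.append(dot + block[d])  # block[d] is the intercept
    # Subtract the largest margin before exponentiating for numerical stability.
    m = max(margins)
    exps = [math.exp(v - m) for v in margins]
    total = sum(exps)
    return [e / total for e in exps]

# With all-zero weights every class is equally likely.
probs = class_probabilities([0.0] * num_parameters(3, 4), [1.0, 2.0, 3.0], 4)
```

Fixing one class as the pivot removes the redundancy of the full softmax parameterization, which is why such a model needs one fewer block of weights than there are classes.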

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-11-20 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-63906173 @dbtsai Thanks for the explanation! Do I understand correctly that if I want to get (num_features+1)*(num_classes) parameters from your model, I need to concatenate a vector

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-11-20 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-63904113 @avulanov I will merge this in Spark 1.3; sorry for the delay, I have been very busy recently. Yes, the branch you found should work, but it cannot be cleanly merged in up

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-11-19 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-63748972 Apparently, I've found this implementation https://github.com/dbtsai/spark/tree/dbtsai-mlor. It did work on my examples producing reasonable results. Could you comment o

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-11-18 Thread avulanov
Github user avulanov commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-63577201 @dbtsai Hi! What is the current state of the PR? I would like to download and test it. Could you tell me where the sources are?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-10-28 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-60813678 @BigCrunsh I'm working on this. Let's see if we can merge in Spark 1.2

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-10-28 Thread BigCrunsh
Github user BigCrunsh commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-60792386 What is the current state of the PR? Can't see any changes in the code...

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50983421 @pwendell I didn't see `Closes #1379` in the merged commit. Is something wrong with asfgit?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-02 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50983381 ... I have no idea. Let me check.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-50982699 @mengxr Is there any problem with asfgit? This is not finished yet; why does asfgit say it's merged into apache:master?

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-08-02 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1379

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49682997 It is easier to review if it passes the tests. @SparkQA shows new public classes and interface changes. Could you remove the data file and generate some synthetic data for

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49682447 QA tests have started for PR 1379. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16937/consoleFull

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49682455 QA results for PR 1379: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16937/consol

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49682150 I think it fails because the Apache license is not in the test file. As you suggested, I'll generate the data at runtime instead. I would like to know the general feedback. I

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-49681981 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-48796052 QA tests have started for PR 1379. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16579/consoleFull

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-48796056 QA results for PR 1379: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): * as used in multi-class cl

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-07-11 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/1379 [SPARK-2309][MLlib] Generalize the binary logistic regression into multinomial logistic regression Currently, there is no multi-class classifier in mllib. Logistic regression can be extended to mult