Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-71531266
@dbtsai I did batching for artificial neural networks and the performance
improved ~5x https://github.com/apache/spark/pull/1290#issuecomment-70313952
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-69125554
@avulanov I've thought about that. However, @mengxr told me that they
had an intern try this type of experiment last year, and they didn't
see significant perfor
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-69123928
@dbtsai BTW, have you thought about batch processing of input vectors,
i.e. stacking N vectors into a matrix and performing the optimization with this
matrix instead of a vector? Wit
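The stacking idea suggested here can be sketched as follows. This is a hedged illustration of why batching helps (one matrix-matrix multiply maps to level-3 BLAS instead of many matrix-vector products); all names and shapes are my own, not the PR's code:

```python
import numpy as np

# Sketch of the batching idea: instead of computing margins for one
# input vector at a time (repeated matrix-vector products), stack N
# vectors into a matrix and do a single matrix-matrix multiply, which
# maps to level-3 BLAS and uses the cache far better. Names and shapes
# are my own illustration, not the PR's code.

rng = np.random.default_rng(0)
num_features, num_classes, batch_size = 784, 10, 64

W = rng.standard_normal((num_features, num_classes))
vectors = [rng.standard_normal(num_features) for _ in range(batch_size)]

# One at a time: batch_size separate vector-matrix products.
margins_loop = np.stack([v @ W for v in vectors])

# Batched: stack the vectors into one matrix, then one multiply.
X = np.stack(vectors)        # shape: (batch_size, num_features)
margins_batched = X @ W      # shape: (batch_size, num_classes)

# Both paths produce identical margins; the batched one is faster.
assert np.allclose(margins_loop, margins_batched)
```

The same reshaping is what makes the matrix form of back propagation discussed below pay off.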
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-68741897
@dbtsai
Just back from vacation too :)
I used my old implementation of the matrix form of back propagation and
made sure that it properly uses the stride of mat
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-68029618
@avulanov That's a very encouraging benchmark result you saw in a real-world
cluster setup. Since I've been on vacation recently, I haven't actually deployed
the new code and benchmarked in
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-68002991
New results of the experiments with optimized ANN and MLOR are below. I used
the same cluster of 6 machines with 12 workers total, the mnist8m dataset for
training, and the standard
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67872100
@dbtsai I did a local experiment on mnist, and your new implementation seems
to be more than 2x faster than the previous one! I am going to perform bigger
experiments. In t
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67720689
@avulanov The new branch is not finished yet. You need to rebase
https://github.com/dbtsai/spark/tree/dbtsai-mlor to master, and just replace
the gradient function.
---
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67719973
@dbtsai `GeneralizedLinearAlgorithm` throws the exception
`org.apache.spark.SparkException: Input validation failed.` Moreover, there is
no regression with LBFGS. Probably
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67718128
Yes, `foreachActive` is the new API in Spark 1.2.
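For readers unfamiliar with that API: `foreachActive` applies a function to only the explicitly stored (index, value) pairs of a vector, so sparse vectors skip their implicit zeros. A toy Python analogue of the idea (the class and function names below are my own illustration, not Spark code):

```python
# Toy Python analogue of the Spark 1.2 Vector.foreachActive API: apply
# f(index, value) only to explicitly stored entries, so a sparse vector
# never touches its implicit zeros. Names are my own, not Spark's.

class ToySparseVector:
    def __init__(self, size, indices, values):
        self.size = size
        self.indices = indices
        self.values = values

    def foreach_active(self, f):
        # Visit only the stored (index, value) pairs.
        for i, v in zip(self.indices, self.values):
            f(i, v)

def dot(sv, dense):
    # Dot product that touches only the non-zero entries of sv.
    total = 0.0
    def accumulate(i, v):
        nonlocal total
        total += v * dense[i]
    sv.foreach_active(accumulate)
    return total

sv = ToySparseVector(5, [0, 3], [2.0, 4.0])
assert dot(sv, [1.0, 1.0, 1.0, 1.0, 1.0]) == 6.0
```

Gradient code written against such an iterator pays for the stored entries only, which is the point of the new API.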
---
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67718021
@dbtsai Thank you! Should I use the latest Spark with this Gradient?
---
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67716565
@avulanov PS, you can just replace the gradient function without making any
other changes. Let me know how much performance gain you see; I'm very interested
in this. Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67694284
@avulanov I haven't checked your implementation yet, but the optimized MLOR
is ready for you to test. Can you try the `LogisticGradient` in
https://github.com/AlpineN
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67257821
@dbtsai Hi! Did you have a chance to check our implementation and send me
the optimized one?
---
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66513731
@avulanov I remember CJ Lin saying he posted the 600GB dataset on his
website.
---
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66490270
@dbtsai Thank you, I look forward to your code so I can perform benchmarks.
Thanks again for the video! I've enjoyed it, especially the Q&A after the talk. At
51:23 Prof CJ Lin
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66336110
@avulanov
1. I did the same optimization for MLlib in [my recent
PRs](https://github.com/apache/spark/commits/master?author=dbtsai).
* Accessing the va
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66320878
@jkbradley Thank you! They took some time.
- I totally agree with you; I need to perform tests on the original test
set. It contains fewer attributes, i.e. 778 vs
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66318526
@dbtsai 1) Could you elaborate on what kind of optimizations you did?
Perhaps they could be applied to the broader MLlib, which would be beneficial. 2)
Do you know the re
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66208868
@avulanov Nice tests! A few comments:
* Computing accuracy: It would be good to test on the original MNIST test
set, rather than a subset of the training set. The
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66192930
@avulanov I did a couple of performance tunings in the MLOR gradient calculation
in my company's proprietary implementation, which is 4x faster than the
open-source one in
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-65879536
@dbtsai Here are the results of my tests:
- Settings:
- Spark: latest Spark merged with
https://github.com/dbtsai/spark/tree/dbtsai-mlor (manual merge) and
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-65340600
@avulanov Sure, it's interesting to see the comparison. Let me know the
result once you have it. I'm going to get it merged in 1.3, so it will be easier
to use in the futu
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-65339343
@dbtsai I've tried your implementation with the `LBFGS` optimizer, and it seems
to have performance similar, in terms of running time and accuracy, to the SGD that
you have right n
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63906768
No, in the algorithm I already model the problem as in
http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/24 , so there will
always be only (num_features + 1)(num_classes
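The truncated count above presumably ends in (num_classes - 1): the slides describe modeling MLOR with one class as a pivot whose weights are fixed at zero. A sketch of such a parameterization under that assumption (all names and layout choices below are mine, not the PR's code):

```python
import math

# Hedged sketch of a pivot-class parameterization for multinomial
# logistic regression: with num_classes classes, class 0 is taken as
# the pivot with weights fixed at zero, so the model stores only
# (num_features + 1) * (num_classes - 1) parameters (the +1 is the
# intercept). Names and layout are my own illustration.

num_features, num_classes = 3, 4
num_params = (num_features + 1) * (num_classes - 1)  # 12, not 16

def probabilities(weights, x):
    # weights: flat list of (num_features + 1) * (num_classes - 1)
    # entries, one column per non-pivot class; x: feature vector.
    margins = [0.0]  # pivot class has an implicit margin of 0
    for k in range(num_classes - 1):
        col = weights[k * (num_features + 1):(k + 1) * (num_features + 1)]
        margins.append(col[0] + sum(w * xi for w, xi in zip(col[1:], x)))
    z = sum(math.exp(m) for m in margins)
    return [math.exp(m) / z for m in margins]

# With all-zero weights every class is equally likely.
probs = probabilities([0.0] * num_params, [1.0, 2.0, 3.0])
assert abs(sum(probs) - 1.0) < 1e-12
assert all(abs(p - 0.25) < 1e-12 for p in probs)
```

Under this reading, recovering a full (num_features+1)*(num_classes) matrix, as asked below, means concatenating a zero column for the pivot class.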
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63906173
@dbtsai Thanks for the explanation! Do I understand correctly that if I want to
get (num_features+1)*(num_classes) parameters from your model, I need to
concatenate a vector
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63904113
@avulanov I will merge this in Spark 1.3; sorry for the delay, since I was
very busy recently. Yes, the branch you found should work, but it cannot be
cleanly merged in up
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63748972
Apparently, I've found this implementation:
https://github.com/dbtsai/spark/tree/dbtsai-mlor. It did work on my examples,
producing reasonable results. Could you comment o
Github user avulanov commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63577201
@dbtsai Hi! What is the current state of the PR? I would like to download and
test it. Could you point me to the sources?
---
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-60813678
@BigCrunsh I'm working on this. Let's see if we can merge in Spark 1.2
---
Github user BigCrunsh commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-60792386
What is the current state of the PR? Can't see any changes in the code...
---
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-50983421
@pwendell I didn't see `Closes #1379` in the merged commit. Is something
wrong with asfgit?
---
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-50983381
... I have no idea. Let me check.
---
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-50982699
@mengxr Is there any problem with asfgit? This is not finished yet; why did
asfgit say it's merged into apache:master?
---
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1379
---
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-49682997
It is easier to review if it passes the tests. @SparkQA shows new public
classes and interface changes. Could you remove the data file and generate some
synthetic data for
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-49682447
QA tests have started for PR 1379. This patch DID NOT merge cleanly!
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16937/consoleFull
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-49682455
QA results for PR 1379:
- This patch FAILED unit tests.
For more information see test
output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16937/consol
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-49682150
I think it fails because the Apache license header is not in the test file. As you
suggested, I'll move it to be generated at runtime. I would like to know the
general feedback. I
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-49681981
Jenkins, retest this please.
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-48796052
QA tests have started for PR 1379. This patch merges cleanly. View
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16579/consoleFull
---
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-48796056
QA results for PR 1379:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental): * as used in multi-class cl
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/1379
[SPARK-2309][MLlib] Generalize the binary logistic regression into
multinomial logistic regression
Currently, there is no multi-class classifier in MLlib. Logistic regression
can be extended to mult
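As background for the generalization the PR describes, here is a minimal sketch of the softmax (multinomial logistic) gradient for one example, written in plain Python under my own assumptions; it is not the PR's Scala code:

```python
import math

# Minimal sketch of the multinomial logistic (softmax) loss gradient
# for a single example. Variable names are my own illustration.

def softmax(margins):
    m = max(margins)                      # subtract max for stability
    exps = [math.exp(v - m) for v in margins]
    z = sum(exps)
    return [e / z for e in exps]

def gradient(W, x, label):
    # W: num_classes rows of num_features weights; x: feature vector;
    # label: true class index. Returns dLoss/dW for cross-entropy loss.
    margins = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    probs = softmax(margins)
    return [[(probs[k] - (1.0 if k == label else 0.0)) * xi for xi in x]
            for k in range(len(W))]

W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 3 classes, 2 features
g = gradient(W, [1.0, 2.0], label=0)
# The true class gets negative gradient entries (its margin is pushed
# up); the other classes get positive entries.
assert g[0][0] < 0 < g[1][0]
```

The binary case falls out with num_classes = 2, which is the sense in which binary logistic regression generalizes here.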