Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2068#issuecomment-53138329
@atalwalkar and @mengxr I just addressed the merge conflict. I think it's
ready to merge. Thanks.
GitHub user dbtsai reopened a pull request:
https://github.com/apache/spark/pull/2207
[SPARK-3317][MLlib] The loss of regularization in Updater should use the
oldWeights
The regularization loss is currently computed from the newWeights, which is not
correct. The loss, R(w) = 1
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2207#issuecomment-53933078
LBFGS needs the correct loss to find the next weights, while SGD doesn't.
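For illustration only, here is a minimal sketch of the idea (hypothetical names, not the actual MLlib Updater code): the regularization part of the reported loss is evaluated at the old weights the gradient was computed against, while the returned weights are the updated ones, so L-BFGS sees a consistent objective value.
```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical L2 update sketch: the returned regularization loss uses
// oldWeights, not the freshly updated weights.
object L2UpdateSketch {
  def update(
      oldWeights: Vector,
      gradient: Vector,
      stepSize: Double,
      regParam: Double): (Vector, Double) = {
    val w = oldWeights.toArray
    val g = gradient.toArray
    // Gradient step with L2 shrinkage on the weights.
    val newW = Array.tabulate(w.length) { i =>
      w(i) * (1.0 - stepSize * regParam) - stepSize * g(i)
    }
    // R(w) = regParam / 2 * ||oldWeights||^2, computed from the OLD weights.
    val regLoss = 0.5 * regParam * w.map(x => x * x).sum
    (Vectors.dense(newW), regLoss)
  }
}
```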
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/2207
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2207#issuecomment-54002680
@srowen @mengxr
I was working on OWLQN for L1 at my company, and I hadn't followed the LBFGS
code, so I was confused. The current code in MLlib actually gives
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/2207
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2207#issuecomment-54002773
PS, it seems that I cannot close
https://issues.apache.org/jira/browse/SPARK-3317 myself. Can any of you close it
for me? Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2207#issuecomment-54002970
You are right. I was using my desktop without a login session. Thanks
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3220#discussion_r20206271
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -50,6 +50,29 @@ class MultivariateOnlineSummarizer
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3220#discussion_r20207949
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -124,37 +128,28 @@ class MultivariateOnlineSummarizer
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3220#discussion_r20208266
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -50,6 +50,29 @@ class MultivariateOnlineSummarizer
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3220#issuecomment-62689770
LGTM. Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3220#issuecomment-62694226
Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3216#issuecomment-62856261
It works for me as well.
|activeIterator *|$ ./bin/pyspark
Python 2.7.6 (default, Sep 9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3288
[SPARK-4431][MLlib] Implement efficient activeIterator for dense and sparse
vector
Previously, we were using Breeze's activeIterator to access the non-zero
elements
in a sparse vector
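As a rough sketch of the approach (simplified stand-in types, not the code that was merged), the replacement walks the underlying arrays directly with a while loop instead of going through Breeze's activeIterator:
```scala
// Simplified stand-ins for MLlib's vector types; real names/signatures differ.
sealed trait SimpleVector {
  def foreachActive(f: (Int, Double) => Unit): Unit
}

class SimpleDenseVector(val values: Array[Double]) extends SimpleVector {
  override def foreachActive(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(i, values(i))
      i += 1
    }
  }
}

class SimpleSparseVector(val indices: Array[Int], val values: Array[Double])
  extends SimpleVector {
  override def foreachActive(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(indices(i), values(i))
      i += 1
    }
  }
}
```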
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20532934
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -76,6 +76,22 @@ sealed trait Vector extends Serializable {
def copy
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20533260
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -273,6 +289,47 @@ class DenseVector(val values: Array[Double]) extends
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20544650
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -273,6 +289,47 @@ class DenseVector(val values: Array[Double]) extends
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3288#issuecomment-63566328
(PS, when I did the bytecode analysis, I found that accessing the
member variables values and values.size requires two operations.
By having a local copy
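A tiny, hypothetical illustration of that point: inside a loop, every read of a field like `values` or its length goes back through the object, whereas hoisting them into locals before the loop keeps the hot path on local variables.
```scala
// Hypothetical micro-example of the pattern, not MLlib code.
class Summer(val values: Array[Double]) {
  // Re-reads the `values` field and its length on each iteration.
  def sumSlow: Double = {
    var s = 0.0
    var i = 0
    while (i < values.length) { s += values(i); i += 1 }
    s
  }

  // Hoists the field and its length into locals before the loop.
  def sumFast: Double = {
    val localValues = values
    val n = localValues.length
    var s = 0.0
    var i = 0
    while (i < n) { s += localValues(i); i += 1 }
    s
  }
}
```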
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20553260
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -76,6 +76,22 @@ sealed trait Vector extends Serializable {
def copy
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20554090
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -76,6 +76,22 @@ sealed trait Vector extends Serializable {
def copy
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20615000
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -76,6 +76,22 @@ sealed trait Vector extends Serializable {
def copy
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20687461
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala
---
@@ -95,22 +93,7 @@ class MultivariateOnlineSummarizer
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3288#discussion_r20688070
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala ---
@@ -173,4 +173,63 @@ class VectorsSuite extends FunSuite {
val v
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63904113
@avulanov I will merge this in Spark 1.3; sorry for the delay, since I have been
very busy recently. Yes, the branch you found should work, but it cannot be
cleanly merged
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-63906768
No, in the algorithm I already model the problem
(http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297/24), so there will
always be only (num_features + 1
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/3288
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/2709
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3435
[SPARK-4581][MLlib] Refactorize StandardScaler to improve the
transformation performance
The following optimizations are done to improve the StandardScaler model
transformation performance
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64304769
@mengxr
Without the local reference copies of the `factor` and `shift` arrays, the
runtime is almost three times slower.
DenseVector withMean and withStd
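As a hedged sketch of what "local reference copies" means here (field names assumed, not the merged StandardScalerModel code), the transformation loop precomputes `shift` and `factor` and holds them in locals while scanning the values:
```scala
// Hedged sketch of a standardization pass; names are illustrative only.
class ScalerSketch(mean: Array[Double], variance: Array[Double]) {
  private[this] val shift: Array[Double] = mean.clone()
  private[this] val factor: Array[Double] =
    variance.map(v => if (v != 0.0) 1.0 / math.sqrt(v) else 0.0)

  def transform(values: Array[Double]): Array[Double] = {
    // Local references so the loop does not re-load the fields each iteration.
    val localShift = shift
    val localFactor = factor
    val out = new Array[Double](values.length)
    var i = 0
    while (i < values.length) {
      out(i) = (values(i) - localShift(i)) * localFactor(i)
      i += 1
    }
    out
  }
}
```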
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64304881
PS, we may want to go through the mllib codebase and find things like this.
This issue impacts the performance quite a lot.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3435#issuecomment-64308394
Wow, with
```scala
private[this] val factor: Array[Double] = {
  val f = Array.ofDim[Double](variance.size)
  var i = 0
  while (i < f.size
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3446
[SPARK-4596][MLLib] Refactorize Normalizer to make code cleaner
In this refactoring, the performance is slightly improved by removing
the overhead of the Breeze vector. The bottleneck
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20847415
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/3446
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3435#discussion_r20885451
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -97,30 +97,57 @@ class StandardScalerModel private[mllib
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3462
Implement the efficient vector norm
The vector norm in Breeze is implemented with `activeIterator`, which is known
to be very slow.
In this PR, an efficient vector norm is implemented
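Roughly, the idea is to special-case the common norms and iterate over the raw value array with a plain loop. A hedged sketch (not the exact merged code):
```scala
// Hedged sketch of an efficient p-norm over a raw value array,
// special-casing p = 1, 2 and infinity.
def normSketch(values: Array[Double], p: Double): Double = {
  var i = 0
  if (p == 1.0) {
    var sum = 0.0
    while (i < values.length) { sum += math.abs(values(i)); i += 1 }
    sum
  } else if (p == 2.0) {
    var sum = 0.0
    while (i < values.length) { sum += values(i) * values(i); i += 1 }
    math.sqrt(sum)
  } else if (p == Double.PositiveInfinity) {
    var max = 0.0
    while (i < values.length) { max = math.max(max, math.abs(values(i))); i += 1 }
    max
  } else {
    var sum = 0.0
    while (i < values.length) { sum += math.pow(math.abs(values(i)), p); i += 1 }
    math.pow(sum, 1.0 / p)
  }
}
```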
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3462#issuecomment-64505454
Using `foreachActive` instead of a `while` loop:
DenseVector: 12.95 secs
SparseVector: 2.89 secs
```scala
private[spark] def norm(p: Double): Double
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20919934
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -85,6 +85,52 @@ sealed trait Vector extends Serializable
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20921838
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20921892
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20921916
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20961188
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20967444
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20968353
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r20970806
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/582
[SPARK-1157][MLlib] Bug fix: lossHistory should be monotonically decreasing
Instead of recording the loss in the costFun each time the optimizer
calls costFun, we get the loss from the API
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/582#issuecomment-41740842
@mengxr Just did some hacking on trying to implement the right stochastic
L-BFGS, and it kind of works as long as we don't change the objective function
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/582#issuecomment-41751464
Makes sense from the inverse-of-Hessian point of view. Just remove it!
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/458#issuecomment-42160096
L-BFGS is not good for the L1 problem. I'm working on, and preparing to
benchmark, the BFGS variant OWL-QN for L1, which is ideal to compare with
ADMM.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/703
MLlib documentation fix
Fixed the documentation: `loadLibSVMData` has been changed to
`loadLibSVMFile`.
You can merge this pull request into a Git repository by running:
$ git pull https
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/702#discussion_r12502968
--- Diff: docs/mllib-optimization.md ---
@@ -163,3 +171,108 @@ each iteration, to compute the gradient direction.
Available algorithms for gradient descent
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/702#discussion_r12499609
--- Diff: docs/mllib-optimization.md ---
@@ -163,3 +177,100 @@ each iteration, to compute the gradient direction.
Available algorithms for gradient descent
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/702#discussion_r12499183
--- Diff: docs/mllib-optimization.md ---
@@ -128,10 +128,24 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n =
1$`, then the algorithm is
standard
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/702
L-BFGS Documentation
Documentation for L-BFGS, and an example of training binary L2 logistic
regression using L-BFGS.
You can merge this pull request into a Git repository by running:
$ git
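For context, the documented example ends up looking roughly like the sketch below (reconstructed from memory of the optimization guide, so the parameter order and the chosen values are assumptions to verify against your Spark version):
```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

object LBFGSExampleSketch {
  // Hedged sketch of training binary L2 logistic regression with L-BFGS.
  def train(sc: SparkContext, path: String): Unit = {
    val data = MLUtils.loadLibSVMFile(sc, path)
    val numFeatures = data.take(1)(0).features.size

    // appendBias adds a trailing 1.0 so the last weight acts as the intercept.
    val training = data.map(p => (p.label, MLUtils.appendBias(p.features))).cache()

    val (weightsWithIntercept, lossHistory) = LBFGS.runLBFGS(
      training,
      new LogisticGradient(),
      new SquaredL2Updater(),
      10,    // numCorrections
      1e-4,  // convergenceTol
      20,    // maxNumIterations
      0.1,   // regParam
      Vectors.dense(new Array[Double](numFeatures + 1)))

    println(s"Evaluations: ${lossHistory.length}, final loss: ${lossHistory.last}")
  }
}
```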
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/702#discussion_r12499273
--- Diff: docs/mllib-optimization.md ---
@@ -128,10 +128,24 @@ is sampled, i.e. `$|S|=$ miniBatchFraction $\cdot n =
1$`, then the algorithm is
standard
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/834
[SPARK-1870][branch-0.9] Jars added by sc.addJar are not in the default
classLoader in executor for YARN
The summary is copied from Sandy's comment in the mailing list.
The relevant
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/848#discussion_r12921552
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/848#discussion_r12921709
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/848#issuecomment-43812877
Thanks. It looks great to me, and better than my patch.
cachedSecondaryJarLinks.foreach(addPwdClasspathEntry) is not needed since
we have
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/848#issuecomment-43814642
It worked under the driver before, so the major issue is that those files are not in
the executor's distributed cache. But I like the idea of adding them explicitly so
we'll not miss
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/955
[SPARK-1969][MLlib] Public available online summarizer for mean, variance,
min, and max
It basically moved the private ColumnStatisticsAggregator class from
RowMatrix to a publicly available
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/955#issuecomment-45023171
Since the Statistical in MultivariateStatisticalSummary is already in the
package name stat, I think it's worth having a concise name. Also, most
people spell
Github user dbtsai closed the pull request at:
https://github.com/apache/spark/pull/834
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/955#issuecomment-45026777
Don't know why Jenkins is not happy with removing the private class
ColumnStatisticsAggregator(private val n: Int). After all, it's a private
class.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/959
Fixed a typo
in RowMatrix.scala
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dbtsai/spark dbtsai-typo
Alternatively you can review and apply
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/955#issuecomment-45124672
@mengxr Got you. It's a false-positive error. Do you have any comments or
feedback on moving it out as a public API? I'm building a feature-scaling API in
MLUtils which depends
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/987
[SPARK-1177] Allow SPARK_JAR to be set programmatically in system properties
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dbtsai/spark
dbtsai
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/987#issuecomment-45286460
@chesterxgchen
#560 Agree, it's a more thorough way to handle this issue. In the code
you have, it seems that the spark jar setting is moved to conf: SparkConf
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/987#issuecomment-45292804
The app's code will only run in the application master in yarn-cluster
mode, so how can the yarn client know which jar will be submitted to the distributed cache
if we set
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/987#issuecomment-45296471
We launched a Spark job inside our Tomcat, and we directly use the Client.scala
API. With my patch, I can set up the spark jar using System.setProperty() before
val
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/955#issuecomment-45297396
k... better to have MiMa exclude the private class automatically, or we can
have an annotation for the private class.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/987#issuecomment-45363846
Got you. Looking forward to having your patch merged. Thanks.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/1013
[SPARK-1870] Ported from 1.0 branch to 0.9 branch.
Made deployment with --jars work in yarn-standalone mode. Sent secondary
jars to the distributed cache of all containers and added the cached jars
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1013#issuecomment-45451719
CC: @mengxr and @sryza
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1013#issuecomment-45459920
Works in my local VM. Should work in a real yarn cluster. Will test it
tomorrow in the office.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3462#discussion_r21076434
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
---
@@ -261,6 +261,57 @@ object Vectors {
sys.error(Unsupported Breeze
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3565
[SPARK-4708][MLLib] Make k-means run two/three times faster with
dense/sparse samples
Note that the usage of `breezeSquaredDistance` in
`org.apache.spark.mllib.util.MLUtils.fastSquaredDistance
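For context, `fastSquaredDistance` rests on the identity ||a − b||^2 = ||a||^2 + ||b||^2 − 2·(a·b), so precomputed norms let k-means skip materializing the difference vector. A hedged dense-only sketch (the real MLUtils version also guards numerical precision and can fall back to the exact computation):
```scala
// ||a - b||^2 from precomputed norms and a dot product; illustrative only.
def squaredDistanceSketch(a: Array[Double], normA: Double,
                          b: Array[Double], normB: Double): Double = {
  var dot = 0.0
  var i = 0
  while (i < a.length) { dot += a(i) * b(i); i += 1 }
  normA * normA + normB * normB - 2.0 * dot
}
```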
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3565#issuecomment-65340272
Calling BLAS will add a very small extra overhead. The benchmark will now be
DenseVector: 33.19 secs
SparseVector: 22.05 secs
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-65340600
@avulanov Sure, it's interesting to see the comparison. Let me know the
result once you have it. I'm going to get it merged in 1.3, so it will be easier
to use
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3577
[SPARK-4717][MLlib] Optimize BLAS library to avoid de-reference multiple
times in loop
Have a local reference to the `values` and `indices` arrays in the `Vector`
object
so the JVM can locate the value
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66192930
@avulanov I did a couple of performance tunings in the MLOR gradient calculation
in my company's proprietary implementation, which results in it being 4x faster than the
open source one
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66336110
@avulanov
1. I did the same optimization for MLlib in [my recent
PRs](https://github.com/apache/spark/commits/master?author=dbtsai).
* Accessing
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66513731
@avulanov I remember CJ Lin said he posted the 600GB dataset on his
website.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3735
[SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite
The original test doesn't make sense since if you step in, the lossSum is
already NaN,
and the coefficients are diverging
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3735#issuecomment-67562831
I agree. The test is not good. I'm thinking we can probably add a couple of well-known
datasets like the iris or prostate cancer datasets into the test resources, and
we can compare
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3746
[SPARK-4907][MLlib] Inconsistent loss and gradient in LeastSquaresGradient
compared with R
In most of the academic papers and algorithm implementations,
people use L = 1/(2n) ||A weights - y||^2
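For reference, under the 1/(2n) convention mentioned above (restated here, not quoted from the PR), the loss and gradient pair up as

    L(w) = \frac{1}{2n} \|A w - y\|^2, \qquad \nabla_w L(w) = \frac{1}{n} A^\top (A w - y)

so the 1/2 exactly cancels the factor of 2 from differentiating the square; without it, the reported loss and the gradient used by the optimizer are scaled inconsistently.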
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67694284
@avulanov I haven't checked your implementation yet, but I'm ready to have the
optimized MLOR for you to test. Can you try the `LogisticGradient` in
https://github.com
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67716565
@avulanov PS, you can just replace the gradient function without any other
change. Let me know how much performance gain you see; I'm very interested
in this. Thanks
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67718128
Yes, `foreachActive` is the new API in Spark 1.2.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67720689
@avulanov The new branch is not finished yet. You need to rebase
https://github.com/dbtsai/spark/tree/dbtsai-mlor to master, and just replace
the gradient function
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3746#issuecomment-67842962
@bryanyang0528 The learning rate issue here is a different story. With modern
optimization algorithms like LBFGS and OWLQN, the learning rate is not
required
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/1518#discussion_r22173571
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/Regularizer.scala ---
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-68029618
@avulanov It's a very encouraging benchmark result you saw in a real-world
cluster setup. Since I've been on vacation recently, I haven't actually deployed the new
code and benchmarked
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3808
[SPARK-4972][MLlib] Updated the scala doc for lasso and ridge regression
for the change of LeastSquaresGradient
In #SPARK-4907, we added a factor of 2 into the LeastSquaresGradient. We
updated
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3833
[SPARK-2309][MLlib] Multinomial Logistic Regression
#1379 was automatically closed by asfgit, and GitHub cannot reopen it once
it's closed, so this will be the new PR.
Binary Logistic
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3846#issuecomment-68397022
LGTM
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23486231
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4140#issuecomment-71281849
For the unit-test part, is it possible not to change too much? Also, it
will be easier to debug if the assertion is in the test instead of abstracted
out. For example
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23485163
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd