Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21848
@kiszk `trait Stateful extends Nondeterministic`, and this rule will not be
invoked when an expression is nondeterministic
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21864
LGTM. Merged into master.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21848
Here is a followup PR for making `AssertTrue` and `AssertNotNull`
`non-deterministic` https://issues.apache.org/jira/browse/SPARK-24913
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21850#discussion_r204953202
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -414,6 +414,16 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21850#discussion_r204953356
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -414,6 +414,16 @@ object SimplifyConditionals
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21850
retest this please
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21850#discussion_r205187664
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -414,6 +414,16 @@ object SimplifyConditionals
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21850
@gatorsmile All the new rules added into `If` should always have a `CaseWhen`
version.
But there will be times when we only add an `If` version, or when it only makes
sense to have an `If` version
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r205305691
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
---
@@ -122,4 +126,25 @@ class
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r205306098
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,22 @@ object SimplifyConditionals
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21847
+cc @MaxGekk and @gengliangwang who worked on this part of codebase.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21850#discussion_r205556780
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -414,6 +414,9 @@ object SimplifyConditionals extends
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r205599224
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,29 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205648911
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -87,17 +88,33 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205685728
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205683257
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -148,7 +165,8 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205684257
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205692778
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r205692946
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21850#discussion_r205830257
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -414,6 +414,9 @@ object SimplifyConditionals extends
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21852
+cc @cloud-fan and @gatorsmile
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/21904
[SPARK-24953] [SQL] Prune a branch in `CaseWhen` if previously seen
## What changes were proposed in this pull request?
If a condition in a branch is previously seen, this branch can be
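The idea named in the PR title can be illustrated with a minimal sketch. This is not the Catalyst rule itself: it assumes conditions are deterministic and can be compared by plain equality, and the `(condition, value)` tuples and the `prune_seen_branches` name are hypothetical stand-ins for `CaseWhen` branches.

```python
from typing import Hashable, List, Tuple

# Hypothetical stand-in for a CaseWhen branch: (condition, value).
Branch = Tuple[Hashable, object]


def prune_seen_branches(branches: List[Branch]) -> List[Branch]:
    """Keep only the first branch for each distinct condition."""
    seen = set()
    kept = []
    for cond, value in branches:
        if cond in seen:
            # An earlier branch with the same deterministic condition
            # is always evaluated first, so this branch is unreachable.
            continue
        seen.add(cond)
        kept.append((cond, value))
    return kept


branches = [("a > 1", "x"), ("b = 2", "y"), ("a > 1", "z")]
print(prune_seen_branches(branches))
```

The third branch repeats the condition of the first, so it can never be selected and is dropped. The sketch would not be valid for nondeterministic conditions, where two evaluations of the same expression may differ.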
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r205946975
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,23 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21904#discussion_r205963712
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,29 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r206266243
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,23 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r206271589
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,23 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21904#discussion_r206333426
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +450,12 @@ object SimplifyConditionals
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206350423
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -87,17 +87,30 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206353416
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -120,7 +133,7 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206356380
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -146,9 +159,13 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206356838
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -146,9 +159,13 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206358703
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21847#discussion_r206359706
--- Diff:
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21852#discussion_r206695251
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
---
@@ -416,6 +416,21 @@ object SimplifyConditionals
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/21952
[SPARK-24993] [SQL] [WIP] Make Avro Fast Again
## What changes were proposed in this pull request?
When @lindblombr developed
[SPARK-24855](https://github.com/apache/spark/pull/21847) to
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21952
@viirya How did you run the benchmark? I tried again on my desktop, and
still got consistent regression. Thanks.
Spark 2.4
```
spark git:(master) ./build/mvn -DskipTests clean
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21952
@cloud-fan as you suggested, I benchmarked cache read performance, and the
performance is the same. This makes sense, since it's unlikely that cache read
performance is that bad so we can se
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21495
I was on family leave for a couple of weeks. Thank you all for helping out and
merging it.
The only change with this PR is that the welcome message will be printed
first, and then the Spark
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21692
@viirya thanks for this PR. I thought SBT always uses pom for dependencies,
and I wonder why there is a discrepancy so we need to manually override it
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21459
There are three approvals from the committers, and the changes are pretty
trivial to revert if we see any performance regression, which is unlikely. To
move things forward, if there is no further
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21459
Thanks. Merged into master.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r29098013
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala ---
@@ -256,4 +256,38 @@ trait HasFitIntercept extends Params
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r29098031
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -42,34 +50,122 @@ private[regression] trait LinearRegressionParams
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r29098568
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -42,34 +50,122 @@ private[regression] trait LinearRegressionParams
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3890#issuecomment-69624254
@loachli OWLQN doesn't automatically solve the issue of
non-differentiability. As a result, you have to remove the L1 term from
HingeGradient, and use the Breeze
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3833#issuecomment-69889747
Jenkins, please re-test again.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r22963566
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -18,30 +18,36 @@
package
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r22963904
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -18,30 +18,36 @@
package
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r22965406
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -61,20 +67,70 @@ class LogisticRegressionModel
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r22967437
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
---
@@ -61,20 +67,70 @@ class LogisticRegressionModel
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3833#issuecomment-70007063
Jenkins, please re-test again.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3833#issuecomment-70936401
Ping @mengxr
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23485163
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23486231
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4140#issuecomment-71281849
For the unit-test part, is it possible not to change too much? Also, it
will be easier to debug if the assertion is in the test instead of being
abstracted out. For example
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23576821
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23576935
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23577023
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4140#issuecomment-71566737
LGTM except those two minor details. Thanks.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23580058
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4140#discussion_r23660759
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala ---
@@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r23743161
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
@@ -55,24 +57,79 @@ abstract class Gradient extends Serializable
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/4259
[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AlpineNow/spark lir
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r23743197
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala ---
@@ -55,24 +57,79 @@ abstract class Gradient extends Serializable
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r23823903
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
---
@@ -55,6 +56,97 @@ object LogisticRegressionSuite
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/3833#discussion_r23823961
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala
---
@@ -285,6 +377,97 @@ class LogisticRegressionSuite
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/840#issuecomment-57439459
@debasish83 and @codedeft The weighted method for OWLQN in breeze is merged
https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c
I
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-58183559
We had a build against the Spark master on Oct 2, and when we ran our
application with around 600GB of data, we got the following exception. Does this
PR fix this issue which
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/2693
[SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10
In Breeze 0.10, the L1regParam can be configured through anonymous function
in OWLQN, and each component can be penalized differently. This is
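A minimal sketch of what a per-component L1 penalty looks like when the regularization strength is supplied as a function of the component index. This is plain Python and not the Breeze API; `l1_penalty`, `l1_reg`, and the choice of an unpenalized last component (an intercept) are assumptions made for illustration.

```python
from typing import Callable, Sequence


def l1_penalty(weights: Sequence[float], l1_reg: Callable[[int], float]) -> float:
    """Return the sum over i of l1_reg(i) * |w_i|."""
    return sum(l1_reg(i) * abs(w) for i, w in enumerate(weights))


# Hypothetical layout: the last component is the intercept and is left
# unpenalized, while every other component gets strength 0.1.
weights = [0.5, -2.0, 3.0]
intercept_index = len(weights) - 1


def reg(i: int) -> float:
    return 0.0 if i == intercept_index else 0.1


print(l1_penalty(weights, reg))
```

Passing the strength as a function rather than a single scalar is what lets individual components, such as an intercept, be excluded from or weighted differently in the penalty.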
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2030#issuecomment-58214186
I thought it was a closed issue, so I moved my comment to JIRA. I ran into
this issue in spark-shell, not in a standalone application. Does SPARK-3762
apply in this
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2693#issuecomment-58276308
@dlwh David, do you know if there is any dependency change in breeze-0.10,
and whether it is compatible with both Scala 2.10 and 2.11? Thanks.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/2709
Minor change in the comment of spark-defaults.conf.template
spark-defaults.conf is used in spark-shell as well, and this PR added this
into the comment.
You can merge this pull request into a Git
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2712#issuecomment-58361701
Jenkins, please start the test.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2718#issuecomment-58435304
LGTM Thanks.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2712#issuecomment-58629065
Jenkins, test this please.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2712#issuecomment-58732030
It's failing at FlumeStreamSuite.scala:109 which seems to be unrelated to
this patch.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/2709#issuecomment-59667207
@andrewor14 Sorry for the late reply; I was on vacation in Europe last
week. I can continue working on this after I finish my talk at IOTA conf tomorrow.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4593#issuecomment-74805610
Sorry for the late reply; I have been traveling recently. My concern is
whether this will cause "caching twice" in the new ML API. For example, in
ml
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/4801
[SPARK-5537][MLib] Expand user guide for multinomial logistic regression
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AlpineNow/spark mlor
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4861#discussion_r25656411
--- Diff: docs/mllib-linear-methods.md ---
@@ -144,41 +152,7 @@ denoted by $\x$, the model makes predictions based on
the value of $\wv^T \x$.
By the
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/4866
[SPARK-5537] Add user guide for multinomial logistic regression
Adding more description on top of #4861.
You can merge this pull request into a Git repository by running:
$ git pull https
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4259#issuecomment-76880508
@jkbradley I will rebase soon. @debasish83 I'll add MLOR with elastic-net
when we stabilize the new ML API. Doing this in the old codebase will be a huge
effort, and I
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/4879
[SPARK-6141][MLlib] Upgrade Breeze from 0.10 to 0.11 to fix convergence bug
LBFGS and OWLQN in Breeze 0.10 have a convergence check bug.
This is fixed in 0.11, see the description in Breeze project
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4879#issuecomment-77044998
@coderxiang Breeze seems to have accidentally removed the public constructor of
CSCMatrix, and we have a PR to Breeze to address it. Let's see if we can make
it.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4879#issuecomment-77054801
This is the fix on the Breeze side for the missing public constructor of CSCMatrix
https://github.com/scalanlp/breeze/pull/375
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4259#issuecomment-85275887
@jkbradley and @mengxr I just rebased it. I will do a couple of optimizations to
avoid scaling the datasets, which can be done in the optimization
instead. You guys can
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r27180837
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala
---
@@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params {
def
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/4259#issuecomment-86731157
@jkbradley I think we should first support only basic regularization in spark.ml,
which is what Python's scikit-learn does. If users need a
different type of
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r27260459
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala
---
@@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params {
def
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/4259#discussion_r27333012
--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala
---
@@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params {
def
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66192930
@avulanov I did a couple of performance tunings in the MLOR gradient calculation
in my company's proprietary implementation, which made it 4x faster than the
open source o
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66336110
@avulanov
1. I did the same optimization for MLlib in [my recent
PRs](https://github.com/apache/spark/commits/master?author=dbtsai).
* Accessing the
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-66513731
@avulanov I remembered CJ Lin said he posted the 600GB dataset on his
website.
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3735
[SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite
The original test doesn't make sense since if you step in, the lossSum is
already NaN,
and the coefficients are dive
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/3735#issuecomment-67562831
I agree. The test is not good. I'm thinking we can probably add a couple of
well-known datasets, like the iris or prostate cancer dataset, to the test
resources, and we can co
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/3746
[SPARK-4907][MLlib] Inconsistent loss and gradient in LeastSquaresGradient
compared with R
In most academic papers and algorithm implementations,
people use L = 1/(2n) ||A weights - y||^2
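A minimal sketch of the least squares loss with the 1/(2n) scaling mentioned in the description, together with its gradient (1/n) A^T (A weights - y). The factor 1/2 is conventional because it cancels the 2 produced by differentiating the square. The function name and the list-of-rows data layout are assumptions for illustration, not the `LeastSquaresGradient` code in MLlib.

```python
from typing import List, Sequence, Tuple


def least_squares_loss_and_gradient(
    A: Sequence[Sequence[float]],
    y: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return L = 1/(2n) * ||A w - y||^2 and its gradient (1/n) * A^T (A w - y)."""
    n = len(A)
    # Residual for each row: dot(row, weights) - label.
    residuals = [
        sum(a * w for a, w in zip(row, weights)) - label
        for row, label in zip(A, y)
    ]
    loss = sum(r * r for r in residuals) / (2.0 * n)
    gradient = [
        sum(r * row[j] for r, row in zip(residuals, A)) / n
        for j in range(len(weights))
    ]
    return loss, gradient


A = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
loss, gradient = least_squares_loss_and_gradient(A, y, [0.0, 0.0])
print(loss, gradient)
```

With this scaling, the loss and the gradient are both averages over the n samples, so their magnitude does not grow with the dataset size.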
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67694284
@avulanov I haven't checked your implementation yet, but I'm ready to have the
optimized MLOR for you to test. Can you try the `LogisticGradient` in
https://
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67716565
@avulanov PS, you can just replace the gradient function without making any
other change. Let me know how much performance gain you see; I'm very interested
in this. T
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67718128
Yes, `foreachActive` is the new API in Spark 1.2.
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/1379#issuecomment-67720689
@avulanov The new branch is not finished yet. You need to rebase
https://github.com/dbtsai/spark/tree/dbtsai-mlor to master, and just replace
the gradient function