[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-24 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21848 @kiszk `trait Stateful extends Nondeterministic`, and this rule will not be invoked when an expression is nondeterministic
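The guard described in this comment can be sketched with a small, self-contained model. All classes below (`Expr`, `Lit`, `Counter`, `Gt`, `If`, `ShortCircuitIf`) are hypothetical stand-ins, not Catalyst's real classes. The rule body, which drops the condition when both branches are structurally equal, is an assumption chosen for illustration. The point is only that a rule which skips evaluating a condition must first check that the condition is deterministic, and a stateful expression is modeled here as nondeterministic.

```scala
// Hypothetical toy expression tree; NOT Spark's Catalyst classes.
sealed trait Expr { def deterministic: Boolean }

// Leaf literal: always deterministic.
final case class Lit(value: Int) extends Expr {
  def deterministic: Boolean = true
}

// Stand-in for a stateful expression (e.g. a counter). It is modeled as
// nondeterministic, mirroring the idea that stateful implies nondeterministic.
final case class Counter(id: Int) extends Expr {
  def deterministic: Boolean = false
}

// Boolean-valued comparison; deterministic only if both sides are.
final case class Gt(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

final case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr {
  def deterministic: Boolean =
    cond.deterministic && trueValue.deterministic && falseValue.deterministic
}

object ShortCircuitIf {
  // Drop the condition when both branches are structurally equal, but only when
  // the condition is deterministic: evaluating a stateful condition has side
  // effects (e.g. advancing a counter), so skipping it could change results.
  def apply(e: Expr): Expr = e match {
    case If(cond, t, f) if cond.deterministic && t == f => t
    case other => other
  }
}
```

With a deterministic condition, `ShortCircuitIf(If(Gt(Lit(1), Lit(0)), Lit(7), Lit(7)))` returns `Lit(7)`; with `Counter(1)` inside the condition, the expression is returned unchanged.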

[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...

2018-07-24 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21864 LGTM. Merged into master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...

2018-07-24 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21848 Here is a followup PR for making `AssertTrue` and `AssertNotNull` `non-deterministic` https://issues.apache.org/jira/browse/SPARK-24913

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r204953202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,16 @@ object SimplifyConditionals

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r204953356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,16 @@ object SimplifyConditionals

[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-24 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21850 retest this please

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r205187664 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,16 @@ object SimplifyConditionals

[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...

2018-07-25 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21850 @gatorsmile All the new rules added to `If` should always have a `CaseWhen` version. But there will be times when we only add an `If` version, or when it only makes sense to have an `If` version
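One way to let `If`-only rules reach `CaseWhen` is to rewrite a `CaseWhen` that has exactly one branch into an equivalent `If`. The sketch below assumes that reading of the truncated SPARK-24892 title; the toy classes are hypothetical and are not Catalyst's. A missing `ELSE` is modeled as a null literal, matching SQL semantics where `CASE WHEN` without `ELSE` yields `NULL`.

```scala
// Hypothetical toy expression tree; NOT Spark's Catalyst classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class Lit(value: Any) extends Expr
final case class If(cond: Expr, trueValue: Expr, falseValue: Expr) extends Expr
// CASE WHEN c1 THEN v1 WHEN c2 THEN v2 ... [ELSE e] END; a missing ELSE means NULL.
final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr

object CaseWhenToIf {
  // A CaseWhen with exactly one branch behaves like an If, so after this
  // rewrite any rule written only for If can apply to it as well.
  def apply(e: Expr): Expr = e match {
    case CaseWhen(Seq((cond, value)), elseValue) =>
      If(cond, value, elseValue.getOrElse(Lit(null)))
    case other => other
  }
}
```

Expressions with two or more branches are left untouched, since they cannot be expressed as a single `If` without nesting.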

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205305691 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala --- @@ -122,4 +126,25 @@ class

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205306098 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,22 @@ object SimplifyConditionals

[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...

2018-07-26 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21847 +cc @MaxGekk and @gengliangwang, who worked on this part of the codebase.

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r205556780 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,9 @@ object SimplifyConditionals extends

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205599224 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205648911 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -87,17 +88,33 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205685728 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205683257 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -148,7 +165,8 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205684257 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205692778 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r205692946 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21850#discussion_r205830257 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -414,6 +414,9 @@ object SimplifyConditionals extends

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21852 +cc @cloud-fan and @gatorsmile

[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...

2018-07-27 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/21904 [SPARK-24953] [SQL] Prune a branch in `CaseWhen` if previously seen ## What changes were proposed in this pull request? If a condition in a branch has been seen previously, this branch can be
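The pruning idea can be sketched on a toy `CaseWhen` (hypothetical classes, not Catalyst's). Branches are evaluated in order, so a later branch whose condition already appeared earlier can never fire: if the condition is true, the earlier branch has already been taken. The sketch assumes structural equality stands in for semantic equivalence and leaves nondeterministic conditions alone, since re-evaluating them may give a different answer.

```scala
import scala.collection.mutable

// Hypothetical toy expression tree; NOT Spark's Catalyst classes.
sealed trait Expr { def deterministic: Boolean }
final case class Attr(name: String) extends Expr { def deterministic: Boolean = true }
final case class Lit(value: Int) extends Expr { def deterministic: Boolean = true }
final case class Rand(seed: Long) extends Expr { def deterministic: Boolean = false }

final case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr])

object PruneSeenBranches {
  def apply(cw: CaseWhen): CaseWhen = {
    val seen = mutable.Set.empty[Expr]
    val kept = cw.branches.filter { case (cond, _) =>
      if (!cond.deterministic) true // may evaluate differently each time: keep
      else seen.add(cond)           // add returns false if cond was seen before
    }
    cw.copy(branches = kept)
  }
}
```

For example, `CASE WHEN a THEN 1 WHEN b THEN 2 WHEN a THEN 3 ELSE 0 END` keeps only the first two branches; a SQL `NULL` condition is treated as false in both positions, so pruning does not change that case either.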

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r205946975 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,23 @@ object SimplifyConditionals

[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...

2018-07-29 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r205963712 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,29 @@ object SimplifyConditionals

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r206266243 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,23 @@ object SimplifyConditionals

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r206271589 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,23 @@ object SimplifyConditionals

[GitHub] spark pull request #21904: [SPARK-24953] [SQL] Prune a branch in `CaseWhen` ...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21904#discussion_r206333426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +450,12 @@ object SimplifyConditionals

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206350423 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -87,17 +87,30 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206353416 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -120,7 +133,7 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206356380 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -146,9 +159,13 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206356838 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -146,9 +159,13 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206358703 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-30 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21847#discussion_r206359706 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala --- @@ -165,16 +182,118 @@ class AvroSerializer(rootCatalystType

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-31 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/21852#discussion_r206695251 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -416,6 +416,21 @@ object SimplifyConditionals

[GitHub] spark pull request #21952: [SPARK-24993] [SQL] [WIP] Make Avro Fast Again

2018-08-01 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/21952 [SPARK-24993] [SQL] [WIP] Make Avro Fast Again ## What changes were proposed in this pull request? When @lindblombr developed [SPARK-24855](https://github.com/apache/spark/pull/21847) to

[GitHub] spark issue #21952: [SPARK-24993] [SQL] [WIP] Make Avro Fast Again

2018-08-02 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21952 @viirya How did you run the benchmark? I tried again on my desktop, and still got a consistent regression. Thanks. Spark 2.4 ``` spark git:(master) ./build/mvn -DskipTests clean

[GitHub] spark issue #21952: [SPARK-24993] [SQL] [WIP] Make Avro Fast Again

2018-08-02 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21952 @cloud-fan as you suggested, I benchmarked cache read performance, and the performance is the same. This makes sense, since it's unlikely that cache read performance is that bad so we can se

[GitHub] spark issue #21495: [SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6

2018-06-27 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21495 I was on family leave for a couple of weeks. Thank you all for helping out and merging it. The only change with this PR is that the welcome message will be printed first, and then the Spark

[GitHub] spark issue #21692: [SPARK-24715][Build] Override jline version as 2.14.3 in...

2018-07-02 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21692 @viirya thanks for this PR. I thought SBT always used the POM for dependencies, and I wonder why there is a discrepancy that requires us to override it manually.

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-07-02 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21459 There are three approvals from the committers, and the changes are trivial to revert if we see any performance regression, which is unlikely. To move things forward, if there is no further

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-07-03 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21459 Thanks. Merged into master.

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-04-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r29098013 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -256,4 +256,38 @@ trait HasFitIntercept extends Params

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-04-24 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r29098031 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -42,34 +50,122 @@ private[regression] trait LinearRegressionParams

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-04-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r29098568 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -42,34 +50,122 @@ private[regression] trait LinearRegressionParams

[GitHub] spark pull request: [MLLib]SPARK-5027:add SVMWithLBFGS interface i...

2015-01-12 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3890#issuecomment-69624254 @loachli OWLQN doesn't automatically solve the issue of non-differentiability. As a result, you have to remove the L1 term from HingeGradient, and use the Breeze
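A minimal sketch of the split described here, assuming labels in {-1, +1} and plain `Array[Double]` vectors instead of MLlib's `Vector` type. The function computes only the hinge loss and a subgradient for one example, with no L1 term, leaving the L1 penalty to be supplied to the optimizer separately. `HingeSketch` is a hypothetical name and is not MLlib's `HingeGradient`.

```scala
object HingeSketch {
  // Hinge loss for one example with label y in {-1, +1}:
  //   loss(w) = max(0, 1 - y * (w . x))
  // Returns (loss, subgradient). There is deliberately no L1 term here; the
  // L1 penalty is left to the optimizer instead of being folded into the gradient.
  def lossAndGradient(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    require(w.length == x.length, "dimension mismatch")
    val margin = y * w.indices.map(i => w(i) * x(i)).sum
    if (margin < 1.0) (1.0 - margin, x.map(xi => -y * xi))
    else (0.0, Array.fill(x.length)(0.0))
  }
}
```

Keeping the data term and the penalty apart means the same gradient code can be reused with any regularizer the optimizer applies.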

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3833#issuecomment-69889747 Jenkins, please re-test again. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r22963566 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -18,30 +18,36 @@ package
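For reference, multinomial logistic regression with K classes can be parameterized with K - 1 weight vectors by fixing one pivot class at margin 0. The sketch below assumes class 0 is the pivot and uses plain arrays with no intercept; `MultinomialSketch` and its methods are hypothetical and are not the MLlib API from this PR.

```scala
object MultinomialSketch {
  // weights(k) holds the coefficients for class k + 1; class 0 is the pivot whose
  // margin is fixed at 0, so K classes need only K - 1 weight vectors.
  def margins(weights: Array[Array[Double]], x: Array[Double]): Array[Double] =
    0.0 +: weights.map(w => w.indices.map(i => w(i) * x(i)).sum)

  // Softmax over the margins, shifted by the max margin for numerical stability.
  def probabilities(weights: Array[Array[Double]], x: Array[Double]): Array[Double] = {
    val m = margins(weights, x)
    val maxMargin = m.max
    val exps = m.map(v => math.exp(v - maxMargin))
    val total = exps.sum
    exps.map(_ / total)
  }

  // Predicted label: the class with the largest margin.
  def predict(weights: Array[Array[Double]], x: Array[Double]): Int = {
    val m = margins(weights, x)
    m.indices.maxBy(m(_))
  }
}
```

With two weight vectors `Array(1.0, 0.0)` and `Array(0.0, 1.0)` (three classes), the input `Array(2.0, 0.5)` has margins `[0, 2, 0.5]` and is predicted as class 1.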

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r22963904 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -18,30 +18,36 @@ package

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r22965406 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -61,20 +67,70 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r22967437 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala --- @@ -61,20 +67,70 @@ class LogisticRegressionModel

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3833#issuecomment-70007063 Jenkins, please re-test again.

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-21 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3833#issuecomment-70936401 Ping @mengxr

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23485163 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd
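The `withMean` and `withStd` flags in this constructor control centering and scaling. A minimal sketch of that transformation on plain arrays follows. The object and method names are hypothetical, and the choices of an (n - 1) sample standard deviation and of mapping zero-variance columns to 0.0 are assumptions of the sketch, not statements about MLlib's code.

```scala
object StandardizeSketch {
  // Column-wise mean and sample standard deviation (n - 1 denominator, an assumption).
  def fit(rows: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
    require(rows.size > 1, "need at least two rows for a sample std")
    val n = rows.size
    val d = rows.head.length
    val mean = Array.tabulate(d)(j => rows.map(_(j)).sum / n)
    val std = Array.tabulate(d) { j =>
      math.sqrt(rows.map(r => math.pow(r(j) - mean(j), 2)).sum / (n - 1))
    }
    (mean, std)
  }

  // withMean: subtract the column mean; withStd: divide by the column std.
  // A zero-variance column maps to 0.0 instead of dividing by zero (sketch's choice).
  def transform(x: Array[Double], mean: Array[Double], std: Array[Double],
                withMean: Boolean, withStd: Boolean): Array[Double] =
    x.indices.map { j =>
      val centered = if (withMean) x(j) - mean(j) else x(j)
      if (!withStd) centered
      else if (std(j) == 0.0) 0.0
      else centered / std(j)
    }.toArray
}
```

Centering with `withMean = true` turns sparse input dense, which is one reason to keep it as a separate switch from scaling.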

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-23 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23486231 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,30 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-23 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4140#issuecomment-71281849 For the unit-test part, is it possible not to change too much? Also, it will be easier to debug if the assertion is in the test instead of being abstracted out. For example

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23576821 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23576935 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23577023 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-26 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4140#issuecomment-71566737 LGTM except for those two minor details. Thanks.

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23580058 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-5207] [MLLIB] StandardScalerModel mean ...

2015-01-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4140#discussion_r23660759 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/StandardScaler.scala --- @@ -61,20 +61,34 @@ class StandardScaler(withMean: Boolean, withStd

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r23743161 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -55,24 +57,79 @@ abstract class Gradient extends Serializable

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-01-28 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/4259 [SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN You can merge this pull request into a Git repository by running: $ git pull https://github.com/AlpineNow/spark lir
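An elastic-net penalty mixes L1 and L2 terms. The sketch below assumes the parameterization `lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)`, where `alpha = 1` is pure L1 (lasso) and `alpha = 0` is pure L2 (ridge). It also shows one plausible split for an OWLQN-based solver: the smooth L2 part goes into the loss and gradient, and only the L1 strength is handed to the optimizer. All names are hypothetical.

```scala
object ElasticNetSketch {
  // Elastic-net penalty: lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
  def penalty(w: Array[Double], lambda: Double, alpha: Double): Double = {
    require(lambda >= 0.0 && alpha >= 0.0 && alpha <= 1.0, "invalid regularization")
    val l1 = w.map(v => math.abs(v)).sum
    val l2sq = w.map(v => v * v).sum
    lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2sq)
  }

  // L1 strength that an OWL-QN-style optimizer would handle itself.
  def l1Strength(lambda: Double, alpha: Double): Double = lambda * alpha

  // Gradient of the smooth L2 part, to be added to the data-term gradient.
  def l2Gradient(w: Array[Double], lambda: Double, alpha: Double): Array[Double] =
    w.map(v => lambda * (1.0 - alpha) * v)
}
```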

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-28 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r23743197 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala --- @@ -55,24 +57,79 @@ abstract class Gradient extends Serializable

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-29 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r23823903 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala --- @@ -55,6 +56,97 @@ object LogisticRegressionSuite

[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-29 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/3833#discussion_r23823961 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala --- @@ -285,6 +377,97 @@ class LogisticRegressionSuite

[GitHub] spark pull request: [SPARK-1892][MLLIB] Adding OWL-QN optimizer fo...

2014-10-01 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/840#issuecomment-57439459 @debasish83 and @codedeft The weighted method for OWLQN in breeze is merged https://github.com/scalanlp/breeze/commit/2570911026aa05aa1908ccf7370bc19cd8808a4c I

[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58183559 We had a build against the Spark master on Oct 2, and when we ran our application on around 600GB of data, we got the following exception. Does this PR fix this issue which

[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2693 [SPARK-3832][MLlib] Upgrade Breeze dependency to 0.10 In Breeze 0.10, the L1regParam can be configured through an anonymous function in OWLQN, and each component can be penalized differently. This is
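A per-component L1 penalty of the kind described here replaces a single constant with a function from coordinate index to strength. The sketch below assumes an `Int => Double` schedule and, as a hypothetical example, leaves an intercept stored at index 0 unpenalized. It computes only the penalty value and does not call Breeze; `PerCoordinateL1Sketch` and `skipIntercept` are illustrative names.

```scala
object PerCoordinateL1Sketch {
  // The L1 strength is a function of the coordinate index rather than a single
  // constant, so each weight can be penalized differently.
  def l1Penalty(w: Array[Double], l1reg: Int => Double): Double =
    w.indices.map(i => l1reg(i) * math.abs(w(i))).sum

  // Hypothetical schedule: no penalty on index 0 (the intercept here),
  // `lambda` on every other coordinate.
  def skipIntercept(lambda: Double): Int => Double =
    i => if (i == 0) 0.0 else lambda
}
```

For weights `Array(10.0, -1.0, 2.0)` and `skipIntercept(0.5)`, the penalty is `0.5 * 1 + 0.5 * 2 = 1.5`; the large intercept contributes nothing.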

[GitHub] spark pull request: [SPARK-3119] Re-implementation of TorrentBroad...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2030#issuecomment-58214186 I thought it was a closed issue, so I moved my comment to JIRA. I ran into this issue in spark-shell, not in a standalone application. Does SPARK-3762 apply in this

[GitHub] spark pull request: [SPARK-3832][MLlib] Upgrade Breeze dependency ...

2014-10-07 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2693#issuecomment-58276308 @dlwh David, do you know if there is a dependency change in breeze-0.10, and whether it is compatible with both Scala 2.10 and 2.11? Thanks.

[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-08 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/2709 Minor change in the comment of spark-defaults.conf.template spark-defaults.conf is used in spark-shell as well, and this PR adds this to the comment. You can merge this pull request into a Git

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58361701 Jenkins, please start the test.

[GitHub] spark pull request: [SPARK-3856][MLLIB] use norm operator after br...

2014-10-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2718#issuecomment-58435304 LGTM. Thanks.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58629065 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732030 It's failing at FlumeStreamSuite.scala:109, which seems to be unrelated to this patch.

[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/2709#issuecomment-59667207 @andrewor14 Sorry for the late reply; I was on vacation in Europe last week. I can continue working on this after I finish my talk at the IOTA conf tomorrow.

[GitHub] spark pull request: [SPARK-5802][MLLIB] cache transformed data in ...

2015-02-17 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4593#issuecomment-74805610 Sorry for the late reply; I've been traveling recently. My concern is whether this will cause "caching twice" in the new ML API. For example, in ml

[GitHub] spark pull request: [SPARK-5537][MLib] Expand user guide for multi...

2015-02-26 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/4801 [SPARK-5537][MLib] Expand user guide for multinomial logistic regression You can merge this pull request into a Git repository by running: $ git pull https://github.com/AlpineNow/spark mlor

[GitHub] spark pull request: [SPARK-5537] Add user guide for multinomial lo...

2015-03-02 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4861#discussion_r25656411 --- Diff: docs/mllib-linear-methods.md --- @@ -144,41 +152,7 @@ denoted by $\x$, the model makes predictions based on the value of $\wv^T \x$. By the

[GitHub] spark pull request: [SPARK-5537] Add user guide for multinomial lo...

2015-03-02 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/4866 [SPARK-5537] Add user guide for multinomial logistic regression Adding more description on top of #4861. You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-02 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-76880508 @jkbradley I will rebase soon. @debasish83 I'll add MLOR with elastic-net when we stabilize the new ML API. Doing this in the old codebase would be a huge effort, and I

[GitHub] spark pull request: [SPARK-6141][MLlib] Upgrade Breeze from 0.10 t...

2015-03-03 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/4879 [SPARK-6141][MLlib] Upgrade Breeze from 0.10 to 0.11 to fix convergence bug LBFGS and OWLQN in Breeze 0.10 have a convergence-check bug. This is fixed in 0.11; see the description in the Breeze project

[GitHub] spark pull request: [SPARK-6141][MLlib] Upgrade Breeze from 0.10 t...

2015-03-03 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4879#issuecomment-77044998 @coderxiang Breeze seems to have accidentally removed the public constructor of CSCMatrix, and we have a PR to Breeze to address it. Let's see if we can make it.

[GitHub] spark pull request: [SPARK-6141][MLlib] Upgrade Breeze from 0.10 t...

2015-03-03 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4879#issuecomment-77054801 This is the fix on the Breeze side for the missing public constructor of CSCMatrix: https://github.com/scalanlp/breeze/pull/375

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-23 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-85275887 @jkbradley and @mengxr I just rebased it. I will do a couple of optimizations to avoid scaling the datasets, since the scaling can be done in the optimization instead. You guys can
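A minimal sketch of why scaling can move into the optimization, assuming the scaling in question is per-feature standardization (dividing each feature by its standard deviation). Because w · (x / σ) = (w / σ) · x, the margin can be computed on the raw features by rescaling the coefficients on the fly, so no scaled copy of the dataset is materialized. The object and method names are hypothetical and are not Spark's API.

```scala
// Hypothetical sketch: per-feature standardization folded into the margin
// computation instead of materializing a scaled copy of the data.
object ScaleInOptimization {
  // Margin computed on an explicitly standardized copy of the features.
  // Features with zero standard deviation are treated as zero.
  def marginOnScaledCopy(w: Array[Double], x: Array[Double], std: Array[Double]): Double = {
    val scaled = x.indices.map(i => if (std(i) != 0.0) x(i) / std(i) else 0.0)
    w.indices.map(i => w(i) * scaled(i)).sum
  }

  // Same margin computed on the raw features: each coefficient is divided
  // by the feature's std inside the loop, so no scaled dataset is created.
  def marginInPlace(w: Array[Double], x: Array[Double], std: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < x.length) {
      if (std(i) != 0.0) sum += (w(i) / std(i)) * x(i)
      i += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val w = Array(0.5, -1.0, 2.0)
    val x = Array(4.0, 3.0, 0.0)
    val std = Array(2.0, 1.5, 0.0) // the zero-variance feature is skipped
    println(marginOnScaledCopy(w, x, std))
    println(marginInPlace(w, x, std))
  }
}
```

The in-place version trades one division per active feature for not storing a second, scaled dataset, which matters when the data is large and cached.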

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-25 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r27180837 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala --- @@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params { def

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-26 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/4259#issuecomment-86731157 @jkbradley I think we should only support basic regularization in spark.ml first, which is what Python scikit-learn does. If users need different types of
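For reference, scikit-learn's `ElasticNet` minimizes (1 / 2n) ||X w - y||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2. The sketch below evaluates that objective in plain Scala, assuming "basic regularization" here means this kind of combined L1/L2 penalty; the object name and parameter names are illustrative, not spark.ml's.

```scala
// Hypothetical sketch of a scikit-learn-style elastic-net objective:
//   (1 / 2n) * ||X w - y||^2
//   + alpha * l1Ratio * ||w||_1
//   + 0.5 * alpha * (1 - l1Ratio) * ||w||_2^2
object ElasticNetObjective {
  def loss(x: Array[Array[Double]], y: Array[Double], w: Array[Double],
           alpha: Double, l1Ratio: Double): Double = {
    val n = y.length
    // Sum of squared residuals for the least-squares term.
    val sqErr = x.indices.map { i =>
      val pred = x(i).indices.map(j => x(i)(j) * w(j)).sum
      val r = pred - y(i)
      r * r
    }.sum
    val l1 = w.map(v => math.abs(v)).sum // ||w||_1
    val l2 = w.map(v => v * v).sum       // ||w||_2^2
    sqErr / (2.0 * n) + alpha * l1Ratio * l1 + 0.5 * alpha * (1.0 - l1Ratio) * l2
  }
}
```

Setting `l1Ratio = 1.0` gives the lasso penalty and `l1Ratio = 0.0` gives ridge, so one pair of parameters covers L1, L2, and their mix.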

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-26 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r27260459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala --- @@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params { def

[GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...

2015-03-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/4259#discussion_r27333012 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/sharedParams.scala --- @@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params { def

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-08 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66192930 @avulanov I did a couple of performance tunings in the MLOR gradient calculation in my company's proprietary implementation, which is 4x faster than the open source o

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-09 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66336110 @avulanov 1. I did the same optimization for MLlib in [my recent PRs](https://github.com/apache/spark/commits/master?author=dbtsai). * Accessing the

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-10 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66513731 @avulanov I remember CJ Lin said he posted the 600GB dataset on his website.

[GitHub] spark pull request: [SPARK-4887][MLlib] Fix a bad unittest in Logi...

2014-12-18 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/3735 [SPARK-4887][MLlib] Fix a bad unittest in LogisticRegressionSuite The original test doesn't make sense since if you step in, the lossSum is already NaN, and the coefficients are dive

[GitHub] spark pull request: [SPARK-4887][MLlib] Fix a bad unittest in Logi...

2014-12-18 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/3735#issuecomment-67562831 I agree. The test is not good. I'm thinking we can probably add a couple of well-known datasets, like the iris or prostate cancer dataset, to the test resources, and we can co

[GitHub] spark pull request: [SPARK-4907][MLlib] Inconsistent loss and grad...

2014-12-19 Thread dbtsai
GitHub user dbtsai opened a pull request: https://github.com/apache/spark/pull/3746 [SPARK-4907][MLlib] Inconsistent loss and gradient in LeastSquaresGradient compared with R In most academic papers and algorithm implementations, people use L = 1/(2n) ||A weights - y||^2
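A minimal sketch of the loss and gradient under the convention L = 1/(2n) ||A weights - y||^2 named in this PR description. Differentiating gives the gradient (1/n) A^T (A weights - y); the 1/2 factor cancels the 2 from the square. This is plain Scala on dense arrays with a hypothetical object name, not MLlib's `LeastSquaresGradient`.

```scala
// Hypothetical sketch: loss and gradient for L(w) = 1/(2n) * ||A w - y||^2,
// whose gradient is (1/n) * A^T (A w - y).
object HalfMeanSquaredError {
  def lossAndGradient(a: Array[Array[Double]], y: Array[Double],
                      w: Array[Double]): (Double, Array[Double]) = {
    val n = y.length
    val grad = Array.fill(w.length)(0.0)
    var sumSq = 0.0
    for (i <- 0 until n) {
      // Residual of row i: a_i . w - y_i
      val r = a(i).indices.map(j => a(i)(j) * w(j)).sum - y(i)
      sumSq += r * r
      // Accumulate r * a_i into the gradient.
      for (j <- w.indices) grad(j) += r * a(i)(j)
    }
    (sumSq / (2.0 * n), grad.map(_ / n))
  }
}
```

Averaging over n keeps the loss scale independent of the number of rows, which is what makes the loss and regularization strength comparable across datasets of different sizes.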

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67694284 @avulanov I haven't checked your implementation yet, but I have the optimized MLOR ready for you to test. Can you try the `LogisticGradient` in https://

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67716565 @avulanov P.S. You can just replace the gradient function without making any other changes. Let me know how much of a performance gain you see; I'm very interested in this. T

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67718128 Yes, `foreachActive` is the new API in Spark 1.2.
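The idea behind an API like `foreachActive` is to visit only the explicitly stored (index, value) pairs of a vector, so a sparse gradient computation costs time proportional to the number of non-zeros rather than the full dimension. The sketch below uses a hypothetical `SparseVec` class that mimics that calling pattern; it is not Spark's `Vector`, and it does not depend on Spark.

```scala
// Hypothetical stand-in for a sparse vector: only active entries are stored.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  // Calls f(index, value) for each stored entry only.
  def foreachActive(f: (Int, Double) => Unit): Unit = {
    var k = 0
    while (k < indices.length) {
      f(indices(k), values(k))
      k += 1
    }
  }
}

object ForeachActiveDemo {
  // Dot product with dense coefficients, touching only the active entries.
  def dot(x: SparseVec, w: Array[Double]): Double = {
    var sum = 0.0
    x.foreachActive { (i, v) => sum += v * w(i) }
    sum
  }

  // Adds multiplier * x into a dense gradient buffer in place.
  def axpy(multiplier: Double, x: SparseVec, grad: Array[Double]): Unit =
    x.foreachActive { (i, v) => grad(i) += multiplier * v }
}
```

In a logistic-style gradient, `dot` gives the margin for one example and `axpy` adds that example's contribution to the gradient, so both steps skip the zero features entirely.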

[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...

2014-12-19 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-67720689 @avulanov The new branch is not finished yet. You need to rebase https://github.com/dbtsai/spark/tree/dbtsai-mlor onto master, and just replace the gradient function
