[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-19 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r128200260 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1458,4 +1475,167 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-19 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r128202464 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1458,4 +1475,167 @@ class

[GitHub] spark issue #18605: [SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid f...

2017-07-17 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18605 @wangmiao1981 This is expected, see my comment [here](https://github.com/apache/spark/pull/18613#discussion_r127011929) . This uncovers an existing bug for ```forceIndexLabel```. I will send a

[GitHub] spark pull request #18612: [SPARK-21388][ML][PySpark] GBTs inherit from HasS...

2017-07-15 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18612#discussion_r127584690 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -162,7 +162,7 @@ private[ml] trait HasThreshold extends Params

[GitHub] spark issue #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should han...

2017-07-15 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18613 Merged into master. Thanks for all reviewing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark issue #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should han...

2017-07-13 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18613 @felixcheung We don't silently drop features; we use ```handleInvalid``` to let users decide how to handle invalid features or labels. The behavior is consistent with Scala, which suppor

[GitHub] spark pull request #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula sho...

2017-07-13 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18613#discussion_r127260481 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/RFormulaSuite.scala --- @@ -501,4 +501,51 @@ class RFormulaSuite extends SparkFunSuite with

[GitHub] spark issue #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should han...

2017-07-13 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18613 @felixcheung @wangmiao1981 In Scala, we set ```handleInvalid``` for both estimator and model, although it only takes effect for model prediction. The reason behind this is we should support

[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-07-12 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18610 This failure is irrelevant. Jenkins, retest this please.

[GitHub] spark pull request #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula sho...

2017-07-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18613#discussion_r127011929 --- Diff: R/pkg/tests/fulltests/test_mllib_tree.R --- @@ -225,7 +225,7 @@ test_that("spark.randomForest", { expect_error(collect(p

[GitHub] spark issue #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should han...

2017-07-12 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18613 cc @felixcheung @wangmiao1981

[GitHub] spark issue #18610: [SPARK-21386] ML LinearRegression supports warm start fr...

2017-07-12 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18610 cc @dbtsai @sethah @MLnick @hhbyyh

[GitHub] spark pull request #18613: [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula sho...

2017-07-12 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/18613 [SPARK-20307][ML][SPARKR][FOLLOW-UP] RFormula should handle invalid for both features and label column. ## What changes were proposed in this pull request? ```RFormula``` should handle

[GitHub] spark issue #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucketizer/St...

2017-07-12 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18582 LGTM, merged into master. Thanks.

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r126911229 --- Diff: mllib/src/test/scala/org/apache/spark/ml/util/DefaultReadWriteTest.scala --- @@ -113,7 +115,14 @@ trait DefaultReadWriteTest extends

[GitHub] spark pull request #17117: [SPARK-10780][ML] Support initial model for KMean...

2017-07-12 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/17117

[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...

2017-07-12 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/18610 [SPARK-21386] ML LinearRegression supports warm start from user provided initial model. ## What changes were proposed in this pull request? Allow users to set initial model when training

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r126872238 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1441,4 +1460,33 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r126871919 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -452,6 +452,8 @@ object

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r126873611 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1441,4 +1460,33 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r126873706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1187,6 +1189,23 @@ class

[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16630#discussion_r126872077 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -1441,4 +1460,33 @@ class

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should cache weightCo...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r126868713 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -317,7 +318,12 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should cache weightCo...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18554#discussion_r126710934 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -317,7 +318,12 @@ final class OneVsRest @Since("

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126679441 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -86,15 +87,11 @@ final class Bucketizer @Since("1.4.0"

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126689856 --- Diff: python/pyspark/ml/feature.py --- @@ -3058,26 +3035,37 @@ class RFormula(JavaEstimator, HasFeaturesCol, HasLabelCol, JavaMLReadable, JavaM

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126682178 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -460,16 +460,16 @@ object LinearRegression extends

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126681028 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -74,16 +74,12 @@ private[feature] trait

[GitHub] spark pull request #18582: [SPARK-18619][ML] Make QuantileDiscretizer/Bucket...

2017-07-11 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18582#discussion_r126679343 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -36,7 +36,8 @@ import org.apache.spark.sql.types.{DoubleType

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

2017-07-10 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18305 @MLnick I remember a bug I hit several months ago: we forgot to destroy a broadcast variable in the source code, but it threw an exception after we added the destroy call explicitly. This is because we put

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-10 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r126582816 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed to the

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-07 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18523 Merged into master, thanks for all!

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-06 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18523 Jenkins, test this please.

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-06 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125876862 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,15 @@ class VectorAssembler @Since("

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-06 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 I'm in favor of discarding OWLQN. Take LiR or LoR as examples: if you replace LBFGS with OWLQN for regression with L2 regularization, we can see that OWLQN may converge faster than LBFGS in a ce

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18523 Jenkins, test this please.

[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18523 Jenkins, test this please.

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125804032 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,15 @@ class VectorAssembler @Since("

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r125671626 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala --- @@ -32,40 +34,45 @@ private[ml] trait

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r125675084 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed to the

[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18305#discussion_r125675879 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed to the

[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18523#discussion_r125652869 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala --- @@ -113,12 +113,15 @@ class VectorAssembler @Since("

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125606256 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -136,6 +137,14 @@ private[ml] object Param { s

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125564145 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -187,12 +188,12 @@ class NaiveBayes @Since("

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125606191 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala --- @@ -229,6 +238,16 @@ object ParamValidators { def arrayLengthGt[T

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125642693 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -93,8 +93,8 @@ private[classification] trait

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125647745 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -199,7 +199,7 @@ private[regression] trait

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125648566 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -460,7 +463,7 @@ object LinearRegression extends

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125605206 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -83,19 +85,16 @@ class

[GitHub] spark pull request #17995: [SPARK-20762][ML]Make String Params Case-Insensit...

2017-07-05 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17995#discussion_r125605561 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala --- @@ -106,4 +105,13 @@ object

[GitHub] spark issue #18534: [SPARK-21310][ML][PySpark] Expose offset in PySpark

2017-07-05 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18534 LGTM, merged into master. Thanks.

[GitHub] spark issue #18453: [SPARK-19852][PYSPARK][ML] Python StringIndexer supports...

2017-07-02 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18453 Merged into master. Thanks for all your reviews.

[GitHub] spark pull request #18453: [SPARK-19852][PYSPARK][ML] Python StringIndexer s...

2017-07-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18453#discussion_r125176981 --- Diff: python/pyspark/ml/feature.py --- @@ -2132,6 +2132,12 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid

[GitHub] spark pull request #18453: [SPARK-19852][PYSPARK][ML] Python StringIndexer s...

2017-07-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18453#discussion_r125159512 --- Diff: python/pyspark/ml/feature.py --- @@ -2132,6 +2132,12 @@ class StringIndexer(JavaEstimator, HasInputCol, HasOutputCol, HasHandleInvalid

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-07-01 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh Makes sense. Does it mean both LBFGS and OWLQN produce the same solution when fitting without an intercept? If so, I'd prefer to change the solver to LBFGS rather than adding a new o

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-07-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r125157589 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -53,7 +53,23 @@ import org.apache.spark.storage.StorageLevel

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-07-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r125157587 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -143,7 +143,18 @@ private[regression] trait

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-07-01 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r125157502 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -421,6 +435,18 @@ object LinearRegression extends

[GitHub] spark issue #18495: [SPARK-21275][ML] Update GLM test to use supportedFamily...

2017-06-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18495 LGTM, merged into master. Thanks.

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-06-30 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r125155063 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -53,7 +53,23 @@ import org.apache.spark.storage.StorageLevel

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-06-30 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r125155049 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifier.scala --- @@ -75,17 +78,13 @@ private[classification

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16699 #18489 fixed the build failure. Thanks.

[GitHub] spark issue #18489: [ML] Fix scala-2.10 build failure of GeneralizedLinearRe...

2017-06-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18489 Merged into master to fix scala-2.10 build failure.

[GitHub] spark pull request #18489: [ML] Fix scala-2.10 build failure of GeneralizedL...

2017-06-30 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/18489 [ML] Fix scala-2.10 build failure of GeneralizedLinearRegressionSuite. ## What changes were proposed in this pull request? Fix scala-2.10 build failure of ```GeneralizedLinearRegressionSuite

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16699 @hvanhovell I will send a quick fix soon, thanks for your kind reminder.

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-30 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16699 Merged into master. Thanks for the contribution and all the reviews! This great feature will benefit lots of users. @actuaryzhang Could you send follow-up PRs to address the two inline comments

[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-06-29 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17862 @hhbyyh If the different handling of intercept scaling is the major cause of the difference in results between sklearn and Spark, did you check whether fitting the model without an intercept will produce the same model

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124737247 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -303,6 +327,16 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124739973 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -944,15 +984,22 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124733535 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,29 @@ import org.apache.spark.ml.linalg.Vector * @param

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124737015 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -134,6 +134,25 @@ private[regression] trait

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124733379 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,29 @@ import org.apache.spark.ml.linalg.Vector * @param

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124737056 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -134,6 +134,25 @@ private[regression] trait

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124753530 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -961,14 +1008,16 @@ class

[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override

2017-06-29 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16028#discussion_r124730326 --- Diff: python/pyspark/ml/classification.py --- @@ -1327,8 +1327,6 @@ class MultilayerPerceptronClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol

[GitHub] spark issue #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should ...

2017-06-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12414 Merged into master. Thanks for all your reviews.

[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16699 Jenkins, test this please.

[GitHub] spark pull request #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept ...

2017-06-28 Thread yanboliang
GitHub user yanboliang reopened a pull request: https://github.com/apache/spark/pull/12414 [SPARK-14657][SPARKR][ML] RFormula w/o intercept should output reference category when encoding string terms ## What changes were proposed in this pull request? Please see [SPARK

[GitHub] spark pull request #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept ...

2017-06-28 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/12414

[GitHub] spark issue #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should ...

2017-06-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12414 Jenkins, test this please.

[GitHub] spark pull request #18453: [SPARK-19852][PYSPARK][ML] Python StringIndexer s...

2017-06-28 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/18453 [SPARK-19852][PYSPARK][ML] Python StringIndexer supports 'keep' to handle invalid data ## What changes were proposed in this pull request? This PR is to maintain API parity wi

[GitHub] spark issue #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should ...

2017-06-28 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12414 Jenkins, test this please.

[GitHub] spark issue #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should ...

2017-06-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12414 Jenkins, test this please.

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124281406 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -961,14 +1007,30 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124235464 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -303,6 +317,17 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124272947 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,184 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124271108 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -578,6 +578,79 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124261785 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -339,15 +364,16 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124258933 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquaresSuite.scala --- @@ -43,7 +43,7 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124268761 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,160 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124233224 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -406,6 +437,14 @@ object

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124269044 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,184 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124212354 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala --- @@ -27,3 +27,28 @@ import org.apache.spark.ml.linalg.Vector * @param

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124236849 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -440,13 +479,13 @@ object

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124234628 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -168,6 +179,9 @@ private[regression] trait

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124262306 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquaresSuite.scala --- @@ -156,7 +156,7 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124265572 --- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquaresSuite.scala --- @@ -169,29 +169,29 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124272854 --- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala --- @@ -798,77 +798,184 @@ class

[GitHub] spark pull request #16699: [SPARK-18710][ML] Add offset in GLM

2017-06-27 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16699#discussion_r124237318 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala --- @@ -944,15 +983,22 @@ class

[GitHub] spark issue #12414: [SPARK-14657][SPARKR][ML] RFormula w/o intercept should ...

2017-06-27 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12414 Jenkins, test this please.
