Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178781745
**[Test build #50575 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50575/consoleFull)**
for PR 10702 at commit
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178787032
I meant comparing the result with your solution when `yStd != 0` and
`regParam != 0`. I suspect that you will get a different result, since the
GLMNET one forces to standardize
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178782152
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178770437
GLMNET sets all coefficients to zero if yStd=0 and fitIntercept=false
regardless of standardization or regularization. That's why I cannot compare my
normal equation
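To illustrate why the comparison above breaks down, here is a minimal sketch, assuming a single feature, no intercept, and no regularization. The helper name `ols_no_intercept` and the sample data are hypothetical and are not taken from the PR. It only shows that the closed-form least-squares slope is generally nonzero for a constant label, whereas the comment above reports that GLMNET returns all-zero coefficients in that situation.

```python
def ols_no_intercept(xs, ys):
    """Closed-form least-squares slope for y ~ w * x (no intercept, no penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


# Hypothetical data: the label is constant, so its standard deviation is 0.
xs = [1.0, 2.0, 3.0]
ys = [5.0, 5.0, 5.0]

w = ols_no_intercept(xs, ys)
print(w)  # nonzero: sum(x*y) / sum(x*x) = 30 / 14
```

With these numbers the slope is 30/14, so a test that expects GLMNET's all-zero coefficients could not pass against this kind of solution.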
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51615248
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -398,7 +422,8 @@ class LinearRegressionModel private[ml] (
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178782146
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178757884
**[Test build #50575 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50575/consoleFull)**
for PR 10702 at commit
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-179000215
Yes, that's what I meant. Without standardizing the labels, there is no way to
match glmnet, but this makes the problem ill-defined when `yStd == 0`.
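As a rough illustration of why `yStd == 0` is a problem, the sketch below standardizes a label column in plain Python. The function name `standardize` and the use of the sample (n - 1) standard deviation are assumptions made for this example; they are not a description of Spark's or glmnet's internals.

```python
import math


def standardize(values):
    """Return (v - mean) / std for each value, using the sample (n - 1) std."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    std = math.sqrt(var)
    if std == 0.0:
        # A constant label has zero spread, so the division below is undefined.
        raise ValueError("cannot standardize a constant label (std == 0)")
    return [(v - mean) / std for v in values]


print(standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

try:
    standardize([5.0, 5.0, 5.0])
except ValueError as err:
    print(err)
```

The second call fails because every value equals the mean, which is the constant-label case discussed in this thread.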
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r5162
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +583,86 @@ class LinearRegressionSuite
}
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-179002845
LGTM. Merged into master. Thanks.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/10702
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51536905
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -74,7 +74,8 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51536910
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -83,7 +84,8 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51537328
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -398,7 +422,8 @@ class LinearRegressionModel private[ml] (
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178441655
For the case (3), I agree with your argument completely. Can you try your
normal equation solution with L2 without any standardization (nonzero ystd
data) and see if
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178941008
For `yStd != 0` and `regParam != 0`, my solution doesn't match GLMNET.
I showed this comparison on this jira
https://github.com/apache/spark/pull/10274.
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178134675
For the case (3), I'm assuming that the label and features are not
standardized. So, in that case, the solution exists. Here is my perspective on
this.
The
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178281427
**[Test build #50512 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50512/consoleFull)**
for PR 10702 at commit
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51505471
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178297063
**[Test build #50512 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50512/consoleFull)**
for PR 10702 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178297483
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-178297491
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177337405
**[Test build #50450 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50450/consoleFull)**
for PR 10702 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177343128
**[Test build #50450 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50450/consoleFull)**
for PR 10702 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177343337
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177343340
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51355081
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177381602
**[Test build #50455 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50455/consoleFull)**
for PR 10702 at commit
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177385145
Commenting on your issues.
Issue 1:
With `WeightedLeastSquares`, we have the option to standardize the label and
features separately. As a result, if the label
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177396200
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177396130
**[Test build #50455 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50455/consoleFull)**
for PR 10702 at commit
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51354803
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177376070
LGTM except minor comments. Thanks.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51354776
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,44 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51354767
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,44 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51354784
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51355106
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177386126
**[Test build #50454 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50454/consoleFull)**
for PR 10702 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177386204
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177386203
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177396199
Merged build finished. Test PASSed.
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177375091
I've completed this PR. I think all the tests are there. Here, I'm going to
document a couple of minor issues just for future reference.
__Issue 1__
For
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-177376438
**[Test build #50454 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50454/consoleFull)**
for PR 10702 at commit
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51064724
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51064615
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
}
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51069962
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51070090
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51070105
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51071809
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-175901752
@iyounus `standardizeLabel = false/true` with non-zero `regParam`, let's
throw the exception. I explained the mismatch against the analytic normal
equation in the other
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51071489
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51070506
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51077616
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51070301
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r51077516
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0")
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173738254
I've added an exception for the case when the label is constant and
`standardization == true` and `regParam != 0.0`. Also added a test for this case.
I cannot test
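The condition described in this comment can be sketched as a small guard. This is only an illustration of the stated rule (constant label, `standardization == true`, `regParam != 0.0`); the function name `check_constant_label`, its parameters, and the error message are hypothetical and do not reproduce the code in the PR.

```python
def check_constant_label(y_std, standardization, reg_param):
    """Raise if the label is constant while standardization and a penalty are both on."""
    if y_std == 0.0 and standardization and reg_param != 0.0:
        raise ValueError(
            "label is constant (yStd == 0), standardization is true, "
            "and regParam is non-zero"
        )


# Passes silently: no regularization, so the guard does not fire.
check_constant_label(y_std=0.0, standardization=True, reg_param=0.0)

try:
    check_constant_label(y_std=0.0, standardization=True, reg_param=0.1)
except ValueError as err:
    print(err)
```

A non-constant label (`y_std > 0`) never triggers the guard, whatever the other two settings are.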
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173744471
**[Test build #49889 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49889/consoleFull)**
for PR 10702 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173744642
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173744640
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173733630
**[Test build #49889 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49889/consoleFull)**
for PR 10702 at commit
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173330038
@dbtsai Do you have time to make a pass?
Github user iyounus commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173363841
@mengxr I haven't implemented the changes suggested by @dbtsai and @srowen
yet. I think the solution I proposed to this issue may not be very suitable.
I'll make some
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173395226
**[Test build #49816 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49816/consoleFull)**
for PR 10702 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173395421
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user dbtsai commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173395155
@mengxr This PR is also on my radar. Working on another PR now, once
@iyounus is ready, I will work on this.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173395419
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-173382975
**[Test build #49816 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49816/consoleFull)**
for PR 10702 at commit
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r49929605
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,41 @@ class LinearRegression @Since("1.3.0")
Github user iyounus commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r49941742
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,41 @@ class LinearRegression @Since("1.3.0")
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r49399116
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,41 @@ class LinearRegression @Since("1.3.0")
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-170683940
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-170683942
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-170683724
**[Test build #49163 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49163/consoleFull)**
for PR 10702 at commit
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/10702#discussion_r49374502
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,41 @@ class LinearRegression @Since("1.3.0")
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/10702#issuecomment-170671978
**[Test build #49163 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49163/consoleFull)**
for PR 10702 at commit
GitHub user iyounus opened a pull request:
https://github.com/apache/spark/pull/10702
[Spark-12732][ML] bug fix in linear regression train
Fixed the bug in linear regression train for the case when the target
variable is constant. The two cases for `fitIntercept=true` or