[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221026877 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221026865 **[Test build #59144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59144/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221026870 Build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221025962 **[Test build #59144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59144/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221023283 **[Test build #59136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59136/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221023289 Build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221023290 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-221022478 **[Test build #59136 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59136/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-10 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-218239032 @mengxr I looked into using DGELSD to solve `A^T A x = A^T b` as you suggested. It works fine, but then the issue is how to calculate the errors on the coefficients

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-05-06 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-217511021 Ping @iyounus?

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-19 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197468720 One problem with the eigendecomposition method is that for a rank-deficient matrix some of the eigenvalues can be extremely small (instead of being zero) and their

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-19 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197557069 @dbtsai There is a good chance of precision loss during the computation of A^T A if A is ill-conditioned. A better approach is to factorize A directly. It is similar to
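
To illustrate the precision-loss concern in the comment above, here is a minimal pure-Python sketch. The matrix `A`, the value of `eps`, and the helper names `gram` and `det2` are assumptions chosen for illustration; none of them come from the PR. `A` has two linearly independent columns, yet the computed `A^T A` comes out exactly singular in double precision, because `1 + eps^2` rounds to `1`.

```python
def gram(a):
    """Return A^T A for a matrix given as a list of rows (hypothetical helper)."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Illustrative ill-conditioned matrix: the columns are independent,
# but they differ only in entries of size eps.
eps = 1e-9
A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

G = gram(A)

# The diagonal entries are 1 + eps**2, which is not representable next to 1.0
# in double precision, so every entry of G rounds to 1.0.
print(G)
print(det2(G))
```

Working on `A` directly (for example through QR or SVD) avoids squaring the condition number, which is the reason such factorizations are usually preferred for ill-conditioned problems.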

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-16 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197213572 I'm not an expert in this area, but after thinking about it more, I don't think we can use `DGELSD` which minimizes `||b - A*x||` using the singular value decomposition (SVD)

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-16 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197199401 Locally, we are solving `A^T A x = A^T b`. In a rank deficient case, we can compute the min-length least squares solution that also minimizes `\| x \|_2`, which is
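
As a rough sketch of the minimum-length least squares idea in the comment above, the following pure-Python example solves `A^T A x = A^T b` for a 2x2 rank-deficient Gram matrix by dropping eigenvalues below a relative tolerance. The function names `sym2_eig` and `min_norm_solve`, the tolerance `rel_tol`, and the sample data are all assumptions made for illustration; this is not the code from the PR, and a real implementation would more likely call a LAPACK routine.

```python
import math


def sym2_eig(g):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, d]].

    Returns a list of (eigenvalue, unit eigenvector) pairs.
    """
    a, b, d = g[0][0], g[0][1], g[1][1]
    if b == 0.0:
        # Already diagonal: the coordinate axes are eigenvectors.
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        # (b, lam - a) solves the first row of (G - lam * I) v = 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def min_norm_solve(g, rhs, rel_tol=1e-10):
    """Apply the pseudo-inverse of symmetric 2x2 g to rhs.

    Eigenvalues below rel_tol * (largest eigenvalue) are treated as zero,
    which yields the solution with the smallest Euclidean norm.
    """
    pairs = sym2_eig(g)
    lam_max = max(abs(lam) for lam, _ in pairs)
    x = [0.0, 0.0]
    for lam, v in pairs:
        if abs(lam) > rel_tol * lam_max:
            coeff = (v[0] * rhs[0] + v[1] * rhs[1]) / lam
            x[0] += coeff * v[0]
            x[1] += coeff * v[1]
    return x


# Two identical columns, so A^T A is rank-deficient.
# A = [[1, 1], [2, 2], [3, 3]], b = [2, 4, 6]
AtA = [[14.0, 14.0], [14.0, 14.0]]
Atb = [28.0, 28.0]
x = min_norm_solve(AtA, Atb)
print(x)
```

Any `x` with `x[0] + x[1] = 2` fits this data exactly; the pseudo-inverse picks the one closest to the origin, which splits the weight evenly between the duplicated columns.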

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-15 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196960794 I'm a bit confused about the use of DGELSD. As far as I can tell, it requires matrix A itself. But in the current implementation, we're decomposing A^T.A on the

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196662210 I will vote for approach 1. SVD will be the most stable algorithm, but also the slowest, O(mn^2 + n^3), compared with Cholesky O(mn^2) or QR O(mn^2 - n^3/3) decomposition.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196503166 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196503171 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196502654 **[Test build #53091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53091/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196486264 @iyounus @dbtsai The normal equation approach will fail if the matrix A is rank-deficient. It happens when there are constant columns. However, more generally, it

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-196480702 **[Test build #53091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53091/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964826 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964808 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964768 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964546 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964451 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55964211 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55963818 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55963496 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,57 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-195644840 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-195644841 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-195644779 **[Test build #52981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52981/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-195631468 **[Test build #52981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52981/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55909205 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55909088 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55908792 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -120,34 +160,47 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55908431 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55908419 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55908346 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-11 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11610#discussion_r55908232 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala --- @@ -108,6 +101,53 @@ private[ml] class WeightedLeastSquares(

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread iyounus
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194522061 I should point out that to identify constant features, I'm comparing variance (aVar) to zero. But it can happen that the variance for constant features may not be
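
The comment above is cut off, but the concern appears to be that a variance computed in floating point need not be exactly zero for a constant column. One possible way to handle that, sketched below in pure Python, is to compare the variance against a small relative tolerance instead of testing for exact zero. The names `weighted_mean_var` and `is_constant` and the default `rel_tol` are assumptions for illustration only; they are not the PR's code and say nothing about how `aVar` is actually computed in Spark.

```python
def weighted_mean_var(values, weights):
    """Weighted mean and variance from accumulated sums.

    Uses var = E[x^2] - E[x]^2, which is simple but can leave a tiny
    non-zero residue (of either sign) for a constant column.
    """
    wsum = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / wsum
    second = sum(w * x * x for x, w in zip(values, weights)) / wsum
    return mean, second - mean * mean


def is_constant(mean, var, rel_tol=1e-12):
    """Treat a feature as constant when its variance is negligible.

    The threshold scales with mean^2 (floored at 1.0), an assumed
    heuristic rather than a rule taken from the PR.
    """
    return abs(var) <= rel_tol * max(1.0, mean * mean)


const_mean, const_var = weighted_mean_var([0.1] * 1000, [1.0] * 1000)
mixed_mean, mixed_var = weighted_mean_var([0.0, 1.0], [1.0, 1.0])

print(const_var, is_constant(const_mean, const_var))
print(mixed_var, is_constant(mixed_mean, mixed_var))
```

One caveat of this heuristic: because the threshold is floored at `rel_tol`, a genuinely varying feature whose values are all very small in magnitude could be flagged as constant, so the tolerance would need to be chosen with the feature scale in mind.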

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194506646 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194506644 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194506333 **[Test build #52764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52764/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-194489209 **[Test build #52764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52764/consoleFull)** for PR 11610 at commit

[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...

2016-03-09 Thread iyounus
GitHub user iyounus opened a pull request: https://github.com/apache/spark/pull/11610 [SPARK-13777] [ML] Remove constant features from training in normal solver (WLS) ## What changes were proposed in this pull request? "normal" solver in LinearRegression uses Cholesky