[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-04 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59989/ Test FAILed.

[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-04 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-04 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 **[Test build #59989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59989/consoleFull)** for PR 13326 at commit [`7f4f34b`](https://github.com/apache/spark/

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/13483 @viirya You know, as I said above, neither way is perfect. My input is just based on my previous design experience. All the design decisions I made are based on usage scenarios.

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/13483 @gatorsmile I am arguing for it not because of actual use cases, but for the sake of API behavior consistency. If we disallow duplicate group-by columns, then we should filter them all out. Let df.

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/13483 @viirya I am just wondering why users need a dataframe with duplicate column names? Could you give me a usage scenario?

[GitHub] spark issue #13510: [MINOR] [BUILD] Add modernizr MIT license; specify "2014...

2016-06-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13510 Merged build finished. Test PASSed.

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/13483 @gatorsmile Your last example just shows the inconsistency. Given two different parameters, `$"col1", count("*")` and `count("*")`, you get the same output. I think this confuses users. In contr

[GitHub] spark issue #13510: [MINOR] [BUILD] Add modernizr MIT license; specify "2014...

2016-06-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13510 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59988/ Test PASSed.

[GitHub] spark issue #13510: [MINOR] [BUILD] Add modernizr MIT license; specify "2014...

2016-06-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13510 **[Test build #59988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59988/consoleFull)** for PR 13510 at commit [`d71067e`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/13483 @viirya This is a design decision. So far, neither way is perfect. In my mind, we have to consider the use cases here. If users want to have duplicate columns, they should not use th

[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-04 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 **[Test build #59989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59989/consoleFull)** for PR 13326 at commit [`7f4f34b`](https://github.com/apache/spark/c

[GitHub] spark issue #13326: [SPARK-15560] [Mesos] Queued/Supervise drivers waiting f...

2016-06-04 Thread devaraj-kavali
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13326 ok to test

[GitHub] spark issue #13509: [SPARK-15740] [MLLIB] Word2VecSuite "big model load / sa...

2016-06-04 Thread tmnd1991
Github user tmnd1991 commented on the issue: https://github.com/apache/spark/pull/13509 I noticed a Scala style error; please wait for the new commit before triggering a Jenkins build.

[GitHub] spark pull request #13390: [SPARK-15617][ML][DOC] Clarify that fMeasure in M...

2016-06-04 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13390

[GitHub] spark issue #13510: [MINOR] [BUILD] Add modernizr MIT license; specify "2014...

2016-06-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13510 **[Test build #59988 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59988/consoleFull)** for PR 13510 at commit [`d71067e`](https://github.com/apache/spark/commit/d7

[GitHub] spark issue #13390: [SPARK-15617][ML][DOC] Clarify that fMeasure in Multicla...

2016-06-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/13390 Merged to master/2.0

[GitHub] spark pull request #13510: [MINOR] [BUILD] Add modernizr MIT license; specif...

2016-06-04 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/13510 [MINOR] [BUILD] Add modernizr MIT license; specify "2014 and onwards" in license copyright ## What changes were proposed in this pull request? Per conversation on dev list, add missing mode

[GitHub] spark issue #13509: SPARK-15740

2016-06-04 Thread tmnd1991
Github user tmnd1991 commented on the issue: https://github.com/apache/spark/pull/13509 (Fix the title please) https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59987/ Test PASSed.

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 Merged build finished. Test PASSed.

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 **[Test build #59987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59987/consoleFull)** for PR 13486 at commit [`6a9006d`](https://github.com/apache/spark/c

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59986/ Test PASSed.

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 Merged build finished. Test PASSed.

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 **[Test build #59986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59986/consoleFull)** for PR 13486 at commit [`9c5f13d`](https://github.com/apache/spark/c

[GitHub] spark issue #13483: [SPARK-15688][SQL] RelationalGroupedDataset.toDF should ...

2016-06-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13483 @gatorsmile Automatically deduplicating the group-by columns will cause confusion. Using your example: `df.groupBy("col1").agg(count("*"))` When users try the above API call, th

[GitHub] spark issue #13496: [SPARK-15753][SQL] Move Analyzer stuff to Analyzer from ...

2016-06-04 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13496 also cc @rxin

[GitHub] spark issue #13494: [SPARK-15752] [SQL] support optimization for metadata on...

2016-06-04 Thread lianhuiwang
Github user lianhuiwang commented on the issue: https://github.com/apache/spark/pull/13494 @rxin I have written a design doc: https://docs.google.com/document/d/1Bmi4-PkTaBQ0HVaGjIqa3eA12toKX52QaiUyhb6WQiM/edit?usp=sharing. Glad to get your comments. Thanks.

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13486 **[Test build #59987 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59987/consoleFull)** for PR 13486 at commit [`6a9006d`](https://github.com/apache/spark/commit/6

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13486 **[Test build #59986 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59986/consoleFull)** for PR 13486 at commit [`9c5f13d`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #13509: SPARK-15740

2016-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13509 Can one of the admins verify this patch?

[GitHub] spark pull request #13509: SPARK-15740

2016-06-04 Thread tmnd1991
GitHub user tmnd1991 opened a pull request: https://github.com/apache/spark/pull/13509 SPARK-15740 ## What changes were proposed in this pull request? "test big model load / save" in Word2VecSuite has lately been resulting in OOM. Therefore we decided to make the partitioning adapti

[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799702 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object Part

[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object Parti

[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799585 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object Part

[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -350,6 +350,10 @@ private[sql] object Parti

[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-04 Thread zhonghaihua
Github user zhonghaihua commented on the issue: https://github.com/apache/spark/pull/12258 Hi @vanzin, the code has changed. Could you review it, please?

[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12258 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59985/ Test PASSed.

[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12258 Merged build finished. Test PASSed.

[GitHub] spark issue #12258: [SPARK-14485][CORE] ignore task finished for executor lo...

2016-06-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12258 **[Test build #59985 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59985/consoleFull)** for PR 12258 at commit [`cb93988`](https://github.com/apache/spark/commit/

[GitHub] spark pull request #13506: [SPARK-15763][SQL] Support DELETE FILE command na...

2016-06-04 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/13506#discussion_r65799162 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1441,6 +1441,32 @@ class SparkContext(config: SparkConf) extends Logging with Exe

[GitHub] spark issue #13248: [SPARK-15194] [ML] Add Python ML API for MultivariateGau...

2016-06-04 Thread vectorijk
Github user vectorijk commented on the issue: https://github.com/apache/spark/pull/13248 @praveendareddy21 For generating documentation for this API correctly, you could include this in `spark/python/docs/pyspark.ml.rst` ``` pyspark.ml.stat module --

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-04 Thread vectorijk
Github user vectorijk commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65798896 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request #13248: [SPARK-15194] [ML] Add Python ML API for Multivar...

2016-06-04 Thread vectorijk
Github user vectorijk commented on a diff in the pull request: https://github.com/apache/spark/pull/13248#discussion_r65798890 --- Diff: python/pyspark/ml/stat/distribution.py --- @@ -0,0 +1,267 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark issue #13502: [SPARK-15760][docs] Add documentation for package-relate...

2016-06-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13502 lgtm
