[GitHub] spark issue #13762: [SPARK-14926] [ML] OneVsRest labelMetadata uses incorrec...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/13762 I think we should close this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 I get that, but if it's always true, then there was no problem to begin with. That's what the code seems to think right now. I haven't looked at the code much but that's the question -- are you sure

[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14638 **[Test build #65251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65251/consoleFull)** for PR 14638 at commit

[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14623 **[Test build #65252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65252/consoleFull)** for PR 14623 at commit

[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14527 **[Test build #65248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65248/consoleFull)** for PR 14527 at commit

[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-12 Thread a-roberts
Github user a-roberts commented on the issue: https://github.com/apache/spark/pull/14961 Sean, yep, I've had trouble reproducing it too, kicked off a bunch of builds over the weekend including one using Hadoop-2.3 which was my initial theory (only difference between our testing

[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #65250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65250/consoleFull)** for PR 14116 at commit

[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #65249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65249/consoleFull)** for PR 14426 at commit

[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15046 Ah, good catch! But adding a new flag looks a little tricky; let me think about whether there is a better way to fix it.

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen No. It does not matter whether the file is empty or not: if the file is empty, `getsize()` just returns 0, and this should be OK.
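
The point about `getsize()` on an empty file can be checked with a minimal sketch. This is not code from the PR: it assumes the call in question is Python's standard `os.path.getsize`, and `spilled_bytes` is a hypothetical helper name used only for illustration.

```python
import os
import tempfile


def spilled_bytes(path):
    """Hypothetical helper: on-disk size of a spill file, in bytes."""
    return os.path.getsize(path)


# Create an empty file and measure it before anything is written.
with tempfile.NamedTemporaryFile(delete=False) as f:
    spill_path = f.name
empty_size = spilled_bytes(spill_path)

# Write 128 bytes to the same file and measure it again.
with open(spill_path, "wb") as f:
    f.write(b"x" * 128)
written_size = spilled_bytes(spill_path)

os.remove(spill_path)
print(empty_size, written_size)
```

Under that assumption, an empty file reports a size of zero rather than raising an error, so reading the size needs no special case for emptiness.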

[GitHub] spark issue #15023: Backport [SPARK-5847] Allow for configuring MetricsSyste...

2016-09-12 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/15023 I'm aware that features are not generally back-ported. The point is, for us this is a bug, preventing a deployment in production. We thus back-ported the fix internally and now propose to

[GitHub] spark issue #15056: [SPARK-17503][Core] Fix memory leak in Memory store when...

2016-09-12 Thread clockfly
Github user clockfly commented on the issue: https://github.com/apache/spark/pull/15056 @yhuai

[GitHub] spark issue #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing info ex...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15040 `BucketingInfoExtractor` may be too flexible a concept; we only need a boolean flag to indicate whether it's Spark native bucketing or Hive bucketing, and I'm sure how soon we need to support bucketed

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14995 **[Test build #65247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65247/consoleFull)** for PR 14995 at commit

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Is the idea that the file may be non-empty when written? There is at least one more instance of this call, but maybe the file is known to be empty before.

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r78331887 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveHash.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15048#discussion_r78331097 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -68,7 +68,7 @@ class ResolveDataSource(sparkSession:

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen I updated the PR to update the DiskBytesSpilled metric incrementally.

[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14988#discussion_r78330099 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala --- @@ -164,4 +164,28 @@ case class HiveTableScanExec(

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread djvulee
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/15052 @srowen you are right, I will correct it soon.

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65246/consoleFull)** for PR 13513 at commit

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65245/consoleFull)** for PR 13513 at commit

[GitHub] spark issue #15055: [SPARK-17462][MLLIB]use VersionUtils to parse Spark vers...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15055 Merged build finished. Test PASSed.

[GitHub] spark issue #15055: [SPARK-17462][MLLIB]use VersionUtils to parse Spark vers...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65241/ Test PASSed. ---

[GitHub] spark issue #15055: [SPARK-17462][MLLIB]use VersionUtils to parse Spark vers...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15055 **[Test build #65241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65241/consoleFull)** for PR 15055 at commit

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 @zsxwing, thanks a lot for your comments. I did several refactorings: 1. Abstract and consolidate `FileStreamSinkLog` and `FileStreamSourceLog`; now they share the same code path to do

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65244/ Test FAILed. ---

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65244/consoleFull)** for PR 13513 at commit

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Merged build finished. Test FAILed.

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15053 Merged build finished. Test FAILed.

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15053 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65242/ Test FAILed. ---

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15053 **[Test build #65242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)** for PR 15053 at commit

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65244/consoleFull)** for PR 13513 at commit

[GitHub] spark issue #15056: [SPARK-17503][Core] Fix memory leak in Memory store when...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15056 **[Test build #65243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65243/consoleFull)** for PR 15056 at commit

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-12 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12819 @zhengruifeng I saw that your implementation switches the training process from RDD operations to Dataset operations with a UDAF. I think we should do some performance tests to verify there is no

[GitHub] spark pull request #15056: [SPARK-17503][Core] Fix memory leak in Memory sto...

2016-09-12 Thread clockfly
GitHub user clockfly opened a pull request: https://github.com/apache/spark/pull/15056 [SPARK-17503][Core] Fix memory leak in Memory store when unable to cache the whole RDD ## What changes were proposed in this pull request? Memory store may throw OutOfMemoryError

[GitHub] spark issue #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15054 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65240/ Test FAILed. ---

[GitHub] spark issue #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements ...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15054 Merged build finished. Test FAILed.

[GitHub] spark issue #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15054 **[Test build #65240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65240/consoleFull)** for PR 15054 at commit

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r78325688 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -109,10 +120,51 @@ class NaiveBayes @Since("1.5.0") (

[GitHub] spark pull request #12819: [SPARK-14077][ML] Refactor NaiveBayes to support ...

2016-09-12 Thread yanboliang
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/12819#discussion_r78325579 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala --- @@ -98,7 +99,17 @@ class NaiveBayes @Since("1.5.0") ( */

[GitHub] spark issue #15000: [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspa...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15000 My only hesitation about this is that this property really only exists to print it in the shell. Is there a good use case for it otherwise? I know it's minor but want to make sure we're not just

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15053 **[Test build #65242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65242/consoleFull)** for PR 15053 at commit

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15053 Jenkins test this please

[GitHub] spark issue #15052: [SPARK-17500][PySpark]Make DiskBytesSpilled metric in Py...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15052 Given how DiskBytesSpilled is used, and still used in other parts of the code, this doesn't look correct. It seems to be a global that is always incremented. Here you reset the value in certain

[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database

2016-09-12 Thread adrian-wang
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 @hvanhovell I have checked Hive and MySQL; both support dropping the current database. Asking the user to switch to another database before dropping the current one is not enough though, if

[GitHub] spark issue #15055: [SPARK-17462][MLLIB]use VersionUtils to parse Spark vers...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15055 **[Test build #65241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65241/consoleFull)** for PR 15055 at commit

[GitHub] spark pull request #15055: [SPARK-17462][MLLIB]use VersionUtils to parse Spa...

2016-09-12 Thread VinceShieh
GitHub user VinceShieh opened a pull request: https://github.com/apache/spark/pull/15055 [SPARK-17462][MLLIB]use VersionUtils to parse Spark version strings ## What changes were proposed in this pull request? Several places in MLlib use custom regexes or other approaches to

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321887 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +577,74 @@ class LogisticRegression

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65239/ Test FAILed. ---

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321247 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -323,32 +382,33 @@ class LogisticRegression

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13758 Merged build finished. Test FAILed.

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13758 **[Test build #65239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65239/consoleFull)** for PR 13758 at commit

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread dbtsai
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78321146 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +350,28 @@ class LogisticRegression @Since("1.2.0")

[GitHub] spark issue #11729: [SPARK-13073] [MLib] [WIP] creating R like summary for l...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11729 gentle ping @mbaddar1

[GitHub] spark issue #11079: [SPARK-13197][SQL] When trying to select from the data f...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/11079 +1 for not a problem.

[GitHub] spark issue #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements ...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15054 **[Test build #65240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65240/consoleFull)** for PR 15054 at commit

[GitHub] spark pull request #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Stat...

2016-09-12 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15054 [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Statements on Temporary Views [WIP] ### What changes were proposed in this pull request? - When the permanent tables/views do not exist but the

[GitHub] spark issue #15020: Spark 2.0 error in Intellij

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15020 ping @bigdatatraining
