[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2016-09-12 Thread vlad17
Github user vlad17 commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r78482846 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -148,11 +154,14 @@ class GBTClassifier @Since("1.4.0") (

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14995 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65291/ Test FAILed. ---

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14995 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14995 **[Test build #65291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65291/consoleFull)** for PR 14995 at commit

[GitHub] spark issue #14995: [Test Only][SPARK-6235][CORE]Address various 2G limits

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14995 **[Test build #65291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65291/consoleFull)** for PR 14995 at commit

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15068 **[Test build #65290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65290/consoleFull)** for PR 15068 at commit

[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2016-09-12 Thread vlad17
Github user vlad17 commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r78481662 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -42,18 +42,30 @@ import org.apache.spark.sql.types.DoubleType

[GitHub] spark pull request #14547: [SPARK-16718][MLlib] gbm-style treeboost

2016-09-12 Thread vlad17
Github user vlad17 commented on a diff in the pull request: https://github.com/apache/spark/pull/14547#discussion_r78481687 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala --- @@ -42,18 +42,30 @@ import org.apache.spark.sql.types.DoubleType

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15068 /cc @davies @rxin for review.

[GitHub] spark pull request #15068: [SPARK-17514] df.take(1) and df.limit(1).collect(...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/15068#discussion_r78481396 --- Diff: python/pyspark/sql/dataframe.py --- @@ -357,10 +357,7 @@ def take(self, num): >>> df.take(2) [Row(age=2,

[GitHub] spark issue #15068: [SPARK-17514] df.take(1) and df.limit(1).collect() shoul...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15068 **[Test build #65288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65288/consoleFull)** for PR 15068 at commit

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12819 **[Test build #65289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65289/consoleFull)** for PR 12819 at commit

[GitHub] spark pull request #15068: [SPARK-17514] df.take(1) and df.limit(1).collect(...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/15068#discussion_r78481314 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2567,8 +2567,12 @@ class Dataset[T] private[sql]( }

[GitHub] spark pull request #15068: [SPARK-17514] df.take(1) and df.limit(1).collect(...

2016-09-12 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/15068 [SPARK-17514] df.take(1) and df.limit(1).collect() should perform the same in Python ## What changes were proposed in this pull request? In PySpark, `df.take(1)` ends up running a

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13513 @zsxwing @frreiss thanks a lot for your comments. I think the semantics of `FileStreamSource.getBatch(start: Option[Offset], end: Offset)` stay the same, since I overrode the

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread mortada
Github user mortada commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78480019 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1393,6 +1420,7 @@ def withColumnRenamed(self, existing, new): :param existing: string, name of

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 @srowen Would it be sensible to remove all the predefined dataframes as globals and create them within each doctest instead, if this change looks good?

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479721 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1393,6 +1420,7 @@ def withColumnRenamed(self, existing, new): :param existing: string,

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479675 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1378,6 +1404,7 @@ def withColumn(self, colName, col): :param colName: string, name of the

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479591 --- Diff: python/pyspark/sql/dataframe.py --- @@ -106,6 +106,7 @@ def toJSON(self, use_unicode=True): Each row is turned into a JSON

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479631 --- Diff: python/pyspark/sql/dataframe.py --- @@ -329,6 +339,7 @@ def toLocalIterator(self): Returns an iterator that contains all of the

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479542 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1444,6 +1474,7 @@ def toDF(self, *cols): :param cols: list of new column names

[GitHub] spark pull request #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15053#discussion_r78479505 --- Diff: python/pyspark/sql/dataframe.py --- @@ -354,6 +366,7 @@ def limit(self, num): def take(self, num): """Returns the first

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14834 Merged build finished. Test PASSed.

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14834 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65284/ Test PASSed. ---

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 BTW - If you meant how they passed before, not how they ran, it seems some dataframes were created before actually running the tests,

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #65284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65284/consoleFull)** for PR 14834 at commit

[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15048 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65283/ Test PASSed. ---

[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15048 Merged build finished. Test PASSed.

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15063 **[Test build #65287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65287/consoleFull)** for PR 15063 at commit

[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15065 **[Test build #65286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65286/consoleFull)** for PR 15065 at commit

[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15048 **[Test build #65283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65283/consoleFull)** for PR 15048 at commit

[GitHub] spark issue #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecution gar...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15067 Can one of the admins verify this patch?

[GitHub] spark pull request #13642: [MINOR] Clean up several build warnings, mostly d...

2016-09-12 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/13642#discussion_r78477976 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -17,48 +17,18 @@ package

[GitHub] spark pull request #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecut...

2016-09-12 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/15067 [SPARK-17513] [STREAMING] [SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it

[GitHub] spark pull request #13642: [MINOR] Clean up several build warnings, mostly d...

2016-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13642#discussion_r78477612 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -17,48 +17,18 @@ package

[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15065 Merged build finished. Test FAILed.

[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15065 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65282/ Test FAILed. ---

[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15065 **[Test build #65282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65282/consoleFull)** for PR 15065 at commit

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78475748 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -253,6 +266,7 @@ class InMemoryCatalog(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78475247 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -410,15 +417,22 @@ private[spark] class

[GitHub] spark issue #15066: [SPARK-171114][SQL] Fix literals in GROUP BY 1 columns [...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15066 **[Test build #65285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65285/consoleFull)** for PR 15066 at commit

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78474896 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -410,15 +417,22 @@ private[spark] class

[GitHub] spark pull request #14750: [SPARK-17183][SQL] put hive serde table schema to...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/14750#discussion_r78474805 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala --- @@ -245,4 +245,28 @@ object DataType { case (fromDataType,

[GitHub] spark pull request #15066: [SPARK-171114][SQL] Fix literals in GROUP BY 1 co...

2016-09-12 Thread hvanhovell
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/15066 [SPARK-171114][SQL] Fix literals in GROUP BY 1 columns [WIP] ## What changes were proposed in this pull request? TODO ## How was this patch tested? I'll add a test tomorrow.

[GitHub] spark pull request #15030: [SPARK-17474] [SQL] fix python udf in TakeOrdered...

2016-09-12 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15030

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78474538 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -253,6 +266,7 @@ class InMemoryCatalog(

[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/15030 Merging into 2.0 and master.

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65281/ Test FAILed. ---

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15063 Merged build finished. Test FAILed.

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15063 **[Test build #65281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65281/consoleFull)** for PR 15063 at commit

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78473168 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -410,15 +417,22 @@ private[spark] class

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15053 Oh, it will run doc tests as far as I know: http://www.sphinx-doc.org/en/stable/ext/doctest.html Maybe I will try running it locally to check for myself.

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78472441 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala --- @@ -262,11 +262,13 @@ class CatalogImpl(sparkSession: SparkSession)

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78472391 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -665,15 +665,7 @@ case class AlterTableSetLocationCommand(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78472272 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -435,13 +435,13 @@ case class DataSource(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78472224 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -435,13 +435,13 @@ case class DataSource(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78471429 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -665,15 +665,7 @@ case class AlterTableSetLocationCommand(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78471404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -253,6 +266,7 @@ class InMemoryCatalog(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78471303 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -204,13 +194,21 @@ case class

[GitHub] spark issue #15030: [SPARK-17474] [SQL] fix python udf in TakeOrderedAndProj...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15030 LGTM

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78471240 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -204,13 +194,21 @@ case class

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13513 You could just move the metadata deletion logic from FileStreamSinkLog into CompactibleFileStreamLog. Then FileStreamSource could issue DELETE log records for files that are older than

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78471137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -154,13 +149,8 @@ case class

[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15027#discussion_r78469982 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -146,6 +146,11 @@ class HDFSMetadataLog[T:

[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15027#discussion_r78469460 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -146,6 +146,11 @@ class HDFSMetadataLog[T:

[GitHub] spark pull request #15037: [SPARK-17485] Prevent failed remote reads of cach...

2016-09-12 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15037

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78469005 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -253,6 +266,7 @@ class InMemoryCatalog(

[GitHub] spark issue #15037: [SPARK-17485] Prevent failed remote reads of cached bloc...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15037 Merging to master and branch-2.0.

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78468230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -195,18 +195,31 @@ class InMemoryCatalog(

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78468181 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -195,18 +195,31 @@ class InMemoryCatalog(

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13513 Just noticed that `FileStreamSource.getBatch(start: Option[Offset], end: Offset)` is broken in this PR. `start` could be an arbitrary offset. I think we need to store `batchId` with its

[GitHub] spark pull request #15024: [SPARK-17470][SQL] unify path for data source tab...

2016-09-12 Thread clockfly
Github user clockfly commented on a diff in the pull request: https://github.com/apache/spark/pull/15024#discussion_r78468093 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala --- @@ -195,18 +195,31 @@ class InMemoryCatalog(

[GitHub] spark issue #14750: [SPARK-17183][SQL] put hive serde table schema to table ...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14750 LGTM, except one minor comment https://github.com/apache/spark/pull/14750#discussion_r78106864. That comment does not affect the existing code.

[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15035 Hm.. are you sure this is a problem in all data sources? IIUC, JSON and CSV kind of allow permissive upcasting, whereas ORC and Parquet do not, so this would be rather ORC- and Parquet-specific

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15063 Jenkins, retest this please. (Retesting so MiMa can run again)

[GitHub] spark pull request #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build c...

2016-09-12 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15061

[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14842 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65279/ Test PASSed. ---

[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14842 Merged build finished. Test PASSed.

[GitHub] spark issue #14842: [SPARK-10747][SQL] Support NULLS FIRST|LAST clause in OR...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14842 **[Test build #65279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65279/consoleFull)** for PR 14842 at commit

[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread JoshRosen
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/15061 I think that the test timeout is unrelated (since it occurred in PySpark tests and those are unaffected by changes to MiMa excludes), so I'm going to merge this now and will cherry-pick to

[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15061 Merged build finished. Test FAILed.

[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65268/ Test FAILed. ---

[GitHub] spark issue #15061: [SPARK-14818] Post-2.0 MiMa exclusion and build changes

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15061 **[Test build #65268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65268/consoleFull)** for PR 15061 at commit

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14834 **[Test build #65284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65284/consoleFull)** for PR 14834 at commit

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78464458 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -323,32 +382,33 @@ class LogisticRegression

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78464517 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -460,33 +577,74 @@ class LogisticRegression

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-12 Thread sethah
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/14834#discussion_r78464438 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -311,8 +350,28 @@ class LogisticRegression @Since("1.2.0")

[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-12 Thread Stibbons
Github user Stibbons commented on a diff in the pull request: https://github.com/apache/spark/pull/15026#discussion_r78464418 --- Diff: python/pyspark/broadcast.py --- @@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, pickle_registry=None, path=None):

[GitHub] spark issue #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS More T...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15048 **[Test build #65283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65283/consoleFull)** for PR 15048 at commit

[GitHub] spark issue #15065: [SPARK-17463][Core]Add necessary memory barrier for accu...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15065 **[Test build #65282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65282/consoleFull)** for PR 15065 at commit

[GitHub] spark pull request #15065: [SPARK-17463][Core]Add necessary memory barrier f...

2016-09-12 Thread zsxwing
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/15065 [SPARK-17463][Core]Add necessary memory barrier for accumulators ## What changes were proposed in this pull request? Added `volatile` for fields that will be read in the heartbeat thread.

[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Merged build finished. Test PASSed.

[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14467 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65274/ Test PASSed. ---

[GitHub] spark issue #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark accumulato...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14467 **[Test build #65274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65274/consoleFull)** for PR 14467 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65276/ Test PASSed. ---

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #65276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65276/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test PASSed.

[GitHub] spark pull request #15048: [SPARK-17409] [SQL] Do Not Optimize Query in CTAS...

2016-09-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15048#discussion_r78461431 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -68,7 +68,7 @@ class ResolveDataSource(sparkSession:

[GitHub] spark issue #15063: [SPARK-17463][Core]Make CollectionAccumulator and SetAcc...

2016-09-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15063 **[Test build #65281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65281/consoleFull)** for PR 15063 at commit
