[GitHub] spark issue #22562: [SPARK-25541][SQL][FOLLOWUP] Remove overriding filterKey...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22562 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22552: [SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22552 This is part of the bug fix PR: #22545 . I asked @ueshin to separate it since this one is self-contained

[GitHub] spark issue #22545: [SPARK-25525][SQL][PYSPARK] Do not update conf for exist...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22545 LGTM

[GitHub] spark issue #22543: [SPARK-23715][SQL][DOC] improve document for from/to_utc...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22543 retest this please

[GitHub] spark issue #22552: [SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22552 thanks, merging to master/2.4!

[GitHub] spark pull request #22544: [SPARK-25522][SQL] Improve type promotion for inp...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22544#discussion_r220772034 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2140,21 +2140,34 @@ case class

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22010 retest this please

[GitHub] spark issue #22556: [MINOR] Remove useless InSubquery expression

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22556 LGTM, +1 for creating a JIRA

[GitHub] spark pull request #22553: [SPARK-25541][SQL] CaseInsensitiveMap should be s...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22553#discussion_r220770689 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala --- @@ -42,7 +42,11 @@ class CaseInsensitiveMap[T

[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22326 LGTM except for some unnecessary end-to-end tests. +1 for @mgaido91's idea about unit tests, something like the test suites under `org.apache.spark.sql.catalyst.optimizer`. I'm ok to do

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220770012 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExecSuite.scala --- @@ -100,6 +105,29 @@ class

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220769336 --- Diff: python/pyspark/sql/tests.py --- @@ -552,6 +552,96 @@ def test_udf_in_filter_on_top_of_join(self): df = left.crossJoin(right).filter

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21403 sure, feel free to open a PR first.

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220658815 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21403 I just realized there are 2 `InSubquery` expressions; it seems we need to rename one of them. @mgaido91 any ideas

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220564636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark issue #22364: [SPARK-25379][SQL] Improve AttributeSet and ColumnPrunin...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22364 LGTM, merging to master!

[GitHub] spark pull request #22466: [SPARK-25464][SQL]On dropping the Database it wil...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22466#discussion_r220553606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -66,6 +66,19 @@ case class CreateDatabaseCommand

[GitHub] spark pull request #22544: [SPARK-25522][SQL] Improve type promotion for inp...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22544#discussion_r220533930 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2140,21 +2140,34 @@ case class

[GitHub] spark pull request #22544: [SPARK-25522][SQL] Improve type promotion for inp...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22544#discussion_r220532183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -974,6 +974,33 @@ object TypeCoercion

[GitHub] spark pull request #22544: [SPARK-25522][SQL] Improve type promotion for inp...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22544#discussion_r220531899 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -974,6 +974,33 @@ object TypeCoercion

[GitHub] spark issue #22553: [SPARK-25541][SQL] CaseInsensitiveMap should be serializ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22553 thanks, merging to master!

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220525992 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark issue #22553: [SPARK-25541][SQL] CaseInsensitiveMap should be serializ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22553 LGTM

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220518094 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220517587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220517383 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220516732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,51 @@ object EliminateOuterJoin extends

[GitHub] spark issue #22555: [SPARK-25536][CORE][Minor]metric value for METRIC_OUTPUT...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22555 LGTM

[GitHub] spark issue #22555: [SPARK-25536][CORE][Minor]metric value for METRIC_OUTPUT...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22555 ok to test

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 retest this please

[GitHub] spark pull request #22553: [SPARK-25541][SQL] CaseInsensitiveMap should be s...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22553#discussion_r220463389 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala --- @@ -42,7 +42,11 @@ class CaseInsensitiveMap[T

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 ok to test

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220436485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1308,6 +1312,16 @@ object CheckCartesianProducts

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220424663 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1304,10 +1307,27 @@ object CheckCartesianProducts

[GitHub] spark pull request #22543: [SPARK-23715][SQL][DOC] improve document for from...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22543#discussion_r220420622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1018,9 +1018,20 @@ case class TimeAdd

[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22192 why do we merge a new API to branch-2.4, when RC1 is already out?

[GitHub] spark pull request #22545: [SPARK-25525][SQL][PYSPARK] Do not update conf fo...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22545#discussion_r220414608 --- Diff: python/pyspark/sql/context.py --- @@ -485,7 +485,8 @@ def __init__(self, sparkContext, jhiveContext=None

[GitHub] spark pull request #22543: [SPARK-23715][SQL][DOC] improve document for from...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22543#discussion_r220411054 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1018,9 +1018,20 @@ case class TimeAdd

[GitHub] spark pull request #22379: [SPARK-25393][SQL] Adding new function from_csv()

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22379#discussion_r220408893 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -520,7 +520,10 @@ object FunctionRegistry

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220400809 --- Diff: docs/sql-programming-guide.md --- @@ -1877,6 +1877,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220400637 --- Diff: docs/sql-programming-guide.md --- @@ -1877,6 +1877,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220401132 --- Diff: docs/sql-programming-guide.md --- @@ -1877,6 +1877,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22237: [SPARK-25243][SQL] Use FailureSafeParser in from_...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22237#discussion_r220401206 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -554,18 +554,30 @@ case class JsonToStructs

[GitHub] spark pull request #22540: [SPARK-24324] [PYTHON] [FOLLOW-UP] Rename the Con...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22540#discussion_r220400221 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowUtils.scala --- @@ -131,11 +131,8 @@ object ArrowUtils { } else

[GitHub] spark issue #22548: [SPARK-25534][SQL] Make `SupportWithSQLConf` trait

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22548 sure

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r220399074 --- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala --- @@ -19,7 +19,7 @@ package org.apache.spark.rdd import

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r220399123 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -42,7 +42,8 @@ import org.apache.spark.partial.GroupedCountEvaluator import

[GitHub] spark issue #22521: [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIG...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22521 retest this please

[GitHub] spark issue #22546: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22546 thanks, merging to master/2.4!

[GitHub] spark issue #22548: [SPARK-25534][SQL] Make `SupportWithSQLConf` trait

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22548 We have duplicated `withTempPath`, `withTempTable` too. Shall we pull them out as well? The trait name can be `SQLHelper

[GitHub] spark pull request #22484: [SPARK-25476][SPARK-25510][TEST] Refactor Aggrega...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22484#discussion_r220298090 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala --- @@ -0,0 +1,87 @@ +/* + * Licensed

[GitHub] spark issue #22521: [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIG...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22521 retest this please

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22547 cc @rxin @jose-torres @rdblue

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220289897 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,56 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220289293 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,56 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220289231 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,56 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220288991 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,56 @@ object EliminateOuterJoin extends

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r220275520 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala --- @@ -207,13 +207,13 @@ class

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r220275173 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -106,85 +107,96 @@ private[kafka010

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r220275016 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -106,85 +107,96 @@ private[kafka010

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r220274862 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchInputStream.scala --- @@ -294,6 +227,88 @@ private

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22547#discussion_r220274562 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousInputStream.scala --- @@ -67,28 +71,29 @@ class

[GitHub] spark pull request #22547: [SPARK-25528][SQL] data source V2 read side API r...

2018-09-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22547 [SPARK-25528][SQL] data source V2 read side API refactoring ## What changes were proposed in this pull request? Refactor the read side API according to the abstraction proposed

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22524 I didn't look into it, but we can change `if (count < given_limit)` to `if (count < given_limit - 1)` if you are right. My focus is the code template, without the `if else`, h

[GitHub] spark issue #22546: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22546 LGTM

[GitHub] spark pull request #22546: [SPARK-25422][CORE] Don't memory map blocks strea...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22546#discussion_r220250120 --- Diff: core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala --- @@ -175,30 +174,32 @@ object ChunkedByteBuffer { def

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 > but I don't feel confident about making that change for 2.4 Makes sense. cc @vanzin for more context about https://github.com/apache/spark/com

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22524 We can always tune `if (count < given_limit)` to consume one more or one fewer record, can't we?

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22524 @viirya thanks for adding the explanation! I think it's very clear and helpful. By reading this, I have a new idea. It seems to me that limit is mostly to stop producing data earlier

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220198104 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -152,3 +153,60 @@ object EliminateOuterJoin extends

[GitHub] spark issue #22521: [SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIG...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22521 retest this please

[GitHub] spark pull request #22543: [SPARK-23715][SQL] improve document for from/to_u...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22543#discussion_r220114639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala --- @@ -1018,9 +1018,20 @@ case class TimeAdd

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220110490 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1234,6 +1237,59 @@ object PushPredicateThroughJoin

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220109700 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1234,6 +1237,59 @@ object PushPredicateThroughJoin

[GitHub] spark pull request #22326: [SPARK-25314][SQL] Fix Python UDF accessing attri...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22326#discussion_r220108380 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1234,6 +1237,59 @@ object PushPredicateThroughJoin

[GitHub] spark issue #22543: [SPARK-23715][SQL] improve document for from/to_utc_time...

2018-09-25 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22543 cc @rxin @gatorsmile

[GitHub] spark pull request #22543: [SPARK-23715][SQL] improve document for from/to_u...

2018-09-25 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22543 [SPARK-23715][SQL] improve document for from/to_utc_timestamp ## What changes were proposed in this pull request? We have an agreement that the behavior of `from/to_utc_timestamp

[GitHub] spark issue #22541: [SPARK-23907][SQL] Revert regr_* functions entirely

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22541 LGTM, these functions have weird names and do not look very useful.

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 This is a behavior change and makes us different from Hive. However, I can't find a strong reason to do it. It's like importing a database, but we can't automatically create table entries

[GitHub] spark issue #22542: [SPARK-25519][SQL] ArrayRemove function may return incor...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22542 thanks, merging to master/2.4! @dilipbiswal sorry I didn't see your comment while merging. If the problem is about "implicit casting between two Map types", feel free to

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 > That link says Hive does support EXTERNAL. What am I missing? Hive supports `EXTERNAL` only for tables, not databases. The CREATE TABLE syntax: ``` CREATE [TEMPOR

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 a possible approach: can we just not dispose the data in `TorrentBroadcast`?

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 The analysis makes sense to me. The thing I'm not sure about is how we can hit it. The "fetch block to temp file" code path is only enabled for big bloc

[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 retest this please

[GitHub] spark issue #22524: [SPARK-25497][SQL] Limit operation within whole stage co...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22524 It would be great to explain how limit works in whole-stage codegen in general. This part is a little hard to understand

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22524#discussion_r220046724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -71,22 +71,14 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark issue #22542: [SPARK-25519][SQL] ArrayRemove function may return incor...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22542 LGTM. The PR description has a typo: `ArrayPosition` => `ArrayRemove`

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22524#discussion_r220044271 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -71,22 +71,14 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22524#discussion_r220044149 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -465,13 +465,18 @@ case class RangeExec(range

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22524#discussion_r220043421 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -73,14 +78,21 @@ public void append(InternalRow row

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 Yes, in Spark we conflate the two and treat a table as external if a location is specified. However, Hive doesn't have external databases, see: https://cwiki.apache.org/confluence/display

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22316 One safe change is to not use the `lit` function, but to do a manual pattern match and still use `Literal.apply`. We can investigate `Literal.create` in a followup

[GitHub] spark pull request #22524: [SPARK-25497][SQL] Limit operation within whole s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22524#discussion_r220039084 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -38,6 +38,11 @@ protected int

[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22466 I'm not sure if there is a concept called "external database" in Hive...

[GitHub] spark issue #22518: [SPARK-25482][SQL] ReuseSubquery can be useless when the...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 > This can happen for instance when a filter containing a scalar subquery is pushed to a DataSource Hmm, how can this happen? I don't think a data source can handle a filter of subqu

[GitHub] spark issue #22316: [SPARK-25048][SQL] Pivoting by multiple columns in Scala...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22316 LGTM if the decimal precision concern from @HyukjinKwon is addressed.

[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22407 thanks, merging to master/2.4!

[GitHub] spark issue #22494: [SPARK-25454][SQL] add a new config for picking minimum ...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 @mgaido91 Since we are going to merge this PR to both 2.4 and 2.3, and the default behavior is not changed, I think we don't need to add a migration guide. I'll post this workaround to the JIRA

[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18544 thanks, merging to master/2.4! Thanks for your patience! This PR has been open for over a year 😂
