[GitHub] spark issue #21747: [SPARK-24165][SQL][branch-2.3] Fixing conditional expres...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21747 To backport this fix, we need to backport another improvement PR that allows accessing SQLConf at executor side, which violates the backport rule. I think it's ok to have this fix in 2.4 only

[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22326 Some thoughts: 1. This rule is a little tricky as it only handles python udf accessing attributes from both side. If it only accesses one side, we assume it can be pushed down later

[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22484 LGTM, cc @dongjoon-hyun for sign-off --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19868 Can we also update the title? ``` Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true ``` This is not true, we didn't fix the problem

[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-24 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22529 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22492 The next version is very likely to be 2.5.0... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22522: [SPARK-25510][TEST] Create new trait replace BenchmarkWi...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22522 I think this change is necessary, but I'd like to migrate one benchmark to use this new trait as an example. We can migrate others in follow up PRs

[GitHub] spark pull request #22522: [SPARK-25510][TEST] Create new trait replace Benc...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22522#discussion_r219672131 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed

[GitHub] spark pull request #22522: [SPARK-25510][TEST] Create new trait replace Benc...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22522#discussion_r219672121 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed

[GitHub] spark issue #22407: [SPARK-25416][SQL] ArrayPosition function may return inc...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22407 LGTM except one comment --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22407: [SPARK-25416][SQL] ArrayPosition function may ret...

2018-09-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22407#discussion_r219672065 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -1045,6 +1045,36 @@ class DataFrameFunctionsSuite extends

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19868 @jinxing64 Let's not talk too much about the problem of `spark.sql.hive.verifyPartitionPath`. We should just introduce `spark.sql.files.ignoreMissingFiles` and say this can replace

[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22513 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18544 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r219505379 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -131,14 +131,14 @@ private[sql] class HiveSessionCatalog

[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r219505121 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala --- @@ -131,14 +131,14 @@ private[sql] class HiveSessionCatalog

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] add a new config for pickin...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 sorry, I screwed up my local branch and made a wrong conclusion. Turning off the precise precision for integral literals can fix the regression. So it's better to add a config

[GitHub] spark pull request #22375: [SPARK-25388][Test][SQL] Detect incorrect nullabl...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22375#discussion_r219452615 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -223,8 +223,16 @@ trait

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 Sorry my mistake. I'm talking about the specific query reported at https://issues.apache.org/jira/browse/SPARK-22036?focusedCommentId=16618104=com.atlassian.jira.plugin.system.issuetabpanels

[GitHub] spark pull request #20276: [SPARK-14948][SQL] disambiguate attributes in joi...

2018-09-21 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/20276 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20276: [SPARK-14948][SQL] disambiguate attributes in join condi...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20276 I'm closing it since `AnalysisBarrier` is no longer there. We should revisit the whole self-join problem and fix it in 3.0, with breaking changes if necessary

[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22513 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 I tried to add a new config, but decided to not do it. The problem is, when users hit SPARK-25454, they must turn off both the `DECIMAL_OPERATIONS_ALLOW_PREC_LOSS` and the new config

[GitHub] spark issue #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Benchmark

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22513 Please also explain which module(core or sql?) these benchmark classes should be, in the PR description. --- - To unsubscribe

[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219414758 --- Diff: core/src/main/scala/org/apache/spark/sql/execution/benchmark/BenchmarkBase.scala --- @@ -15,7 +15,7 @@ * limitations under the License

[GitHub] spark pull request #22513: [SPARK-25499][TEST]Refactor BenchmarkBase and Ben...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22513#discussion_r219414641 --- Diff: core/src/main/scala/org/apache/spark/sql/execution/benchmark/Benchmark.scala --- @@ -15,7 +15,7 @@ * limitations under the License

[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r219413664 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala --- @@ -193,4 +193,29 @@ class UDFSuite

[GitHub] spark pull request #18544: [SPARK-21318][SQL]Improve exception message throw...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18544#discussion_r219412088 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala --- @@ -1440,6 +1441,8 @@ abstract class

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 is this a long-standing bug? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 this seems like a big change, will we hit perf regression? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 cc @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22511 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22508 LGTM, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #18544: [SPARK-21318][SQL]Improve exception message thrown by `l...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18544 Can you explain how do we fix the problem? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema w...

2018-09-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22509 lgtm, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19868 Basically we need to introduce this new `spark.sql.files.ignoreMissingFiles` config in detail. And them explain how can we use it to replace `spark.sql.hive.verifyPartitionPath

[GitHub] spark issue #22458: [SPARK-25459] Add viewOriginalText back to CatalogTable

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22458 LGTM. If there are more properties like `originalViewText` which are useless to Spark and only need to be displayed, I'd suggest we create a map for them, instead of adding more and more fields

[GitHub] spark issue #22505: Revert "[SPARK-23715][SQL] the input of to/from_utc_time...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22505 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r219368724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r219368791 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -416,7 +426,7 @@ class RelationalGroupedDataset protected

[GitHub] spark pull request #22316: [SPARK-25048][SQL] Pivoting by multiple columns i...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22316#discussion_r219368334 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -330,6 +331,15 @@ class RelationalGroupedDataset protected

[GitHub] spark issue #22506: [SPARK-25494][SQL] Upgrade Spark's use of Janino to 3.0....

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22506 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 totally agree, thanks for looking into it! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22462 ds v2 is pre-beta, so not a critical bug to me. I'm OK to fix it in 2.4 if someone is willing to pay the effort for fixing the conflict

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 why do we care about breaking operations when turning off a behavior change config? The config is prepared for this case: if a user hit a problem of the new behavior, he can use this config

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 given how complex it is, I feel we can start the design at 2.5 and implement it at 3.0, what do you think

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 @dilipbiswal showed that DB2 and presto treat `1e100` as double instead of decimal. We should consider this option and see what's the consequence

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 This PR is a no-op if users don't turn off #20023 . The benefit of this PR is: before we fully fix that bug, if a user hit it, he can turn off #20023 to temporarily work around it. So I don't see

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 2.3.2 and 2.4.0 are both in the RC stage, I don't want to spend a lot of time to fix this long-standing bug and block 2 releases. cc @jerryshao too

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 yea, that's why I change the test to use `1e6 * 1000`, instead of `1000e6`. The point is, we know there is a bug and it's hard to fix. What I'm trying to do here is not fixing the bug

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 > So I think we can just add a check for avoiding a non-negative scale in DecimalType.fromLiteral. Can you open a PR for it? I did some experiments locally and this seems not w

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 @mgaido91 well, your long explanation makes me think we should have a design doc about it, and have more people to review it and ensure we cover all the corner cases. And seems there are more

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 Maybe we should also consider to turn off DECIMAL_OPERATIONS_ALLOW_PREC_LOSS by default. --- - To unsubscribe, e-mail

[GitHub] spark issue #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PR...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22494 cc @mgaido91 @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22494: [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_A...

2018-09-20 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22494 [SPARK-22036][SQL][followup] DECIMAL_OPERATIONS_ALLOW_PREC_LOSS should cover all the behavior changes ## What changes were proposed in this pull request? https://github.com/apache/spark

[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22365 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #22492: [SPARK-25321][ML] Revert SPARK-14681 to avoid API breaki...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22492 Since this changes public API, shall we also revert it from master? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22462 thanks, merging to master! Since the ds v2 is very different between master and 2.4, it may not worth to backport this PR to 2.4 and fix conflicts

[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22481 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not respec...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22462 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219039187 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219038798 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219038350 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala --- @@ -735,6 +735,60 @@ class DataFrameFunctionsSuite extends

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037301 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037194 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219037086 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,66 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r219034294 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions private

[GitHub] spark issue #22365: [SPARK-25381][SQL] Stratified sampling by Column argumen...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22365 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests fail...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22480 +1 for adding the note --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22443 thanks, merging to master! @dongjoon-hyun Can you create an umbrella JIRA for updating all the benchmark and take care of it? Thanks

[GitHub] spark issue #22481: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22481 LGTM, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22462#discussion_r219028235 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala --- @@ -143,15 +185,18 @@ class

[GitHub] spark issue #22464: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22464 just one PR including 4 commits --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r219019453 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 Since it's a little risky to backport https://github.com/apache/spark/pull/22448 , I'd like to merge this PR without https://github.com/apache/spark/pull/22448 . Can you fix the conflict? thanks

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r219019108 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala --- @@ -366,7 +366,29 @@ class TypeCoercionSuite

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r219018725 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -106,6 +107,22 @@ object TypeCoercion

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r219018069 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -89,10 +90,10 @@ object TypeCoercion

[GitHub] spark issue #22465: [SPARK-25457][SQL] IntegralDivide returns data type of t...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22465 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22464: Revert [SPARK-19355][SPARK-25352]

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22464 @viirya Thanks for doing it! To ease the review, can you revert these 4 commits sequentially with `git revert commit-hash`? Thanks! https://github.com/apache/spark/pull/22344 https

[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22462#discussion_r219016643 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala --- @@ -204,6 +249,37 @@ class

[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22469 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #22470: [SPARK-25454][SQL] should not generate negative s...

2018-09-19 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/22470 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22470 @mgaido91 you are right, this still has behavior changes if the intermedia result exceed the max precision. Since most of the storages don't support negative scale(hive, parquet, etc.), I think

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 My major concern is: 1. how to prove `Divide` is the only one having problems of negative scale? 2. how to prove the fix here is corrected and covers all the corner cases? I'm

[GitHub] spark pull request #22465: [SPARK-25457][SQL] IntegralDivide returns data ty...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22465#discussion_r219011641 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1561,6 +1561,13 @@ object SQLConf { "are perf

[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21169 If we don't care about backward compatibility, I'd like to remove these 2 functions as they don't make sense according to how Spark defines timestamp. This PR is a best effort to make these 2

[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22475 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22469 LGTM, thanks for adding it! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22469: [SPARK-24341][FOLLOWUP][DOCS] Add migration note ...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22469#discussion_r218877589 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 thanks, merging to master/2.4! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #22470: [SPARK-25454][SQL] should not generate negative scale as...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22470 cc @mgaido91 @hvanhovell @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22450 I feel there are more places we need to fix for negative scale. I couldn't find any design doc for negative scale in Spark and I believe we supported it by accident. That said, fixing

[GitHub] spark pull request #22470: [SPARK-25454][SQL] should not generate negative s...

2018-09-19 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22470 [SPARK-25454][SQL] should not generate negative scale as possible as we can ## What changes were proposed in this pull request? An alternative fix of https://github.com/apache/spark/pull

[GitHub] spark pull request #22450: [SPARK-25454][SQL] Avoid precision loss in divisi...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22450#discussion_r218808837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -40,10 +40,13 @@ import

[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22355 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #22450: [SPARK-25454][SQL] Avoid precision loss in divisi...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22450#discussion_r218765559 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -129,16 +129,17 @@ object DecimalPrecision

[GitHub] spark pull request #22465: [SPARK-25457][SQL] IntegralDivide returns data ty...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22465#discussion_r218761706 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -26,6 +26,15 @@ select 5 div 0; select 5 div null; select null div 5

<    9   10   11   12   13   14   15   16   17   18   >