[GitHub] spark pull request #22450: [SPARK-25454][SQL] Avoid precision loss in divisi...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22450#discussion_r218761031 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -129,16 +129,17 @@ object DecimalPrecision

[GitHub] spark pull request #22450: [SPARK-25454][SQL] Avoid precision loss in divisi...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22450#discussion_r218760543 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -129,16 +129,17 @@ object DecimalPrecision

[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #22462: [SPARK-25460][SS] DataSourceV2: SS sources do not...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22462#discussion_r218757254 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala --- @@ -47,6 +47,9 @@ class MicroBatchExecution

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21403 Ah, I see. Can you add it to the migration guide? We need to tell users what will break after upgrading to 2.4 and why

[GitHub] spark pull request #22450: [SPARK-25454][SQL] Avoid precision loss in divisi...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22450#discussion_r218754608 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala --- @@ -129,16 +129,17 @@ object DecimalPrecision

[GitHub] spark issue #22454: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22454 I've tested it manually to create a scala 2.12 build and also a scala 2.12 maven package. It works well, I'm merging it to master/2.4, thanks

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-19 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19868 We switched to a totally different approach in the middle and forgot to update the PR description... @jinxing64 can you update it? thanks

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16677 I'm convinced, there are 2 major issues: 1. abusing shuffle. we need a new mechanism for driver to analyze some statistics about data (records per map task) 2. too many small tasks. We

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16677 Let me take an example from the PR description > For example, we have three partitions with rows (100, 100, 50) respectively. In global limit of 100 rows, we may take (34, 33, 33) r

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218651545 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22456#discussion_r218651070 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -31,7 +31,7 @@ import org.apache.spark.util.Utils

[GitHub] spark issue #22454: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22454 thanks, the code looks much simpler! I'll test it out tomorrow

[GitHub] spark pull request #22454: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22454#discussion_r218497767 --- Diff: dev/create-release/release-build.sh --- @@ -446,6 +432,8 @@ if [[ "$1" == "publish-release" ]]; then # Clean-up

[GitHub] spark pull request #22454: [SPARK-25445][BUILD][FOLLOWUP] Resolve issues in ...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22454#discussion_r218496294 --- Diff: dev/create-release/release-build.sh --- @@ -305,16 +290,17 @@ if [[ "$1" == "package" ]]; then for key in ${!BINA

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22441#discussion_r218481751 --- Diff: dev/create-release/release-build.sh --- @@ -111,13 +111,17 @@ fi # different versions of Scala are supported. BASE_PROFILES="-P

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22441#discussion_r218465898 --- Diff: dev/create-release/release-build.sh --- @@ -111,13 +111,17 @@ fi # different versions of Scala are supported. BASE_PROFILES="-P

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22441#discussion_r218464733 --- Diff: dev/create-release/release-build.sh --- @@ -414,15 +437,15 @@ if [[ "$1" == "publish-release" ]]; then -DskipTes

[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22441 Since I've verified it manually, I'm merging it to master/2.4, thanks!

[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22441 The test failure is a known issue: https://issues.apache.org/jira/browse/SPARK-25456 Actually, a PR build can't verify release scripts, so we don't need to wait

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21403 I'm writing release notes, and this one gets my attention. @mgaido91 can you confirm that this patch doesn't introduce any behavior change? i.e. if it fails previously, it still fails

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22411 We can't merge new features to maintenance branches (2.4 as well), so we don't need to rush here, as this feature can only be available in the next release

[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 retest this please

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r218322027 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -106,6 +108,22 @@ object TypeCoercion

[GitHub] spark issue #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchmark

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22443 @dongjoon-hyun we are trying to avoid the overhead of scalatest, not sbt. So this LGTM

[GitHub] spark issue #22448: [SPARK-25417][SQL] Improve findTightestCommonType to coe...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22448 I think it's a bug fix instead of an improvement. `findTightestCommonType` is used for binary operators and it should be easy to write some end-to-end test cases to verify the bug

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r218320357 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -106,6 +108,22 @@ object TypeCoercion

[GitHub] spark pull request #22448: [SPARK-25417][SQL] Improve findTightestCommonType...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22448#discussion_r218319943 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -89,10 +91,10 @@ object TypeCoercion

[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22441 retest this please

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22441#discussion_r218304237 --- Diff: dev/create-release/release-build.sh --- @@ -111,13 +111,17 @@ fi # different versions of Scala are supported. BASE_PROFILES="-P

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22441#discussion_r218304060 --- Diff: dev/create-release/release-build.sh --- @@ -183,8 +188,17 @@ if [[ "$1" == "package" ]]; then # Updated f

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218295350 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22411 Since this is a new feature, we can't just merge it like #22353 without a proper design. Making the event logs as a structured, unified and reliable source for Spark metrics looks like

[GitHub] spark pull request #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchm...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22443#discussion_r218293821 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -17,29 +17,28 @@ package

[GitHub] spark pull request #22443: [SPARK-25339][TEST] Refactor FilterPushdownBenchm...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22443#discussion_r218293752 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala --- @@ -17,29 +17,28 @@ package

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21403#discussion_r218293687 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out --- @@ -0,0 +1,70 @@ +-- Automatically generated

[GitHub] spark pull request #21403: [SPARK-24341][SQL] Support only IN subqueries wit...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21403#discussion_r218293670 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out --- @@ -0,0 +1,70 @@ +-- Automatically generated

[GitHub] spark issue #22402: [SPARK-25414][SS][TEST] make it clear that the numRows m...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 retest this please

[GitHub] spark issue #22438: [SPARK-25443][BUILD] fix issues when building docs with ...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22438 thanks, merging to master/2.4!

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22395 To clarify, it's not following Hive, but following the behavior of previous Spark versions, which is the same as Hive. I also think returning the left operand's type is more reasonable, but we

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22411 Without a thorough design, I hesitate to change the `DataWritingCommand` interface only for event log. Do you have any more plans to improve the event log

[GitHub] spark issue #22438: [SPARK-25443][BUILD] fix issues when building docs with ...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22438 retest this please

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22395 LGTM

[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r218055871 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -314,6 +314,34 @@ case class Divide(left

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22411 This time it's not a regression right? I'd like to not change the interface, but explicitly catch the plan we are interested in `fromSparkPlan`, e.g. ``` case DataWritingCommandExec(i

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218052897 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -1331,23 +1331,27 @@ case class

[GitHub] spark pull request #22408: [SPARK-25417][SQL] ArrayContains function may ret...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22408#discussion_r218043239 --- Diff: docs/sql-programming-guide.md --- @@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22355 So the next step is, create a `Predicate` factory with fallback?

[GitHub] spark issue #22355: [SPARK-25358][SQL] MutableProjection supports fallback t...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22355 LGTM.

[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r218041490 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallbackSuite.scala --- @@ -69,11 +85,26

[GitHub] spark issue #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation test c...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22418 thanks, merging to master/2.4!

[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22441 I'm not sure if we can trigger a [packaging job](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-package/) with a PR, cc @shaneknapp @JoshRosen

[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22396 thanks, merging to master/2.4!

[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22436 retest this please

[GitHub] spark issue #22441: [SPARK-25445][BUILD] the release script should be able t...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22441 cc @srowen @vanzin @shaneknapp @felixcheung @jerryshao

[GitHub] spark pull request #22441: [SPARK-25445][BUILD] the release script should be...

2018-09-17 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22441 [SPARK-25445][BUILD] the release script should be able to publish a scala-2.12 build ## What changes were proposed in this pull request? update the package and publish steps, to support

[GitHub] spark issue #22434: [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile ...

2018-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22434 thanks, merging to master/2.4!

[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 LGTM. I think the last piece is the migration guide, to explain what changed from 2.3 to 2.4

[GitHub] spark pull request #22418: [SPARK-25427][SQL][TEST] Add BloomFilter creation...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22418#discussion_r217949342 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala --- @@ -50,6 +55,66 @@ abstract class OrcSuite

[GitHub] spark issue #22438: [SPARK-25443][INFRA] fix issues when building docs with ...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22438 cc @vanzin @felixcheung @srowen @jerryshao

[GitHub] spark pull request #22438: [SPARK-25443][INFRA] fix issues when building doc...

2018-09-16 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22438 [SPARK-25443][INFRA] fix issues when building docs with release scripts in docker ## What changes were proposed in this pull request? These 2 changes are required to build the docs

[GitHub] spark issue #22436: [SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and N...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22436 Thanks for the fix! I'm not familiar with this part though, let's ping @vanzin @felixcheung @jerryshao

[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217942351 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala --- @@ -143,16 +143,14 @@ class

[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22432 thanks, merging to master/2.4!

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21677 > So, are you heading main-method style with separate BM output files? Yes. So it's not reverting this PR, since writing BM result to a file is good. But we should update these BMs to

[GitHub] spark issue #22231: [SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle t...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22231 Note that, RC1 was cut before merging this PR, which means, this patch is not available in 2.4.0. I hit some problems running the release scripts and spent quite a lot of time to fix them, so

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21677 Seems we manually write benchmark result to a file, which can also be done with the main-method style.

[GitHub] spark pull request #22427: [SPARK-25438][SQL][TEST] Fix FilterPushdownBenchm...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22427#discussion_r217916656 --- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt --- @@ -2,737 +2,669 @@ Pushdown for many distinct value case

[GitHub] spark issue #22434: [SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile ...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22434 LGTM

[GitHub] spark issue #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAp...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22432 retest this please

[GitHub] spark pull request #22432: [SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky Ext...

2018-09-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22432#discussion_r217916510 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala --- @@ -457,7 +458,7 @@ class ExternalAppendOnlyMapSuite

[GitHub] spark issue #21515: [SPARK-24372][build] Add scripts to help with preparing ...

2018-09-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21515 Problem solved. This script assumes the current user can run docker command without `sudo`, while it may not be true. But we can't run this script with root user as we can't add a root user

[GitHub] spark pull request #21515: [SPARK-24372][build] Add scripts to help with pre...

2018-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21515#discussion_r217901993 --- Diff: dev/create-release/do-release-docker.sh --- @@ -0,0 +1,143 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software

[GitHub] spark pull request #21515: [SPARK-24372][build] Add scripts to help with pre...

2018-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21515#discussion_r217901953 --- Diff: dev/create-release/do-release-docker.sh --- @@ -0,0 +1,143 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software

[GitHub] spark issue #21515: [SPARK-24372][build] Add scripts to help with preparing ...

2018-09-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21515 I hit an issue when using this script to prepare 2.4.0 rc1 ``` useradd: UID 0 is not unique The command '/bin/sh -c useradd -m -s /bin/bash -p spark-rm -u $UID spark-rm' returned a non

[GitHub] spark pull request #21515: [SPARK-24372][build] Add scripts to help with pre...

2018-09-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21515#discussion_r217900617 --- Diff: dev/create-release/do-release-docker.sh --- @@ -0,0 +1,143 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software

[GitHub] spark pull request #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in...

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22396#discussion_r217758371 --- Diff: docs/sql-programming-guide.md --- @@ -1898,6 +1898,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #22396: [SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS p...

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22396 ok to test

[GitHub] spark issue #22413: [SPARK-25425][SQL] Extra options overwrite session optio...

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22413 good catch! I believe this was a mistake. LGTM

[GitHub] spark issue #22408: [SPARK-25417][SQL] ArrayContains function may return inc...

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22408 what does presto return for `select array_contains(array(1), 1.34);`?

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22395 LGTM except one comment

[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217743678 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -314,6 +314,32 @@ case class Divide(left

[GitHub] spark issue #21677: [SPARK-24692][TESTS] Improvement FilterPushdownBenchmark

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21677 I need more context here. What's the benefit of the test suite style benchmark? I've committed benchmark code several times and I always use the main-method style. One benefit of the main-method

[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

2018-09-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22410#discussion_r217669057 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -806,6 +807,8 @@ private[spark] class HiveExternalCatalog

[GitHub] spark pull request #22402: [SPARK-25414][SS] The numInputRows metrics can be...

2018-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22402#discussion_r217576877 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -460,9 +460,9 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217576598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -314,6 +314,27 @@ case class Divide(left

[GitHub] spark pull request #22402: [SPARK-25414][SS] The numInputRows metrics can be...

2018-09-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22402#discussion_r217286827 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala --- @@ -598,18 +599,20 @@ abstract class

[GitHub] spark issue #22409: [SPARK-25352][SQL][Followup] Add helper method and addre...

2018-09-13 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22409 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #22365: [SPARK-25381][SQL] Stratified sampling by Column ...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22365#discussion_r217256279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala --- @@ -370,29 +370,76 @@ final class DataFrameStatFunctions private

[GitHub] spark pull request #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20521#discussion_r217255313 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -56,34 +57,36 @@ case class

[GitHub] spark issue #22402: [SPARK-25414][SS] The numInputRows metrics can be incorr...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22402 retest this please

[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r217241810 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala --- @@ -37,19 +37,22

[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22353 thanks, merging to master/2.4/2.3!

[GitHub] spark issue #22374: [SPARK-25387][SQL] Fix for NPE caused by bad CSV input

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22374 thanks, merging to master/2.4!

[GitHub] spark issue #22398: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22398 since the original PR was reverted from 2.4, I'm merging it back. Thanks, merging to master/2.4!

[GitHub] spark issue #22401: [SPARK-25413] Precision value is going for toss when Avg...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22401 According to the PR that introduced the `+ 10` behavior, this follows Hive. Whatever we want to propose, let's clearly write down the tradeoffs, e.g. maybe to keep larger precision

[GitHub] spark pull request #22355: [SPARK-25358][SQL] MutableProjection supports fal...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22355#discussion_r217233868 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala --- @@ -37,19 +37,22

[GitHub] spark pull request #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22395#discussion_r217233033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala --- @@ -314,6 +314,27 @@ case class Divide(left

[GitHub] spark issue #22398: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22398 LGTM

[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22388 As we discussed in the dev list, we only want to revert it from 2.4. I'm closing it now.
