[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-21 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-557429259 For now we will take another approach to this issue. Hive scan has a few other questions

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-12 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552946268 @HyukjinKwon Thanks for comment! > Well, to me I actually agree with Dongjoon's po

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-11 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552774504 > spark.default.parallelism doesn't really affect data source scan AFAIK. We do have a s

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-11 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552643459 > For those performance reason, Apache Spark already converts Hive table to data source

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-11 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552631299 Another point is, for datasource table scan node, the parallelism can be controlled by c

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-11 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552627966 > The optimal value for each table is unknown, isn't it? This PR doesn't give any clue f

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-11 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552581081 @dongjoon-hyun Thanks for review. As I mentioned in the description, although end-

[GitHub] [spark] viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

2019-11-10 Thread GitBox
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism URL: https://github.com/apache/spark/pull/26461#issuecomment-552266990 cc @cloud-fan @dongjoon-hyun @felixcheung -