viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-557429259
For now we will take another approach to this issue. Hive scan has a few
other questions
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552946268
@HyukjinKwon Thanks for the comment!
> Well, to me I actually agree with Dongjoon's po
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552774504
> spark.default.parallelism doesn't really affect data source scan AFAIK. We
do have a s
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552643459
> For those performance reasons, Apache Spark already converts Hive tables to
data source
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552631299
Another point is, for datasource table scan node, the parallelism can be
controlled by c
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552627966
> The optimal value for each table is unknown, isn't it? This PR doesn't
give any clue f
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552581081
@dongjoon-hyun Thanks for the review.
As I mentioned in the description, although end-
viirya commented on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned
table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552266990
cc @cloud-fan @dongjoon-hyun @felixcheung