pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1667169264
@wangyum I see your point: the table property takes higher priority than the
Spark session configuration, but that does not fully solve the problem.
The zstd promotion is gradually
pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665860195
Let me supply my use case to help the reviewer evaluate the benefit of this
change.
Internally, most of the Spark jobs write Parquet/ORC files using the Hive
Serde (obviously, for
pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665801271
@dongjoon-hyun I understand your concerns, but as shown above, the filename
currently written via the Spark Hive serde is not the same as the one Hive
writes; it's half like DS and half like Hive. So I
pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1665362216
@wangyum @yaooqinn I agree that we should follow the Hive behavior as
much as possible; meanwhile, Spark also aims to reduce the differences between
DS and Hive. As you can see, the
pan3793 commented on PR #42336:
URL: https://github.com/apache/spark/pull/42336#issuecomment-1664948047
cc @wangyum @ulysses-you @yaooqinn
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the