[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-06 Thread via GitHub
pan3793 commented on PR #42336: URL: https://github.com/apache/spark/pull/42336#issuecomment-1667169264 @wangyum I see your point: the table property takes higher priority than the Spark session configuration, but that does not fully solve the problem. The zstd promotion is gradually

[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-04 Thread via GitHub
pan3793 commented on PR #42336: URL: https://github.com/apache/spark/pull/42336#issuecomment-1665860195 Let me supply my use case to help the reviewers evaluate the benefit of this change. Internally, most of the Spark jobs write Parquet/ORC files using Hive Serde (obviously, for

[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-04 Thread via GitHub
pan3793 commented on PR #42336: URL: https://github.com/apache/spark/pull/42336#issuecomment-1665801271 @dongjoon-hyun I understand your concerns, but as shown above, the filename currently written via the Spark Hive Serde is not the same as what Hive produces; it's half like DS and half like Hive. So I

[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-04 Thread via GitHub
pan3793 commented on PR #42336: URL: https://github.com/apache/spark/pull/42336#issuecomment-1665362216 @wangyum @yaooqinn I agree that we should follow the Hive behavior as much as possible; meanwhile, Spark also aims to reduce the differences between DS and Hive. As you can see, the

[GitHub] [spark] pan3793 commented on pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-03 Thread via GitHub
pan3793 commented on PR #42336: URL: https://github.com/apache/spark/pull/42336#issuecomment-1664948047 cc @wangyum @ulysses-you @yaooqinn