Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2024-09-25 Thread via GitHub
github-actions[bot] commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-2375486872 This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-11-20 Thread via GitHub
manuzhang commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1820108319 We have similar requests to set platform level split size to reduce read RPCs to HDFS.

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-11-04 Thread via GitHub
jzhuge commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1793649132 Or have a feature flag (default false) to control this?

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-11-02 Thread via GitHub
holdenk commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1791080603 What about if we made it an explicit Iceberg property rather than re-using a Spark property?

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-11-01 Thread via GitHub
jzhuge commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1789421862 > I am not sure it is a good idea too, we always avoided respecting Spark configs for the built-in file sources. Iceberg split planning is different. Thanks for the feedback. If you

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-10-31 Thread via GitHub
aokolnychyi commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1788252194 I am not sure it is a good idea too, we always avoided respecting Spark configs for the built-in file sources. Iceberg split planning is different.

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-10-29 Thread via GitHub
jzhuge commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1784547459 @rdblue When we migrate users from Hive tables to Iceberg tables, some jobs hit executor OOM, as they were tuned for Hive tables with `spark.sql.files.maxPartitionBytes`. Although we told

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-10-28 Thread via GitHub
rdblue commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1783905941 @jzhuge, what is the rationale for this change? Iceberg has always ignored Spark's settings here.

Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

2023-10-27 Thread via GitHub
jzhuge commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1783584994 The PR is ready for review. If approved, we will follow up with doc update and backports to 3.4, 3.3, etc.