Re: [I] How to improve write speed for data in the same partition? [iceberg]

2023-12-26 Thread via GitHub
atifiu commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1869464036 @xuchang-66 @TechTinkerer42 I tested this with `write.distribution-mode = none`. It does improve performance, but it can introduce the problem of small fi

Re: [I] How to improve write speed for data in the same partition? [iceberg]

2023-12-25 Thread via GitHub
xuchang-66 commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1869222882 Thanks all, I will have a try.

Re: [I] How to improve write speed for data in the same partition? [iceberg]

2023-12-25 Thread via GitHub
TechTinkerer42 commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1868957150 Each partition should be written by only one task to prevent multiple tasks from writing to the same partition, which can lead to the creation of small files. Here are
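
The trade-off described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not Iceberg's or Spark's actual writer logic: the function `count_files` is hypothetical, it assumes every task writes exactly one file per partition it receives rows for, and it models the "none" case as rows already scattered at random across tasks.

```python
import random
import zlib


def count_files(partition_values, num_tasks, mode, seed=0):
    """Toy model: each task writes one file per partition it receives rows for."""
    rng = random.Random(seed)
    task_partition_pairs = set()
    for value in partition_values:
        if mode == "hash":
            # Rows with the same partition value are routed to the same task.
            task = zlib.crc32(str(value).encode("utf-8")) % num_tasks
        elif mode == "none":
            # No shuffle before the write: model rows as scattered across tasks.
            task = rng.randrange(num_tasks)
        else:
            raise ValueError(f"unsupported mode: {mode}")
        task_partition_pairs.add((task, value))
    return len(task_partition_pairs)


rows = ["2023-12-18"] * 1000  # every row belongs to a single partition
hash_files = count_files(rows, num_tasks=8, mode="hash")
none_files = count_files(rows, num_tasks=8, mode="none")
print(hash_files, none_files)
```

In this model, hash distribution yields a single file for the partition but only one task does the writing, while skipping the shuffle lets several tasks write in parallel at the cost of more, smaller files.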

Re: [I] How to improve write speed for data in the same partition? [iceberg]

2023-12-24 Thread via GitHub
coolderli commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1868649138 try to set `write.distribution-mode` to `none`.
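
One way to apply this suggestion is to set the table property through Spark SQL. The sketch below only builds the statement string, so it runs without a Spark session. The helper `set_distribution_mode_sql` and the table name `db.events` are hypothetical and not from the thread; in a real job the resulting string would typically be passed to `spark.sql(...)`.

```python
ALLOWED_MODES = {"none", "hash", "range"}


def set_distribution_mode_sql(table, mode):
    """Build an ALTER TABLE statement that sets write.distribution-mode."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unsupported distribution mode: {mode}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.distribution-mode'='{mode}')"
    )


# Hypothetical table name, used for illustration only.
stmt = set_distribution_mode_sql("db.events", "none")
print(stmt)
```

Because this is a table property, it changes the default for subsequent writes to the table, so the small-file trade-off mentioned elsewhere in the thread applies to those writes as well.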

Re: [I] How to improve write speed for data in the same partition? [iceberg]

2023-12-22 Thread via GitHub
atifiu commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1867510715 I am also facing the same issue when writing data to a single partition of an Iceberg table using a DataFrame. When I write using spark.sql with `insert into ... select *`, performance is prett

[I] How to improve write speed for data in the same partition? [iceberg]

2023-12-18 Thread via GitHub
xuchang-66 opened a new issue, #9330: URL: https://github.com/apache/iceberg/issues/9330 ### Query engine Spark v3.3, Iceberg v1.2.2 ### Question When using Spark SQL to write to an Iceberg table, only one task is used to write to each partition. However, when dealing