Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-26 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2079228113 Here we go: https://github.com/apache/avro/pull/2874 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-25 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2078143286 Yes, this is still top of mind! I'm going to see what's needed and make sure that it will be included in the next Avro release!

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-11 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2050826622 I could re-test it. It would take me a day or two.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-04-11 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-2050785089 Curious if there were any updates as well.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-03-02 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1974820592 Any updates on this?

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-02-22 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1960379373 @aokolnychyi Hmm, I did some quick checks and that seems to be correct. I'm pretty sure that it was using the code because I was seeing exceptions and differences in the benchmarks. Let me

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-02-22 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1960120064 @Fokko, aren't we using `DataFileWriter` from Avro for Iceberg metadata? Yeah, I fully support the idea, it is just my preliminary analysis showed it would have no effect on

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-02-12 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1939558946 @aokolnychyi This is about the Iceberg metadata, not about the data files themselves. It might also be interesting for the data files, but then we should analyze the access patterns first.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-02-11 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1937932075 Hi @aokolnychyi, can we change it to be a buffered binary writer? That way we would get the length counts written.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2024-02-09 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1936377596 My point in the earlier message is that I am not sure this PR would actually have an effect because changes are not going to be used by our write path in Java. Am I missing anything

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-12-16 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1858921063 Hello @aokolnychyi and @Fokko, >> Question. Aren't we using DataFileWriter from Avro in our AvroFileAppender? If so, how is this PR affecting it? Won't we still use direct

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-12-04 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1838785955 @aokolnychyi I think we can start a release sometime soon, but I need to align this with the Avro community. I also wanted to include nanosecond timestamps in there.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-12-01 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1836930161 @rustyconover @Fokko, I was wondering whether there were any updates. It would be great to have this in.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-11-15 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813465479 > do we need any changes in readers to benefit from this? If not, can we run some existing benchmarks to showcase the read improvement is as we anticipate? Since we use the decoders

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-11-15 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813448646 Also, nice work on a new encoder in Avro, @Fokko! Do you know when that will be available?

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-11-15 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813420551 Question. Aren't we using `DataFileWriter` from Avro in our `AvroFileAppender`? If so, how is this PR affecting it? Won't we still use direct encoders there?

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-11-15 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1813161570 @rustyconover @Fokko, do we need any changes in readers to benefit from this? If not, can we run some existing benchmarks to showcase the read improvement is as we anticipate?

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-11-09 Thread via GitHub
aokolnychyi commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1804868833 I'd love to take a look early next week.

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-10-17 Thread via GitHub
rustyconover commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1766378641 Yes it would!

Re: [PR] Core: Avro writers use BlockingBinaryEncoder to enable array/map size calculations. [iceberg]

2023-10-17 Thread via GitHub
Fokko commented on PR #8625: URL: https://github.com/apache/iceberg/pull/8625#issuecomment-1766306045 I just realized that this would also speed up operations such as snapshot expiration, because we do need to access the manifest files, but don't need to use the metrics.
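
Editor's sketch (not part of the thread): the "array/map size calculations" in the PR title refer to a feature of the Avro binary encoding. An array or map block may be written with a negative item count followed by the block's size in bytes, which lets a reader that does not need the values seek past the whole block instead of decoding every item. The Python below is a minimal, self-contained illustration of that encoding for an array of longs. It is not Iceberg's or Avro's code, and the helper names (`zigzag_varint`, `read_long`, `encode_long_array`, `skip_long_array`) are hypothetical.

```python
import io


def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as an Avro long (zigzag + varint)."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: io.BytesIO) -> int:
    """Decode one Avro long from the current position of buf."""
    shift = 0
    z = 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def encode_long_array(items, with_byte_size: bool) -> bytes:
    """Encode items as a single-block Avro array of longs.

    with_byte_size=True writes a negative count followed by the block's
    byte length, which is what makes the block skippable.
    """
    if not items:
        return zigzag_varint(0)
    body = b"".join(zigzag_varint(i) for i in items)
    if with_byte_size:
        header = zigzag_varint(-len(items)) + zigzag_varint(len(body))
    else:
        header = zigzag_varint(len(items))
    # A zero count terminates the array.
    return header + body + zigzag_varint(0)


def skip_long_array(buf: io.BytesIO) -> int:
    """Skip an array of longs; return how many items had to be decoded."""
    decoded = 0
    while True:
        count = read_long(buf)
        if count == 0:
            return decoded
        if count < 0:
            # Block size is known: jump over the items without decoding.
            size = read_long(buf)
            buf.seek(size, io.SEEK_CUR)
        else:
            # No size written: every item must be decoded to find the end.
            for _ in range(count):
                read_long(buf)
                decoded += 1


values = list(range(1000))
plain = io.BytesIO(encode_long_array(values, with_byte_size=False))
sized = io.BytesIO(encode_long_array(values, with_byte_size=True))
print(skip_long_array(plain), skip_long_array(sized))
```

With the byte size present, the reader skips the array without decoding any item; without it, all 1000 items are decoded just to find where the array ends. This is the read-side saving the thread anticipates for operations that open manifest files but ignore the metrics.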