Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
> Thanks to openinx for opening this discussion.
>
> One thing to note: the current approach has a problem. Because of some optimization mechanisms, when writing a large amount of duplicate data, there will be some deviation between the estimated and the actual size.
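As a rough illustration of why duplicate data skews a size estimate, the Java sketch below compares a naive per-value byte count with a simplified dictionary-encoding model. The class name and the model itself are assumptions made for illustration: each distinct value is stored once, plus a hypothetical fixed 4-byte index per row. ORC's actual encodings and compression differ and are not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateDataEstimate {
  // Naive estimate: every value counts at its full UTF-8 length,
  // regardless of how many times it repeats.
  static long rawEstimate(List<String> values) {
    long total = 0;
    for (String v : values) {
      total += v.getBytes(StandardCharsets.UTF_8).length;
    }
    return total;
  }

  // Simplified dictionary model (hypothetical, not ORC's real layout):
  // each distinct value is stored once, plus a 4-byte index per row.
  static long dictionaryModel(List<String> values) {
    Set<String> distinct = new HashSet<>(values);
    long dictionaryBytes = 0;
    for (String v : distinct) {
      dictionaryBytes += v.getBytes(StandardCharsets.UTF_8).length;
    }
    return dictionaryBytes + 4L * values.size();
  }

  public static void main(String[] args) {
    // 10,000 copies of the same value: a worst case for the naive estimate.
    List<String> values = Collections.nCopies(10_000, "iceberg-orc-duplicate-value");
    System.out.println("raw estimate:     " + rawEstimate(values));
    System.out.println("dictionary model: " + dictionaryModel(values));
  }
}
```

With 10,000 copies of the same 27-byte string, the naive count is 270,000 bytes and the dictionary model gives 40,027. An estimator that ignores encoding can therefore overshoot the stored size several times over, before compression is even taken into account.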

[DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-03 Thread OpenInx
Hi Iceberg dev,

As we all know, in the current Apache Iceberg write path, the ORC file writer cannot simply roll over to a new file once its byte size reaches the expected threshold. The core reason we have not supported this before is the lack of a correct approach to estimate the byte size from
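The Java sketch below shows the rollover behaviour under stated assumptions. `SizedWriter`, `FakeOrcWriter`, and `RollingWriter` are hypothetical names, not Iceberg or ORC classes. The fake writer reports the bytes of already-flushed stripes plus the size of rows still buffered in memory. For a real ORC writer, that buffered part is exactly the quantity this thread asks how to estimate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class RollingWriterSketch {
  // Hypothetical writer abstraction; not Iceberg's or ORC's real API.
  interface SizedWriter<T> {
    void write(T row);
    long estimatedLength(); // must be callable before close()
    void close();
  }

  // Fake writer: completed "stripes" plus rows still buffered in memory.
  static final class FakeOrcWriter implements SizedWriter<String> {
    private final long stripeRows;
    private long flushedBytes = 0;  // bytes of stripes already written out
    private long bufferedBytes = 0; // size of rows not yet flushed
    private long bufferedRows = 0;
    long rows = 0;

    FakeOrcWriter(long stripeRows) { this.stripeRows = stripeRows; }

    public void write(String row) {
      bufferedBytes += row.length(); // stand-in for encoded size (ASCII only)
      bufferedRows++;
      rows++;
      if (bufferedRows == stripeRows) { flush(); }
    }

    private void flush() {
      flushedBytes += bufferedBytes;
      bufferedBytes = 0;
      bufferedRows = 0;
    }

    // A real ORC writer cannot know bufferedBytes exactly before encoding.
    public long estimatedLength() { return flushedBytes + bufferedBytes; }

    public void close() { flush(); }
  }

  // Rolls to a new writer once the current one's estimate reaches the target.
  static final class RollingWriter<T> {
    private final Supplier<SizedWriter<T>> factory;
    private final long targetBytes;
    private final List<SizedWriter<T>> completed = new ArrayList<>();
    private SizedWriter<T> current;

    RollingWriter(Supplier<SizedWriter<T>> factory, long targetBytes) {
      this.factory = factory;
      this.targetBytes = targetBytes;
    }

    void write(T row) {
      if (current == null) { current = factory.get(); }
      current.write(row);
      if (current.estimatedLength() >= targetBytes) {
        current.close();
        completed.add(current);
        current = null;
      }
    }

    List<SizedWriter<T>> close() {
      if (current != null) {
        current.close();
        completed.add(current);
        current = null;
      }
      return completed;
    }
  }

  public static void main(String[] args) {
    List<FakeOrcWriter> created = new ArrayList<>();
    RollingWriter<String> writer = new RollingWriter<>(() -> {
      FakeOrcWriter w = new FakeOrcWriter(4); // flush a stripe every 4 rows
      created.add(w);
      return w;
    }, 100); // target file size: 100 bytes
    for (int i = 0; i < 25; i++) { writer.write("0123456789"); } // 10 bytes/row
    writer.close();
    System.out.println("files written: " + created.size());
  }
}
```

Checking the estimate after every row keeps each file close to the target without waiting for a stripe flush. The trade-off is that the check is only as accurate as `estimatedLength()` is for the unflushed, not-yet-encoded rows.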