Hi Dan,
>
> The paths are not unique across snapshots 1/2, but within each snapshot
> they are.
This is exactly the contract I was trying to pin down. It seems it
could be clarified in the specification (again, apologies if I
missed it). I opened a PR to try to add language arou
I think in the situation you're demonstrating, the manifests belong to
two separate snapshots.
Here's an example:
create table t1 (s string);
insert into t1 values ('foo'); -- snapshot 0, manifest-list with 1 manifest pointing to file A (ADDED)
insert into t1 values ('bar'); -- snapsh
Hi Dan,
Thanks for the quick reply.
> For #2, the answer follows mostly because if the answer to #1 holds, then
> yes the pairwise intersection of entries in the manifest files of a given
> snapshot is empty.
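
The pairwise-disjointness property quoted above can be sketched as a small check. This is only an illustration, not Iceberg code: the `overlapping_paths` helper, the manifest names, and the dict-of-lists layout are all hypothetical stand-ins for a snapshot's manifests and the file paths they reference.

```python
from itertools import combinations


def overlapping_paths(manifests):
    """Map each pair of manifest names to the file paths they share.

    `manifests` is a hypothetical in-memory stand-in for one snapshot:
    manifest name -> iterable of data/delete file paths it references.
    An empty result means the pairwise intersections are all empty.
    """
    overlaps = {}
    for (name_a, paths_a), (name_b, paths_b) in combinations(manifests.items(), 2):
        shared = set(paths_a) & set(paths_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical snapshot whose manifests reference distinct files.
clean = {"m1.avro": ["data/A.parquet"], "m2.avro": ["data/B.parquet"]}
# Hypothetical snapshot where file A is referenced by two manifests.
dirty = {"m1.avro": ["data/A.parquet"], "m2.avro": ["data/A.parquet", "data/B.parquet"]}

print(overlapping_paths(clean))
print(overlapping_paths(dirty))
```

A non-empty result for a single snapshot would be the case the quoted answer rules out.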
Just to be pedantic: even with unique file names, it seems one could
construct a snap
Hey Micah,
For #1, I don't believe the spec clearly calls out that all data/delete files
must be unique, but the requirements for cleanup would be violated in
certain cases if you had the same file referenced in multiple manifests.
In practice, the best way to ensure data correctness and metadata
cons
At stripe boundaries, the bytes-on-disk statistics are accurate. For a
stripe that is in flight, the figure is an estimate, because the
dictionaries can't be compressed until the stripe is flushed. The memory
usage will be a significant overestimate, because it includes buffers that
are allocated
The following is merged for Apache ORC 1.7.4.
ORC-1123 Add `estimationMemory` method for writer
According to the Apache ORC milestone, it will be released on May 15th.
https://github.com/apache/orc/milestones
Bests,
Dongjoon.
On 2022/03/04 13:11:15 Yiqun Zhang wrote:
> Hi Openinx
>
> Thank yo
Hi Openinx
Thank you for initiating this discussion. I think we can get the
`TypeDescription` from the writer, and from the `TypeDescription` we know the
column types and, more precisely, the maximum length of each varchar/char.
This will help us estimate the average width.
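
As a rough illustration of the idea, here is a minimal sketch that derives a per-row width estimate from a schema description. It does not use the ORC API: the `(type_name, max_length)` tuples stand in for what one would read from a `TypeDescription`, and the byte widths, the one-byte-per-character assumption for varchar/char, and the default for unbounded strings are all assumptions made for the example.

```python
# Assumed in-memory widths in bytes for fixed-size types (illustrative only).
FIXED_WIDTH_BYTES = {
    "boolean": 1,
    "tinyint": 1,
    "smallint": 2,
    "int": 4,
    "bigint": 8,
    "float": 4,
    "double": 8,
}

# Arbitrary guess for types with no declared bound, such as string.
DEFAULT_UNBOUNDED_BYTES = 32


def estimate_row_width(columns):
    """Estimate the width of one row in bytes.

    `columns` is a hypothetical list of (type_name, max_length) pairs;
    max_length is only used for char/varchar and is treated as an upper
    bound, assuming one byte per character.
    """
    total = 0
    for type_name, max_length in columns:
        if type_name in ("char", "varchar"):
            total += max_length
        elif type_name in FIXED_WIDTH_BYTES:
            total += FIXED_WIDTH_BYTES[type_name]
        else:
            total += DEFAULT_UNBOUNDED_BYTES
    return total


schema = [("int", None), ("varchar", 10), ("string", None)]
print(estimate_row_width(schema))
```

For varchar/char this yields an upper bound rather than a true average, so a real estimate would likely scale it down using observed data.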
Also, I agree with your sug