ColumnMetaData location

2024-06-03 Thread Ed Seidl
Hi all, While investigating a parquet-java issue with the file_offset field in ColumnChunk [1], I discovered that parquet-java apparently does not (and perhaps never did?) write a copy of the ColumnMetaData following the column chunk data. IMO this violates the specification [2]. Instead, parque

Re: ColumnMetaData location

2024-06-03 Thread Gang Wu
> modifying the spec to state that the ColumnMetaData following
> the chunk data is also optional

+1 on this

> adding language to the effect that if the value of file_offset is 0,
> then no such metadata is present in the file.

What about marking this as deprecated and discouraging its use? B
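Under the proposed wording, a reader would treat `file_offset == 0` as "no trailing ColumnMetaData". Offset 0 is a safe sentinel because every Parquet file begins with the 4-byte magic `PAR1`, so no metadata can start before byte 4. The sketch below assumes that proposed convention; `ColumnChunkInfo` and `trailing_metadata_offset` are hypothetical names, and the dataclass models only the one Thrift `ColumnChunk` field discussed here.

```python
from dataclasses import dataclass
from typing import Optional

MAGIC_LEN = 4  # every Parquet file starts with the 4-byte magic b"PAR1"


@dataclass
class ColumnChunkInfo:
    # Hypothetical, simplified stand-in for the Thrift ColumnChunk struct;
    # only file_offset is modeled.
    file_offset: int


def trailing_metadata_offset(chunk: ColumnChunkInfo) -> Optional[int]:
    """Return where the ColumnMetaData copy after the chunk data starts,
    or None when the writer did not write one.

    Assumes the convention proposed in this thread: file_offset == 0
    means no such metadata is present in the file.
    """
    if chunk.file_offset == 0:
        return None
    if chunk.file_offset < MAGIC_LEN:
        # Bytes 0..3 hold the leading magic, so metadata cannot start there.
        raise ValueError(f"invalid file_offset {chunk.file_offset}")
    return chunk.file_offset
```

A reader following this rule would fall back to the ColumnMetaData embedded in the footer whenever the function returns `None`.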

Re: ColumnMetaData location

2024-06-04 Thread Julien Le Dem
As far as I remember, we didn't intend to write the ColumnMetaData at the end of the Column Chunk. So this might be a case of the spec being ambiguous. Ed, are you referring to this illustration in the spec? I think here "Column 1 Chunk 1 + Column Metadata" I meant the chunk *and* its metadata but

Re: ColumnMetaData location

2024-06-04 Thread Ed Seidl
Julien, yes I'm referring to the diagram, as well as the wording that follows it:

"The file metadata contains the locations of all the column metadata start locations. More details on what is contained in the metadata can be found in the Thrift definition.

Metadata is written after the d
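The quoted spec text puts the file metadata after the data. A reader can see this in how it locates the footer. A Parquet file begins and ends with the 4-byte magic `PAR1`, and the 4 bytes just before the trailing magic hold the little-endian length of the Thrift-serialized FileMetaData. The sketch below (the function name `footer_location` is hypothetical) only finds the footer's byte range. Decoding it, and following its ColumnChunk entries to each ColumnMetaData, would need a Thrift compact-protocol parser, which is not shown.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"


def footer_location(file_bytes: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the serialized FileMetaData footer.

    Layout assumed: MAGIC | column chunk data ... | FileMetaData |
    4-byte little-endian footer length | MAGIC
    """
    if len(file_bytes) < 12 or file_bytes[:4] != MAGIC or file_bytes[-4:] != MAGIC:
        raise ValueError("not a Parquet file (missing PAR1 magic)")
    # The footer length sits in the 4 bytes preceding the trailing magic.
    (footer_len,) = struct.unpack("<I", file_bytes[-8:-4])
    footer_start = len(file_bytes) - 8 - footer_len
    if footer_start < len(MAGIC):
        raise ValueError("footer length runs past the start of the file")
    return footer_start, footer_len
```

With a synthetic buffer of `b"PAR1" + 10 data bytes + 5 footer bytes + struct.pack("<I", 5) + b"PAR1"`, the footer is found at offset 14 with length 5.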