Thanks for your response.

On Mon, Jan 6, 2025 at 4:39 PM Bryce Mecum <[email protected]> wrote:
>
> Are you able to share your code, particularly how you build your
> ArrowWriterProperties?
>
> The Arrow Schema and therefore the field-level metadata is actually
> stored in the Parquet file as an opaque blob. Opaque in the sense that
> it's opaque to the standard Parquet tools. You'll have to read it in
> with a tool that's Arrow-aware such as Arrow C++ or PyArrow.

Sorry. I guess I wasn't clear. I'm doing something like this:

using SchemaPtr = std::shared_ptr<arrow::Schema>;
using ParquetSchemaPtr = std::shared_ptr<parquet::SchemaDescriptor>;
using FieldPtr = std::shared_ptr<arrow::Field>;
std::vector<FieldPtr> fields;

// Note metadata on field.
fields.push_back(arrow::field("field1", some_type, kvMetadata));
fields.push_back(arrow::field("field2", some_other_type, kvMetadata2));
...
SchemaPtr schema(new arrow::Schema(fields));

ParquetSchemaPtr parquetSchema;
// Arguments: Arrow schema, WriterProperties, ArrowWriterProperties, output.
parquet::arrow::ToParquetSchema(schema.get(),
    *propertiesBuilder.build(), *writerProperties, &parquetSchema);
...
// Open file and write data.

What I wanted was for the metadata that I placed on each field of the
Arrow schema to be written to the Parquet file. I don't see this
happening. When I look at FieldToNode() in parquet/arrow/schema.cc, the
metadata doesn't seem to be handled, and I don't see any place on the
Parquet Node to hold it (I could be missing something).

> However, I believe the default behavior of the Arrow C++ Parquet
> implementation is to not store the Arrow Schema so you'll have to opt
> into that behavior to get what you want by enabling store_schema [1].
>
> [1] https://arrow.apache.org/docs/cpp/parquet.html#writetable
>
> On Mon, Jan 6, 2025 at 12:31 PM Andrew Bell <[email protected]> wrote:
> >
> > Hi,
> >
> > I'm creating a Parquet file with a writer (a FileWriter based on a
> > ParquetFileWriter). The writer is created using a Schema and the
> > Schema itself was created from a list of Fields. Each of the fields
> > contains metadata and the schema itself also contains metadata. When I
> > examine the output of the file with `parquet-tools inspect --detail`
> > it shows the Schema metadata, but no field metadata.
> >
> > I'm trying to figure out if the field metadata is being written or if
> > this is just an issue with seeing the data using the `parquet-tools`
> > program. Do I have to do something special to get metadata associated
> > with schema fields written to a parquet file? Or do I need to use some
> > other command to see field-level metadata?
> >
> > Thanks,
> >
> > --
> > Andrew Bell
> > [email protected]



-- 
Andrew Bell
[email protected]
