Hi Arun,
The schema should use `parquet::Repetition::OPTIONAL`;
`parquet::Repetition::REPEATED` is the one meant for repeated fields. IIRC you
can then insert an empty optional (e.g. `arrow::util::optional<int64_t>()`, or
`std::optional` in newer Arrow releases) into the stream for a null value.
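
For context, here's a rough, library-free sketch of what OPTIONAL buys you at the storage level. Parquet records nulls in a flat optional column as definition levels (0 = null, 1 = present) and writes only the non-null values, so no sentinel value is needed. IIRC the low-level column writer's `WriteBatch` takes definition levels and dense values in roughly this shape. The `EncodedColumn` struct and `Encode` function are hypothetical illustrations, not Parquet C++ API.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical illustration (not Parquet C++ API) of how a flat OPTIONAL
// column is laid out: one definition level per row, plus a dense vector
// that holds only the non-null values.
struct EncodedColumn {
  std::vector<int16_t> def_levels;  // 1 = value present, 0 = null
  std::vector<int64_t> values;      // non-null values only, in row order
};

inline EncodedColumn Encode(const std::vector<std::optional<int64_t>>& rows) {
  EncodedColumn out;
  for (const auto& row : rows) {
    if (row.has_value()) {
      out.def_levels.push_back(1);
      out.values.push_back(*row);
    } else {
      out.def_levels.push_back(0);  // a null adds a level but no value
    }
  }
  return out;
}

// Usage: Encode({int64_t{10}, std::nullopt, int64_t{30}})
// yields def_levels {1, 0, 1} and values {10, 30}.
```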

Hope this helps.

Micah

On Tue, Sep 13, 2022 at 8:58 AM Arun Joseph wrote:

> Hi all,
>
> I've tried defining my field with the following:
>
> fields.push_back(
>   parquet::schema::PrimitiveNode::Make(
>     "field_name",
>     parquet::Repetition::REQUIRED,
>     parquet::Type::INT64,
>     parquet::ConvertedType::INT_64)
> );
>
> and I'm not sure if it's possible to specify a null value for an int64
> column. I understand that C++ ints don't have a null value. I write to the
> field with the following:
>
> os << std::numeric_limits<int64_t>::quiet_NaN();
>
> where os is:
>
> parquet::WriterProperties::Builder builder_;
> parquet::StreamWriter os{
>     parquet::ParquetFileWriter::Open(outfile_, schema_, builder_.build())};
>
> This (as expected) writes a 0 for the value. But is there a way to specify
> a null value? From my understanding, parquet::Repetition::OPTIONAL is meant
> for repeating groups.
>
> My actual use case is representing a null Linux epoch timestamp in
> nanoseconds, so that it shows up as NaN or NaT in the resulting pandas
> DataFrame after reading the written Parquet file. It seems that pandas
> implicitly casts int columns with nulls to float, but I think Parquet can
> represent a null in an int column directly. Is converting the column to
> float the only way to achieve this, or is there a way to mark a value as
> null in Parquet C++?
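
On the float question: besides changing the column type, a float64 column can't hold every nanosecond epoch value exactly. Doubles have a 53-bit significand and current epoch-nanos are around 1.66e18 (well above 2^53), so adjacent doubles there are 256 ns apart. A quick standalone check; the helper name and the example timestamp are arbitrary, not taken from your data.

```cpp
#include <cstdint>

// Returns true if an int64 value survives a round trip through double.
// Above 2^53, adjacent doubles are more than 1 apart, so many int64 values
// are rounded to a nearby representable double.
inline bool RoundTripsThroughDouble(int64_t v) {
  const double d = static_cast<double>(v);
  return static_cast<int64_t>(d) == v;
}

// 2022-09-13T08:58:00.123456789Z in nanoseconds since the Unix epoch.
constexpr int64_t kExampleNanos = 1663059480123456789;
```

`RoundTripsThroughDouble(kExampleNanos)` is false, while `int64_t{1} << 53` still round-trips. Writing nulls through an OPTIONAL int64 column avoids both the sentinel and this rounding.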
>
> Thank You,
> Arun Joseph
>
>
