Hi

I am reading a Parquet file with arrow::RecordBatchReader, and the returned 
arrow::Table contains columns with two chunks (column->num_chunks() == 2). The 
column in question is of type Array(Int64), although the problem is not limited 
to that type.

I want to extract the data (nested column data) as well as the offsets from 
that column. I have found only one 
example<https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row_wise_conversion_example.cc#L121>
 of Array columns and it assumes the nested type is known at compile time AND 
the column has only one chunk.

I have tried to loop over the chunks of the Array(Int64) column and grab the 
`values()` member of each one, but for some reason, for that specific Parquet 
file, the `values()` members of both chunks point to the same memory location. 
Therefore, if I do something like the below, I end up with duplicated data:


static std::shared_ptr<arrow::ChunkedArray>
getNestedArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
{
    arrow::ArrayVector array_vector;
    array_vector.reserve(arrow_column->num_chunks());
    for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks());
         chunk_i < num_chunks; ++chunk_i)
    {
        arrow::ListArray & list_chunk = dynamic_cast<arrow::ListArray &>(*(arrow_column->chunk(chunk_i)));
        std::shared_ptr<arrow::Array> chunk = list_chunk.values();
        array_vector.emplace_back(std::move(chunk));
    }
    return std::make_shared<arrow::ChunkedArray>(array_vector);
}
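
One possible reading of the duplication, offered as a guess rather than a 
confirmed diagnosis: if both chunks are slices of a single underlying list 
array, each chunk's `values()` would return the whole shared child array rather 
than just the chunk's own window. The self-contained toy model below (plain 
C++, not the Arrow API; `ToyListChunk`, `collect_naive`, `collect_sliced`, and 
`make_two_chunks` are hypothetical names invented for illustration) shows how 
that would duplicate data, and how narrowing each chunk to the range between 
its first and last value offsets would avoid it.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-in for one list-array chunk (NOT an Arrow type): a window of
// `length` list rows starting at row `offset`, over shared offsets and a
// shared child values buffer.
struct ToyListChunk {
    std::shared_ptr<const std::vector<int64_t>> values;   // shared child data
    std::shared_ptr<const std::vector<int32_t>> offsets;  // shared list offsets
    std::size_t offset;  // first list row covered by this chunk
    std::size_t length;  // number of list rows in this chunk

    // Index into `values` where row i of this chunk begins
    // (i == length gives the end of the last row).
    int32_t value_offset(std::size_t i) const { return (*offsets)[offset + i]; }
};

// Naive approach: append the whole child buffer once per chunk.
// When chunks share the buffer, the data is repeated.
std::vector<int64_t> collect_naive(const std::vector<ToyListChunk>& chunks) {
    std::vector<int64_t> out;
    for (const auto& c : chunks)
        out.insert(out.end(), c.values->begin(), c.values->end());
    return out;
}

// Offset-aware approach: append only the values that belong to each chunk.
std::vector<int64_t> collect_sliced(const std::vector<ToyListChunk>& chunks) {
    std::vector<int64_t> out;
    for (const auto& c : chunks) {
        auto first = c.values->begin() + c.value_offset(0);
        auto last = c.values->begin() + c.value_offset(c.length);
        out.insert(out.end(), first, last);
    }
    return out;
}

// Example data: rows [1,2], [3], [4,5,6] split into two chunks
// (rows 0-1 and row 2) that share one values buffer.
std::vector<ToyListChunk> make_two_chunks() {
    auto values = std::make_shared<const std::vector<int64_t>>(
        std::vector<int64_t>{1, 2, 3, 4, 5, 6});
    auto offsets = std::make_shared<const std::vector<int32_t>>(
        std::vector<int32_t>{0, 2, 3, 6});
    return {ToyListChunk{values, offsets, 0, 2},
            ToyListChunk{values, offsets, 2, 1}};
}
```

In Arrow itself, the analogous step would presumably be `ListArray::Flatten()`, 
or slicing `values()` between `value_offset(0)` and `value_offset(length())`; 
that is an assumption to check against the Arrow documentation for the version 
in use.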

I can provide more info, but to keep the initial request short and simple, I'll 
leave it at that.

Thanks in advance,
Arthur
