Hi Arthur, I'm not very clear about the use case here. Just to clarify, in your original Parquet file, do you have List<int64> typed columns?
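
In case it helps while you check: one possible explanation, and it is only an assumption on my side, is that both chunks are slices of a single list array, so they share one child values array and differ only in which offsets they cover. The sketch below models that layout with plain standard-library types (ListChunk, chunkValues, chunkOffsets and makeExampleChunks are hypothetical names, not Arrow APIs) and shows how to pull out just the values and offsets that belong to each chunk. I believe the Arrow C++ counterparts are ListArray::value_offset() for the bounds and ListArray::Flatten() for the already-trimmed values, but please verify against the version you are using.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for one chunk of a List<int64> column: a window
// [start, start + length) over offsets that index into a shared child buffer.
struct ListChunk {
    std::shared_ptr<const std::vector<int64_t>> values;   // child data, shared by all chunks
    std::shared_ptr<const std::vector<int32_t>> offsets;  // one entry per list, plus a final end offset
    std::size_t start;   // index of the first list covered by this chunk
    std::size_t length;  // number of lists covered by this chunk
};

// Copy only the child values that this chunk's lists actually reference.
std::vector<int64_t> chunkValues(const ListChunk & chunk)
{
    const int32_t first = (*chunk.offsets)[chunk.start];
    const int32_t last = (*chunk.offsets)[chunk.start + chunk.length];
    return std::vector<int64_t>(chunk.values->begin() + first, chunk.values->begin() + last);
}

// Return this chunk's offsets rebased so that they start at zero.
std::vector<int32_t> chunkOffsets(const ListChunk & chunk)
{
    std::vector<int32_t> rebased;
    rebased.reserve(chunk.length + 1);
    const int32_t base = (*chunk.offsets)[chunk.start];
    for (std::size_t i = 0; i <= chunk.length; ++i)
        rebased.push_back((*chunk.offsets)[chunk.start + i] - base);
    return rebased;
}

// Example column [[1, 2], [3], [4, 5, 6]] split into two chunks that
// point at the same child buffer.
std::vector<ListChunk> makeExampleChunks()
{
    std::shared_ptr<const std::vector<int64_t>> values =
        std::make_shared<std::vector<int64_t>>(std::vector<int64_t>{1, 2, 3, 4, 5, 6});
    std::shared_ptr<const std::vector<int32_t>> offsets =
        std::make_shared<std::vector<int32_t>>(std::vector<int32_t>{0, 2, 3, 6});
    return {ListChunk{values, offsets, 0, 2}, ListChunk{values, offsets, 2, 1}};
}
```

If that is what is happening in your file, collecting the full child array from every chunk would repeat the same data once per chunk, which would match the duplication you describe. Trimming by the chunk's first and last offset avoids that.
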

On Wed, Nov 16, 2022 at 8:02 AM Arthur Passos <[email protected]> wrote:

> Hi
>
> I am reading a parquet file with arrow::RecordBatchReader and the
> arrow::Table returned contains columns with two chunks
> (column->num_chunks() == 2). The column in question, although not limited
> to, is of type Array(Int64).
>
> I want to extract the data (nested column data) as well as the offsets
> from that column. I have found only one example
> <https://github.com/apache/arrow/blob/master/cpp/examples/arrow/row_wise_conversion_example.cc#L121>
> of Array columns and it assumes the nested type is known at compile time
> AND the column has only one chunk.
>
> I have tried to loop over the Array(Int64) column chunks and grab the
> `values()` member, but for some reason, for that specific Parquet file, the
> values member point to the same memory location. Therefore, if I do
> something like the below, I end up with duplicated data:
>
> static std::shared_ptr<arrow::ChunkedArray>
> getNestedArrowColumn(std::shared_ptr<arrow::ChunkedArray> & arrow_column)
> {
>     arrow::ArrayVector array_vector;
>     array_vector.reserve(arrow_column->num_chunks());
>     for (size_t chunk_i = 0, num_chunks = static_cast<size_t>(arrow_column->num_chunks()); chunk_i < num_chunks; ++chunk_i)
>     {
>         arrow::ListArray & list_chunk = dynamic_cast<arrow::ListArray &>(*(arrow_column->chunk(chunk_i)));
>         std::shared_ptr<arrow::Array> chunk = list_chunk.values();
>         array_vector.emplace_back(std::move(chunk));
>     }
>     return std::make_shared<arrow::ChunkedArray>(array_vector);
> }
>
> I can provide more info, but to keep the initial request short and simple,
> I'll leave it at that.
>
> Thanks in advance,
> Arthur
>

-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
