Hi,

I need help integrating Arrow C++ into my current project. I have already
built the C++ library and can call its API.

Here is what I need:

I have a C++ project that reads data in chunks and then uses an erasure
code to rebuild the original data.

The rebuild is done chunk by chunk. At each iteration I can access a
buffer of rebuilt data.

I need to pass this data as a stream to Arrow for processing and then send
out the processed stream.

For example, suppose my original file is a CSV and I would like to filter
it and save only the first column:

file:

col1,col2,col3,col4
a1,b1,c1,d1
an,bn,cn,dn

split into 6 chunks of equal size:

chunk1:

a1,b1,c1,d1
ak,bk

chunk2:

ck,dk
...
am,bm,cm,dm

and so on.

My question is: which StreamReader in Arrow is the right one to use, and
how does it deal with incomplete records (lines) at the beginning and end
of each chunk?
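
To make the boundary problem concrete, here is a minimal sketch in plain
C++ (no Arrow calls) of the manual workaround I would rather avoid: keep
the unterminated tail of each chunk and prepend it to the next one, so
only complete lines are handed on. LineCarryOver and FirstColumn are
hypothetical names I made up for illustration, and the sketch assumes
'\n' line endings and no quoted fields.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): holds back the incomplete
// record at the end of one chunk and joins it with the next chunk.
class LineCarryOver {
public:
    // Feed one rebuilt chunk; returns every complete line available so far.
    std::vector<std::string> Feed(const char* data, std::size_t size) {
        pending_.append(data, size);
        std::vector<std::string> lines;
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos) break;
            lines.emplace_back(pending_, start, nl - start);
            start = nl + 1;
        }
        pending_.erase(0, start);  // keep only the unterminated tail
        return lines;
    }

    // Call once after the last chunk: returns the final line if the data
    // did not end with '\n', otherwise an empty vector.
    std::vector<std::string> Finish() {
        std::vector<std::string> lines;
        if (!pending_.empty()) lines.push_back(std::move(pending_));
        pending_.clear();
        return lines;
    }

private:
    std::string pending_;
};

// First field of a CSV line; naive split on ',' with no quoting support.
std::string FirstColumn(const std::string& line) {
    return line.substr(0, line.find(','));
}
```

In my loop this would mean calling Feed(res.get(), res.size()) for every
rebuilt buffer and Finish() after the last one. I am hoping Arrow already
does something equivalent internally.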

Here is a snippet of the code I use:

// Wait for the next buffer of rebuilt data.
buffer_type_t res = fut.get0();
BOOST_LOG_TRIVIAL(trace)
    << "RawxBackendReader: Got result with buffer size: " << res.size();

// Wrap the rebuilt bytes in an Arrow input stream. BufferReader does not
// copy or own the data, so res must stay alive while the reader is used.
std::shared_ptr<arrow::io::InputStream> input =
    std::make_shared<arrow::io::BufferReader>(
        reinterpret_cast<const uint8_t*>(res.get()), res.size());
BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();

// Process this chunk on its own and emit CSV.
ArrowFilter arrow_filter(input);
arrow_filter.ToCsv();

result.push_back(std::move(res));

Thank you
