Answered on dev@: 
https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60

In <CAJdzkC04+Uxa6bdmozPQFDkQ07M4Q=fmuhh2gvqzz-na2lm...@mail.gmail.com>
  "StreamReader" on Sat, 2 Jul 2022 16:04:45 +0200,
  L Ait <[email protected]> wrote:

> Hi,
> 
> I need help integrating Arrow C++ into my current project. I have built
> the C++ library and can call its API.
> 
> What I need is this:
> 
> I have a C++ project that reads data in chunks, then uses an erasure code
> to rebuild the original data.
> 
> The rebuild is done chunk by chunk. At each iteration I can access a buffer
> of rebuilt data.
> 
> I need to pass this data as a stream to Arrow for processing, then send the
> processed stream on.
> 
> For example, suppose my original file is a CSV and I would like to filter it
> and save only the first column:
> 
> file
> 
> col1,col2,col3,col4
> a1,b1,c1,d1
> an,bn,cn,dn
> 
> split into 6 chunks of equal size.
> 
> chunk1:
> 
> a1,b1,c1,d1
> ak,bk
> 
> chunk2:
> 
> ck,dk
> ...
> am,bm,cm,dm
> 
> and so on.
> 
> My question is which StreamReader in Arrow is the right one to use, and how
> it deals with incomplete records (lines) at the beginning and end of each
> chunk.
> 
> Here is a snippet of the code I use:
> 
> buffer_type_t res = fut.get0();
> BOOST_LOG_TRIVIAL(trace)
>     << "RawxBackendReader: Got result with buffer size: " << res.size();
> 
> // Wrap the rebuilt chunk in an Arrow input stream.
> std::shared_ptr<arrow::io::InputStream> input =
>     std::make_shared<arrow::io::BufferReader>(
>         reinterpret_cast<const uint8_t*>(res.get()), res.size());
> BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
> 
> ArrowFilter arrow_filter(input);
> arrow_filter.ToCsv();
> 
> result.push_back(std::move(res));
> 
> Thank you
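
The chunk-boundary problem described above (a record that starts at the end
of one chunk and finishes at the start of the next) can be illustrated
without Arrow at all. The following is only a minimal sketch in plain
standard C++, not Arrow's API and not necessarily what the linked dev@
answer recommends. LineReassembler, FirstColumn and FilterFirstColumn are
hypothetical names invented for this example. It assumes '\n' line endings
and no quoted fields containing commas or newlines.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): rebuilds complete lines from
// arbitrarily split chunks by carrying the unfinished tail of one chunk
// over to the next call.
class LineReassembler {
 public:
  // Feed one chunk; returns every line that this chunk completed.
  std::vector<std::string> Feed(std::string_view chunk) {
    std::vector<std::string> lines;
    pending_.append(chunk.data(), chunk.size());
    std::size_t start = 0;
    for (;;) {
      const std::size_t nl = pending_.find('\n', start);
      if (nl == std::string::npos) break;
      lines.emplace_back(pending_, start, nl - start);
      start = nl + 1;
    }
    pending_.erase(0, start);  // keep only the incomplete tail
    return lines;
  }

  // Call once after the last chunk; returns the final line if the input
  // did not end with a newline, otherwise an empty vector.
  std::vector<std::string> Finish() {
    std::vector<std::string> lines;
    if (!pending_.empty()) lines.push_back(std::move(pending_));
    pending_.clear();
    return lines;
  }

 private:
  std::string pending_;  // bytes seen so far that do not yet end in '\n'
};

// First comma-separated field of a line (assumption: no quoting).
std::string FirstColumn(const std::string& line) {
  return line.substr(0, line.find(','));
}

// Example driver: keeps only the first column of CSV data that arrives
// split into chunks at arbitrary byte offsets.
std::string FilterFirstColumn(const std::vector<std::string>& chunks) {
  LineReassembler reassembler;
  std::string out;
  auto emit = [&out](const std::vector<std::string>& lines) {
    for (const auto& line : lines) {
      out += FirstColumn(line);
      out += '\n';
    }
  };
  for (const auto& chunk : chunks) emit(reassembler.Feed(chunk));
  emit(reassembler.Finish());
  return out;
}
```

In the snippet from the question, each rebuilt buffer would be passed to
Feed() before any parsing, so that only whole records reach the CSV step.
The alternative is to present all chunks to the parser as one continuous
input stream, rather than creating a new BufferReader per chunk.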
