Answered on dev@: https://lists.apache.org/thread/5rpykkfoz416mq889pcpx9rwrrtjog60

In <CAJdzkC04+Uxa6bdmozPQFDkQ07M4Q=fmuhh2gvqzz-na2lm...@mail.gmail.com>
  "StreamReader" on Sat, 2 Jul 2022 16:04:45 +0200,
  L Ait <[email protected]> wrote:

> Hi,
>
> I need help to integrate arrow cpp in my current project. In fact I built
> cpp library and can call api.
>
> What I need is that:
>
> I have a c++ project that reads data by chunks then uses some erasure code
> to rebuild original data.
>
> The rebuild is done in chunks , At each iteration I can access a buffer of
> rebuilt data.
>
> My need is to pass this data as a stream to arrow process then send the
> processed stream.
>
> For example if my original file is a csv and I would like to filter and
> save first column:
>
> file
>
> col1,col2, col3, col3
> a1,b1,c1,d1
> an,bn,cn,dn
>
> split to 6 chunks of equal sizes chunk1:
>
> a1,b1,c1,d1
> ak,bk
>
> chunk2:
>
> ck,dk
> ...
> am,bm,cm,dm
>
> and so on.
>
> My question is how to use the right StreamReader in arrow and how this
> deals with in complete records( lines) at the beginning and end of each
> chunk ?
>
> Here a snippet of code I use :
> buffer_type_t res = fut.get0();
> BOOST_LOG_TRIVIAL(trace) <<
> "RawxBackendReader: Got result with buffer size: " << res.size();
> std::shared_ptr<arrow::io::InputStream> input;
>
> std::shared_ptr<arrow::io::BufferReader> buffer(new arrow::io::BufferReader(
> reinterpret_cast<const uint8_t*>(res.get()), res.size()));
> input = buffer;
> BOOST_LOG_TRIVIAL(trace) << "laa type input" << input.get();
>
> ArrowFilter arrow_filter = ArrowFilter(input);
> arrow_filter.ToCsv();
>
>
> result.push_back(std::move(res));
>
> Thank you
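
For illustration only, here is a minimal sketch of the chunk-boundary problem described in the quoted question, written in plain standard C++ and independent of any Arrow API. `LineReassembler` and `FirstColumn` are hypothetical names, not Arrow classes. The sketch assumes records are separated by a bare `\n` and that no field contains a quoted newline; a real CSV parser has to handle both cases. The idea is to hold back the incomplete tail of each chunk and prepend it to the next chunk, so that only whole lines are passed on.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Arrow): accepts byte chunks in order and
// returns only complete lines, keeping a trailing partial line for later.
class LineReassembler {
 public:
  // Append one chunk; return every line it completes (without the '\n').
  std::vector<std::string> Feed(const std::string& chunk) {
    pending_.append(chunk);
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (std::size_t nl = pending_.find('\n', start);
         nl != std::string::npos; nl = pending_.find('\n', start)) {
      lines.emplace_back(pending_, start, nl - start);
      start = nl + 1;
    }
    // Whatever follows the last '\n' is an incomplete record: keep it.
    pending_.erase(0, start);
    return lines;
  }

  // Call once after the last chunk; returns the final line if the input
  // did not end with '\n', otherwise an empty vector.
  std::vector<std::string> Finish() {
    std::vector<std::string> lines;
    if (!pending_.empty()) lines.push_back(std::move(pending_));
    pending_.clear();
    return lines;
  }

 private:
  std::string pending_;  // bytes received after the last complete line
};

// Naive first-column extraction: everything before the first comma.
std::string FirstColumn(const std::string& line) {
  return line.substr(0, line.find(','));
}
```

In the loop from the quoted snippet, each rebuilt buffer would be passed to `Feed()` and the returned lines processed, with `Finish()` called after the last chunk. Whether this manual step is needed at all depends on the reader used; the linked dev@ thread is the place to check how Arrow's own readers treat block boundaries.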
