You'd only have to worry about this behavior if you set
RowInputFormat.setBatchScan(job, true), available since 1.7.0.
By default, our InputFormats use a regular Accumulo Scanner.

See https://issues.apache.org/jira/browse/ACCUMULO-3602 and
https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.7.0/org/apache/accumulo/core/client/mapreduce/InputFormatBase.html#setBatchScan(org.apache.hadoop.mapreduce.Job,%20boolean)
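For reference, a minimal sketch of the job setup (the class name `RowScanJobSetup` and the `conf` parameter are illustrative; `setBatchScan` is inherited from `InputFormatBase`, per the javadoc linked above):

```java
import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RowScanJobSetup {
    public static Job configure(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf);
        job.setInputFormatClass(AccumuloRowInputFormat.class);
        // Default: a regular Scanner is used, so each map input is one whole row.
        // Batch scanning (available since 1.7.0) is strictly opt-in:
        // AccumuloRowInputFormat.setBatchScan(job, true);
        return job;
    }
}
```

Connector, table, and range configuration are omitted here for brevity; the point is only that the Scanner-backed behavior is the default unless `setBatchScan(job, true)` is called.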


On Wed, Nov 30, 2016 at 9:42 AM Massimilian Mattetti <massi...@il.ibm.com>
wrote:

Hi all,

as you already know, the AccumuloRowInputFormat internally uses a
RowIterator to iterate over all the key-value pairs of a single row. In
the past, when I used the RowIterator together with a BatchScanner, I
had the problem of a single row being split into multiple rows, because
a BatchScanner can interleave key-value pairs from different rows.
Should I expect the same behavior when using the AccumuloRowInputFormat
with a BatchScanner (enabled via setBatchScan)?
Thanks,
Max
