I see, so the only solution here would be either to use a WholeRowIterator or to avoid enabling the BatchScanner. Since each executor will work on a single tablet, I guess the benefit of using a BatchScanner is that it can fetch multiple ranges from the same tablet in parallel, am I correct?

Thanks,
Max
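For reference, a minimal sketch of the WholeRowIterator workaround, assuming one switches to the plain AccumuloInputFormat and attaches the iterator to the job configuration; the priority, iterator name, and mapper body are illustrative, not taken from the thread:

import java.io.IOException;
import java.util.Map.Entry;
import java.util.SortedMap;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.WholeRowIterator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class WholeRowBatchScanExample {

  public static void configure(Job job) throws IOException {
    // Attach a WholeRowIterator so each row is delivered as a single
    // key/value pair, regardless of how the BatchScanner interleaves rows.
    // Priority 50 and the name "wholeRow" are arbitrary choices here.
    IteratorSetting wholeRow = new IteratorSetting(50, "wholeRow", WholeRowIterator.class);
    AccumuloInputFormat.addIterator(job, wholeRow);
    // Enable batch scanning (available since Accumulo 1.7.0).
    AccumuloInputFormat.setBatchScan(job, true);
  }

  public static class RowMapper extends Mapper<Key, Value, Key, Value> {
    @Override
    protected void map(Key key, Value value, Context context)
        throws IOException, InterruptedException {
      // Decode the encoded row back into its individual columns.
      SortedMap<Key, Value> row = WholeRowIterator.decodeRow(key, value);
      for (Entry<Key, Value> column : row.entrySet()) {
        // ... process each column of the row ...
      }
    }
  }
}

The trade-off is that WholeRowIterator buffers an entire row on the server side before returning it, so this approach suits rows small enough to fit in memory.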
From: Christopher <[email protected]> To: [email protected] Date: 30/11/2016 18:48 Subject: Re: BatchScanner behavior with AccumuloRowInputFormat You'd only have to worry about this behavior if you set RowInputFormat.setBatchScan(job, true), available since 1.7.0. By default, our InputFormats use a regular Accumulo Scanner. See https://issues.apache.org/jira/browse/ACCUMULO-3602 and https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.7.0/org/apache/accumulo/core/client/mapreduce/InputFormatBase.html#setBatchScan(org.apache.hadoop.mapreduce.Job,%20boolean) On Wed, Nov 30, 2016 at 9:42 AM Massimilian Mattetti <[email protected]> wrote: Hi all, as you already know, the AccumuloRowInputFormat is internally using a RowIterator for iterating over all the key value pairs of a single row. In the past when I was using the RowIterator together with a BatchScanner I had the problem of a single row be split into multiple rows due to the fact that a BatchScanner can interleave key-value pairs of different rows. Should I expect the same behavior when using the AccumuloRowInputFormat with a BatchScanner (enabled via setBatchScan)? Thanks, Max
