I see, so the only solution here would be either to use a WholeRowIterator or to avoid enabling the BatchScanner. Since each executor will work on a single tablet, I guess the benefit of using a BatchScanner is that it can fetch multiple ranges from the same tablet in parallel, am I correct?

Thanks,
Max
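For reference, a minimal sketch of the WholeRowIterator workaround, assuming one switches to the plain AccumuloInputFormat and attaches the iterator to the job configuration; the priority, iterator name, and mapper body are illustrative, not taken from the thread:

import java.io.IOException;
import java.util.Map.Entry;
import java.util.SortedMap;

import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.WholeRowIterator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class WholeRowBatchScanExample {

  public static void configure(Job job) throws IOException {
    // Attach a WholeRowIterator so each row is delivered as a single
    // key/value pair, regardless of how the BatchScanner interleaves rows.
    // Priority 50 and the name "wholeRow" are arbitrary choices here.
    IteratorSetting wholeRow = new IteratorSetting(50, "wholeRow", WholeRowIterator.class);
    AccumuloInputFormat.addIterator(job, wholeRow);
    // Enable batch scanning (available since Accumulo 1.7.0).
    AccumuloInputFormat.setBatchScan(job, true);
  }

  public static class RowMapper extends Mapper<Key, Value, Key, Value> {
    @Override
    protected void map(Key key, Value value, Context context)
        throws IOException, InterruptedException {
      // Decode the encoded row back into its individual columns.
      SortedMap<Key, Value> row = WholeRowIterator.decodeRow(key, value);
      for (Entry<Key, Value> column : row.entrySet()) {
        // ... process each column of the row ...
      }
    }
  }
}

The trade-off is that WholeRowIterator buffers an entire row on the server side before returning it, so this approach suits rows small enough to fit in memory.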
From: Christopher <[email protected]> To: [email protected] Date: 30/11/2016 18:48 Subject: Re: BatchScanner behavior with AccumuloRowInputFormat You'd only have to worry about this behavior if you set RowInputFormat.setBatchScan(job, true), available since 1.7.0. By default, our InputFormats use a regular Accumulo Scanner. See https://issues.apache.org/jira/browse/ACCUMULO-3602 and https://static.javadoc.io/org.apache.accumulo/accumulo-core/1.7.0/org/apache/accumulo/core/client/mapreduce/InputFormatBase.html#setBatchScan(org.apache.hadoop.mapreduce.Job,%20boolean) On Wed, Nov 30, 2016 at 9:42 AM Massimilian Mattetti <[email protected]> wrote: Hi all, as you already know, the AccumuloRowInputFormat is internally using a RowIterator for iterating over all the key value pairs of a single row. In the past when I was using the RowIterator together with a BatchScanner I had the problem of a single row be split into multiple rows due to the fact that a BatchScanner can interleave key-value pairs of different rows. Should I expect the same behavior when using the AccumuloRowInputFormat with a BatchScanner (enabled via setBatchScan)? Thanks, Max
