So, do you actually need your results in globally sorted order? That's
really the key behavioral difference in output between a Scanner and a
BatchScanner: a Scanner returns entries in sorted order, while a
BatchScanner makes no ordering guarantee.
Please correct me if I'm wrong in assuming you're trying to get a
distribution for the column families that appear in a given set of
ranges.
You can count the column qualifiers on a per-tablet/row basis on the
server side using an Accumulo iterator, and as you iterate over your
scanner on the client, you can merge those partial counts using a map.
{{{
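import java.util.HashMap;
import java.util.Map.Entry;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

BatchScanner scan = connector.createBatchScanner(...);
// set up a column family counting/skipping iterator
HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();
for (Entry<Key, Value> e : scan) {
  Text col = e.getKey().getColumnFamily(); // or getColumnQualifier() if counting qualifiers
  AtomicLong cqCount = cqCounts.get(col);
  if (cqCount == null) {
    cqCount = new AtomicLong();
    cqCounts.put(col, cqCount);
  }
  // assumes the iterator emits each partial count as a decimal string
  cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
}
scan.close();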
}}}
(please excuse any old/deprecated APIs used)
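
If it helps to see the merge step on its own, here is a minimal sketch that runs without Accumulo. The class name PartialCountMerge, the partial() helper, and the sample data are all made up for illustration; each entry stands in for one (column, count) pair coming back from the scan, and I'm assuming the server-side iterator encodes each partial count as a decimal string.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PartialCountMerge {

    // Sums partial counts per column. The count bytes are assumed to hold a
    // decimal string in UTF-8 (an assumption about what the iterator emits).
    static Map<String, Long> merge(List<Map.Entry<String, byte[]>> partials) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : partials) {
            long n = Long.parseLong(new String(e.getValue(), StandardCharsets.UTF_8));
            totals.merge(e.getKey(), n, Long::sum); // insert, or add to the running total
        }
        return totals;
    }

    // Hypothetical helper that builds one (column, encoded count) pair.
    static Map.Entry<String, byte[]> partial(String column, long count) {
        return new SimpleEntry<>(column, Long.toString(count).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Made-up partial counts, in the arbitrary order a BatchScanner might return them.
        List<Map.Entry<String, byte[]>> partials = Arrays.asList(
            partial("colB", 4),
            partial("colA", 3),
            partial("colB", 1),
            partial("colA", 7));
        System.out.println(merge(partials)); // prints {colA=10, colB=5}
    }
}
```

Because addition doesn't care about order, it doesn't matter which tablet's partial count arrives first, which is why this works with a BatchScanner.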
On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <[email protected]> wrote:
> I have a SkippingIterator that skips entries with cq that it has seen
> before.
> It works on a Scanner, but on a BatchScanner, the iterators from different
> threads don't communicate, so the result is that results within a single
> range are unique, but across the whole set of ranges, are not unique.
> I'd prefer to perform the aggregation within the iterators if possible, but
> I don't know how.
>
> Also, thanks for your previous help, William, Keith, Bob and David.