Re: Is it possible to use an iterator to aggregate results of a BatchScanner?

William Slacum Mon, 11 Jun 2012 13:47:16 -0700

So, is a global sorting order required of your iterator? That's really
the key behavioral difference in terms of output when you're dealing
with a Scanner versus a BatchScanner.


Please correct me if I'm wrong about assuming you're trying to get a
distribution for the column families that appear in a given set of
ranges.

You can count the column qualifiers server-side, per tablet or per row,
using an Accumulo iterator, and then merge those counts in a map on the
client as you iterate over your scanner.

{{{
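import java.util.HashMap;
import java.util.Map.Entry;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

BatchScanner scan = connector.createBatchScanner(...);
// set up a column family counting/skipping iterator

// running totals, keyed by column family
HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();

for (Entry<Key, Value> e : scan) {
  Text cf = e.getKey().getColumnFamily();
  AtomicLong cqCount = cqCounts.get(cf);
  if (cqCount == null) {
    cqCount = new AtomicLong();
    cqCounts.put(cf, cqCount);
  }
  // the iterator is assumed to emit each partial count as a decimal string
  cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
}
scan.close();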
}}}

(please excuse any old/deprecated APIs used)
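
The merge step itself doesn't depend on Accumulo, so here is a minimal
standalone sketch of it using plain Java collections. It assumes each
entry the scanner returns can be reduced to a (column family, partial
count) pair with the count encoded as decimal text, as in the loop above.
PartialCountMerger, merge and partial are made-up names for illustration,
not part of any Accumulo API, and Map.merge needs Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.TreeMap;

// Hypothetical helper (not part of Accumulo): merges partial counts on the client.
class PartialCountMerger {

  // Each entry stands in for one Key/Value pair from the scanner: the key is
  // the column family, the value is a partial count as UTF-8 decimal text.
  // Arrival order does not matter, which is what a BatchScanner requires.
  static Map<String, Long> merge(Iterable<Entry<String, byte[]>> partials) {
    Map<String, Long> totals = new TreeMap<>();
    for (Entry<String, byte[]> e : partials) {
      long partial = Long.parseLong(new String(e.getValue(), StandardCharsets.UTF_8));
      // add to the running total, or start a new total for an unseen family
      totals.merge(e.getKey(), partial, Long::sum);
    }
    return totals;
  }

  // Builds a fake scanner entry for the demo below.
  static Entry<String, byte[]> partial(String family, long count) {
    return new SimpleEntry<>(family, Long.toString(count).getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) {
    // Partial counts arriving in no particular order, as from several tablets.
    List<Entry<String, byte[]>> results = Arrays.asList(
        partial("cf2", 4), partial("cf1", 3), partial("cf1", 5), partial("cf2", 1));
    System.out.println(merge(results)); // prints {cf1=8, cf2=5}
  }
}
```

Since addition is commutative, the totals come out the same no matter
which tablet server answers first, so the lack of a global sort order
from the BatchScanner isn't a problem for this kind of aggregation.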

On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <[email protected]> wrote:
> I have a SkippingIterator that skips entries with cq that it has seen
> before.
> It works on a Scanner, but on a BatchScanner, the iterators from different
> threads don't communicate, so the result is that results within a single
> range are unique, but across the whole set of ranges, are not unique.
> I'd prefer to perform the aggregation within the iterators if possible, but
> I don't know how.
>
> Also, thanks for your previous help, William, Keith, Bob and David.
