I should clarify: depending on what your iterators do (and, more importantly, what they store), you may be limited by memory. How much depends on several factors, such as the size of your result set.
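
For what it's worth, here is a rough sketch of the client-side merge William describes below, written in plain Java so it runs without a cluster. The names `CountMerge` and `mergeCounts` are made up for illustration, Strings stand in for Accumulo's Key and Value, and I'm assuming the server-side iterator emits its partial counts as decimal strings. I haven't run this against real tablet servers.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Accumulo API.
class CountMerge {

    // Each entry mimics one (column family, partial count) pair produced by a
    // server-side counting iterator. Strings stand in for Key and Value.
    static Map<String, Long> mergeCounts(Iterable<Map.Entry<String, String>> partials) {
        Map<String, Long> totals = new HashMap<String, Long>();
        for (Map.Entry<String, String> e : partials) {
            // Add this partial count to the running total for its family.
            totals.merge(e.getKey(), Long.parseLong(e.getValue()), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Entries arrive in no particular order, as they would from a BatchScanner.
        List<Map.Entry<String, String>> partials = new ArrayList<Map.Entry<String, String>>();
        partials.add(new SimpleEntry<String, String>("cf1", "2"));
        partials.add(new SimpleEntry<String, String>("cf2", "5"));
        partials.add(new SimpleEntry<String, String>("cf1", "3"));
        System.out.println(mergeCounts(partials));
    }
}
```

The map holds one entry per distinct column family, so client memory grows with the number of distinct families rather than with the total number of entries scanned.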
---
Sent from my phone, may contain spelling wrrors

On Mon, Jun 11, 2012 at 4:52 PM, Marc P. <[email protected]> wrote:
> It may also serve you to extend the appropriate aggregator, thereby
> setting your source iterator to the batch scanner's iterator. You can
> then iterate over the aggregated result set (if possible).
>
> I haven't actually tried this, but you would be limited by memory at
> the client (depending on the size of your result set). Mr. Slacum's
> approach wouldn't run into that particular problem; with this one,
> however, you could stack the iterators in the same way the tablet
> servers do.
>
> ---
> Sent from my phone, may contain spelling wrrors
>
> On Mon, Jun 11, 2012 at 4:46 PM, William Slacum <[email protected]> wrote:
>> So, is a global sorting order required of your iterator? That's really
>> the key behavioral difference in terms of output when you're dealing
>> with a Scanner versus a BatchScanner.
>>
>> Please correct me if I'm wrong in assuming you're trying to get a
>> distribution for the column families that appear in a given set of
>> ranges.
>>
>> You can count the column qualifiers on a per tablet/row basis server
>> side using an Accumulo iterator, and as you iterate over your scanner,
>> you can merge those counts using a map.
>>
>> {{{
>> import java.util.HashMap;
>> import java.util.Map.Entry;
>> import java.util.concurrent.atomic.AtomicLong;
>> import org.apache.accumulo.core.client.BatchScanner;
>> import org.apache.accumulo.core.data.Key;
>> import org.apache.accumulo.core.data.Value;
>> import org.apache.hadoop.io.Text;
>>
>> BatchScanner scan = connector.createBatchScanner(...);
>> // set up a column family counting/skipping iterator
>>
>> HashMap<Text, AtomicLong> cqCounts = new HashMap<Text, AtomicLong>();
>>
>> for(Entry<Key, Value> e : scan) {
>>   Text cf = e.getKey().getColumnFamily();
>>   AtomicLong cqCount = cqCounts.get(cf);
>>   if(cqCount == null) {
>>     cqCount = new AtomicLong();
>>     cqCounts.put(cf, cqCount);
>>   }
>>   // the value is the partial count emitted by the server-side iterator
>>   cqCount.addAndGet(Long.parseLong(new String(e.getValue().get())));
>> }
>> }}}
>>
>> (please excuse any old/deprecated APIs used)
>>
>> On Mon, Jun 11, 2012 at 2:21 PM, Hunter Provyn <[email protected]> wrote:
>>> I have a SkippingIterator that skips entries with a cq that it has seen
>>> before.
>>> It works on a Scanner, but on a BatchScanner, the iterators from different
>>> threads don't communicate, so the results are unique within a single
>>> range but not across the whole set of ranges.
>>> I'd prefer to perform the aggregation within the iterators if possible, but
>>> I don't know how.
>>>
>>> Also, thanks for your previous help, William, Keith, Bob and David.
