Hello all,

I've been toying with the registerSideChannel(iter) <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/IteratorEnvironment.html#registerSideChannel(org.apache.accumulo.core.iterators.SortedKeyValueIterator)> method on the IteratorEnvironment passed to iterators through the init() method. From what I can tell, the method allows you to add another iterator as a top-level source, to be merged in along with the usual top-level sources such as the in-memory cache and RFiles.
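To make that mental model concrete, here is a toy sketch in plain Java of what I understand "merged in as a top-level source" to mean. It uses no Accumulo classes: the class name SideChannelMergeSketch, the helper methods seek() and merge(), and the sample rows are all made up for illustration, and rows are plain strings rather than Key/Value pairs. It only shows the idea of clipping each source to the tablet range and then repeatedly taking the least key, and it makes no claim about how the real MultiIterator is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: hypothetical names, string rows instead of Accumulo Key/Value.
class SideChannelMergeSketch {

    /** One sorted source plus the row it is currently positioned on. */
    static final class Source implements Comparable<Source> {
        final Iterator<String> it;
        String top; // null once the source is exhausted

        Source(Iterator<String> it) {
            this.it = it;
            advance();
        }

        void advance() {
            top = it.hasNext() ? it.next() : null;
        }

        @Override
        public int compareTo(Source other) {
            return top.compareTo(other.top);
        }
    }

    /** Keep only rows inside the tablet range (startExclusive, endInclusive]. */
    static List<String> seek(List<String> sortedRows, String startExclusive, String endInclusive) {
        List<String> out = new ArrayList<>();
        for (String row : sortedRows) {
            if (row.compareTo(startExclusive) > 0 && row.compareTo(endInclusive) <= 0) {
                out.add(row);
            }
        }
        return out;
    }

    /** Merge already-sorted sources by repeatedly emitting the least current row. */
    static List<String> merge(List<List<String>> sources) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> s : sources) {
            Source src = new Source(s.iterator());
            if (src.top != null) {
                heap.add(src);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source least = heap.poll();
            out.add(least.top);
            least.advance();
            if (least.top != null) {
                heap.add(least); // re-insert at its new position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rfile = Arrays.asList("b", "d", "k");
        List<String> inMemory = Arrays.asList("c", "m");
        List<String> sideChannel = Arrays.asList("a", "e", "h"); // the injected data

        // Tablet 1 covers (a,g]: every source, including the side channel, is clipped first.
        List<String> tablet1 = merge(Arrays.asList(
                seek(rfile, "a", "g"),
                seek(inMemory, "a", "g"),
                seek(sideChannel, "a", "g")));
        System.out.println(tablet1); // prints [b, c, d, e]
    }
}
```

In this toy version the side channel rows "a" and "h" never show up for tablet 1, because seek() drops anything outside (a,g]. That is the behavior I am assuming a well-behaved side channel iterator has to provide itself.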
Are there any downsides to using registerSideChannel() to "add new data" to an iterator chain? It looks fairly stable, so long as the iterator we add as a side channel implements seek() properly, so that it only returns entries whose rows are within the tablet.

I imagine it works like so. Suppose we set a custom iterator, InjectIterator, as a one-time major compaction iterator at priority 5, and it registers a side channel inside init(). InjectIterator forwards other operations to its parent, as in WrappingIterator <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/WrappingIterator.html>. We start the compaction:

Tablet 1 (a,g]

1. init() called on InjectIterator. Creates the side channel iterator, calls init() on it, and registers it.
2. init() called on VersioningIterator.
3. init() called on top-level iterators, including RFiles, the in-memory cache and the new side channel.
4. seek( (a,g] ) called on InjectIterator.
5. seek( (a,g] ) called on VersioningIterator.
6. seek( (a,g] ) called on top-level iterators.
7. next() called on InjectIterator. Forwards to parent.
8. next() called on VersioningIterator. Forwards to parent.
9. next() called on top-level iterator (a MultiIterator <https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/system/MultiIterator.html>). The next value is read from all the top-level iterator sources, and the one with the least key is cached, ready to go.
10. ...

Tablet 2 (g,p)

Same as tablet 1, except steps 4-6 call seek( (g,p) ). Done in parallel with tablet 1 if on a different tablet server.

Is this an accurate depiction? Anything I should treat with caution? It seems to work on my single-node instance, so tips about difficulties in going to multi-node would be welcome.

Code available here. <https://github.com/Accla/d4m_api_java/blob/0d8c62164d5c0b59f949ce23c1b85536809764d2/src/main/java/edu/mit/ll/graphulo/InjectIterator.java#L166>

Regards,
Dylan Hutchison

--
www.cs.stevens.edu/~dhutchis