Ah, this is the use case for the OneTable <https://github.com/Accla/graphulo/blob/master/src/main/java/edu/mit/ll/graphulo/Graphulo.java#L807> call in Graphulo <http://graphulo.mit.edu/>. Internally, OneTable sets up a special iterator (RemoteWriteIterator) that does open a BatchWriter. The main trick that lets it write entries safely is pushing the row/column filters into the iterator, so that the iterator controls re-seeking rather than Accumulo. This allows the iterator to write all its entries and call close() without having to worry about Accumulo tearing it down first. See the docs <https://github.com/Accla/graphulo/blob/master/docs/START_HERE_2016-03-28-Graphulo-UseDesign.pdf> for a starter.
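
To make the idea concrete, here is a toy model in plain Java. It is only a sketch and does not use the Accumulo or Graphulo APIs: ToyWriter, RowRange and scanAndWrite are hypothetical names invented for illustration, and a TreeMap stands in for a table. The point it tries to show is that when the ranges are handed to the iterator, one call can walk every range, write the matching entries, and close the writer before returning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PushedFilterSketch {

    /** Hypothetical stand-in for a BatchWriter: buffers entries and flushes them on close(). */
    static final class ToyWriter implements AutoCloseable {
        private final Map<String, String> targetTable;
        private final List<String[]> buffer = new ArrayList<>();
        private boolean closed = false;

        ToyWriter(Map<String, String> targetTable) {
            this.targetTable = targetTable;
        }

        void add(String row, String value) {
            if (closed) {
                throw new IllegalStateException("writer already closed");
            }
            buffer.add(new String[] {row, value});
        }

        @Override
        public void close() {
            // Flush everything that was buffered, then mark the writer as closed.
            for (String[] entry : buffer) {
                targetTable.put(entry[0], entry[1]);
            }
            buffer.clear();
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical row range; both ends are inclusive. */
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Toy version of an iterator that owns its ranges: it visits every range itself,
     * writes each matching entry, and closes the writer before returning.
     * Returns the number of entries written.
     */
    static int scanAndWrite(NavigableMap<String, String> source,
                            List<RowRange> ranges,
                            ToyWriter writer) {
        int written = 0;
        try (ToyWriter w = writer) {
            for (RowRange range : ranges) {
                Map<String, String> slice = source.subMap(range.start, true, range.end, true);
                for (Map.Entry<String, String> entry : slice.entrySet()) {
                    w.add(entry.getKey(), entry.getValue());
                    written++;
                }
            }
        } // close() runs here, so nothing is left open when the call returns
        return written;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> source = new TreeMap<>();
        for (String row : Arrays.asList("a", "b", "c", "d", "e")) {
            source.put(row, "value-" + row);
        }
        Map<String, String> target = new TreeMap<>();
        ToyWriter writer = new ToyWriter(target);

        List<RowRange> ranges = Arrays.asList(new RowRange("a", "b"), new RowRange("d", "d"));
        int written = scanAndWrite(source, ranges, writer);

        System.out.println(written + " entries written: " + target.keySet());
        System.out.println("writer closed: " + writer.isClosed());
    }
}
```

In a real tablet server the hard part is everything this toy leaves out (threads, failures, the server re-creating the iterator), which is why the caveats below still apply.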

*cue Josh to warn against the evils of re-purposing tablet servers for MapReduce cycles* =)

Really, this is advanced stuff. Graphulo's iterators were shown to scale up to 16 nodes for matrix multiply at the most recent HPEC conference, but your use case could still break Accumulo if you don't use them properly, in the worst case by causing deadlock.

You're also free to write your own code using Graphulo's code as a starting point, if you're more comfortable with that. You may also decide on another approach, such as launching a MapReduce job against Accumulo's RFiles, which could be better or worse depending on your use case.

On Sat, Nov 5, 2016 at 10:28 AM, Yamini Joshi <[email protected]> wrote:

> Hello all
>
> As per https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/iterator_design.txt
> "
> Implementations of Iterator might be tempted to open BatchWriters inside
> of an Iterator as a means
> to implement triggers for writing additional data outside of their client
> application. The lifecycle of an Iterator
> is *not* managed in such a way that guarantees that this is safe nor
> efficient. Specifically, there
> is no way to guarantee that the internal ThreadPool inside of the
> BatchWriter is closed (and the thread(s)
> are reaped) without calling the close() method. `close`'ing and recreating
> a `BatchWriter` after every
> Key-Value pair is also prohibitively performance limiting to be considered
> an option."
>
> If I need to write a subset of records generated from an iterator to a
> file/table, I can't use a batch writer inside of an iterator? Is there any
> other way to go about it?
>
> Best regards,
> Yamini Joshi
