Currently the addMutation() code is synchronized, so that is a bottleneck. A separate thread would get around this, but then you need to manage that thread properly.
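
To illustrate what managing the thread might involve, here is a minimal sketch, not tested against Accumulo. AsyncWriter and Sink are hypothetical names of my own, not part of the Accumulo API; Sink just stands in for whatever accepts a batch, for example a lambda that forwards to batchWriter.addMutations(...). Each writer gets one worker thread, so batches for a given table are still applied in submission order while the parsing loop moves on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: hands batches to a single background thread.
final class AsyncWriter<M> implements AutoCloseable {

    // Hypothetical stand-in for the real writer call; it may throw a checked exception.
    interface Sink<M> {
        void add(List<M> batch) throws Exception;
    }

    // One worker thread per writer keeps batches in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Sink<M> sink;

    AsyncWriter(Sink<M> sink) {
        this.sink = sink;
    }

    // Queues a batch and returns immediately; failures surface via Future.get().
    Future<Void> submit(List<M> batch) {
        // Copy so the caller can safely reuse or clear its own list.
        List<M> copy = new ArrayList<>(batch);
        return executor.submit(() -> {
            sink.add(copy);
            return null;
        });
    }

    // Stops accepting work and waits for queued batches to drain.
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("queued batches did not finish in time");
        }
    }
}
```

In the loop from the original mail, each bwN.addMutations(mutationsN) call would become a submit(...) on the matching AsyncWriter. Two things to watch: the executor's queue is unbounded, so a slow writer can pile up memory unless you cap the number of outstanding batches, and you have to check the returned Futures (or log inside the sink), otherwise rejected mutations go unnoticed.
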

On Wed, Sep 18, 2013 at 5:07 PM, Slater, David M. <[email protected]> wrote:

> Hi, I’m running a single-threaded ingestion program that takes data from
> an input source, parses it into mutations, and then writes those mutations
> (sequentially) to four different BatchWriters (all on different tables).
> Most of the time (95%) taken is on adding mutations, e.g.
> batchWriter.addMutations(mutations); I am wondering how to reduce the time
> taken by these methods.
>
> 1) For the method batchWriter.addMutations(Iterable<Mutation>), does it
> matter for performance whether the mutations returned by the iterator are
> sorted in lexicographic order?
>
> 2) If the Iterable<Mutation> that I pass to the BatchWriter is very large,
> will I need to wait for a number of Batches to be written and flushed
> before it will finish iterating, or does it transfer the elements of the
> Iterable to a different intermediate list?
>
> 3) If that is the case, would it then make sense to spawn off short
> threads for each time I make use of addMutations?
>
> At a high level, my code looks like this:
>
> BatchWriter bw1 = connector.createBatchWriter(…)
> BatchWriter bw2 = …
> …
> while(true) {
>     String[] data = input.getData();
>     List<Mutation> mutations1 = parseData1(data);
>     List<Mutation> mutations2 = parseData2(data);
>     …
>     bw1.addMutations(mutations1);
>     bw2.addMutations(mutations2);
>     …
> }
>
> Thanks,
> David
