Having a combiner stack (more generally, an iterator stack) run on the client side seems to be the second most popular request on this list, the most popular being, "How do I write to Accumulo from inside an iterator?"
Such a thing would be very useful for me, too. I have some cycles to help out, if somebody can give me an idea of where to get started and where the potential land-mines are.

-Russ

On Tue, Jun 9, 2015 at 9:08 AM [email protected] <[email protected]> wrote:

> Aggregated output is tiny, so if I do same calculations in memory
> (instead of sending mutations to Accumulo), I can reduce overall number of
> mutations by 1000x or so
>
> -----Original Message-----
> From: Josh Elser [mailto:[email protected]]
> Sent: 09 June 2015 16:54
> To: [email protected]
> Subject: Re: micro compaction
>
> Well, you win the prize for new terminology. I haven't ever heard the term
> "micro compaction" before.
>
> Can you clarify though, you say hundreds of millions of mutations that
> result in megabytes of data. Is that an increase or decrease in size.
> Comparing apples to oranges :)
>
> [email protected] wrote:
> > Hi guys,
> >
> > While doing pre-analytics we generate hundreds of millions of
> > mutations that result in 1-100 megabytes of useful data after major
> > compaction. We ingest into Accumulo using MR from Mapper job. We
> > identified that performance really degrades while increasing a number of
> > mutations.
> >
> > The obvious improvement is to do some calculations in-memory before
> > sending mutations to Accumulo.
> >
> > Of course, at the same time we are looking for a solution to minimize
> > development effort.
> >
> > I guess I am asking about micro compaction/ingest-time iterators on
> > the client side (before data is sent to Accumulo).
> >
> > To my understanding, Accumulo does not support them, is it correct?
> > And if so, are there any plans to support this functionality in the
> > future?
> >
> > Thanks
> >
> > Roman
> >
> > Please consider the environment before printing this email. This
> > message should be regarded as confidential. If you have received this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard
> > copy by an authorised signatory. The contents of this email may relate
> > to dealings with other companies under the control of BAE Systems
> > Applied Intelligence Limited, details of which can be found at
> > http://www.baesystems.com/Businesses/index.htm.
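
To make the in-memory pre-aggregation idea from the thread concrete, here is a minimal sketch in plain Java. It is not an Accumulo feature and not anyone's actual implementation: the class name `ClientSideSummingBuffer`, the `maxKeys` threshold, and the `sink` callback are all hypothetical, and it assumes the aggregation is a simple sum (commutative and associative), so partial sums can safely be emitted in batches. In a real ingest job the sink would presumably build a mutation per key and hand it to a `BatchWriter`, and a server-side summing combiner would still be needed to merge partial sums coming from different flushes or different mappers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical client-side pre-aggregator; not part of the Accumulo API. */
class ClientSideSummingBuffer {
    private final Map<String, Long> sums = new HashMap<>();
    private final int maxKeys;                    // flush once this many distinct keys are buffered
    private final BiConsumer<String, Long> sink;  // receives one aggregated (key, sum) pair per key

    ClientSideSummingBuffer(int maxKeys, BiConsumer<String, Long> sink) {
        this.maxKeys = maxKeys;
        this.sink = sink;
    }

    /** Folds a value into the running sum for its key instead of emitting it right away. */
    void add(String key, long value) {
        sums.merge(key, value, Long::sum);
        if (sums.size() >= maxKeys) {
            flush();
        }
    }

    /** Emits every buffered partial sum to the sink and clears the buffer. */
    void flush() {
        sums.forEach(sink);
        sums.clear();
    }

    public static void main(String[] args) {
        int[] emitted = {0};
        ClientSideSummingBuffer buffer =
                new ClientSideSummingBuffer(1000, (key, sum) -> emitted[0]++);

        // 10,000 raw updates spread over only 10 distinct keys.
        for (int i = 0; i < 10_000; i++) {
            buffer.add("row" + (i % 10), 1L);
        }
        buffer.flush(); // must be called at the end, e.g. from the mapper's cleanup step

        System.out.println("raw updates: 10000, aggregated entries emitted: " + emitted[0]);
    }
}
```

The reduction you get depends entirely on how often keys repeat within one buffer's lifetime; the memory bound is the number of distinct keys, not the number of updates, which is why the sketch flushes on key count.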
