Well, you win the prize for new terminology. I haven't ever heard the term "micro compaction" before.

Can you clarify, though? You say hundreds of millions of mutations result in megabytes of data. Is that an increase or a decrease in size? Comparing apples to oranges :)

[email protected] wrote:
Hi guys,

While doing pre-analytics we generate hundreds of millions of mutations
that result in 1-100 megabytes of useful data after major compaction. We
ingest into Accumulo using MapReduce, writing from the Mapper. We have found
that performance degrades badly as the number of mutations increases.

The obvious improvement is to do some calculations in-memory before
sending mutations to Accumulo.
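
A minimal sketch of what such in-memory pre-aggregation might look like,
assuming the mutations are additive counters keyed by a row/column string.
The class name PreAggregatingBuffer, the maxKeys limit and the sink callback
are hypothetical and not part of the Accumulo API; the sink stands in for
whatever code builds and writes the actual Mutations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side combiner: sums values per key in memory and
 * hands only the combined totals to a sink (for example, code that turns
 * each total into a single mutation). Assumes the values are additive.
 */
class PreAggregatingBuffer {
    private final Map<String, Long> sums = new HashMap<>();
    private final int maxKeys;
    private final BiConsumer<String, Long> sink;

    PreAggregatingBuffer(int maxKeys, BiConsumer<String, Long> sink) {
        if (maxKeys < 1) {
            throw new IllegalArgumentException("maxKeys must be >= 1");
        }
        this.maxKeys = maxKeys;
        this.sink = sink;
    }

    /** Adds delta to the running total for key; flushes once maxKeys distinct keys are buffered. */
    void add(String key, long delta) {
        sums.merge(key, delta, Long::sum);
        if (sums.size() >= maxKeys) {
            flush();
        }
    }

    /** Emits one combined entry per buffered key, then clears the buffer. */
    void flush() {
        sums.forEach(sink);
        sums.clear();
    }
}
```

In a MapReduce job, one would presumably call add() from map() and flush()
from the Mapper's cleanup(). Because a key can be flushed more than once,
the emitted values are partial sums, so something on the server side (for
example a summing combiner iterator) would still need to merge them.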

Of course, at the same time we are looking for a solution to minimize
development effort.

I guess I am asking about micro compaction/ingest-time iterators on the
client side (before data is sent to Accumulo).

To my understanding, Accumulo does not support them; is that correct? And
if so, are there any plans to support this functionality in the future?

Thanks

Roman

