On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote:
> BTW, how do I get the current time in microseconds in Java?

I'm using HFactory.clock() (from Hector).
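If you'd rather not pull in Hector just for a clock, here is a minimal plain-JDK sketch. It does not reproduce HFactory.clock()'s internals, which aren't shown here. It only computes microseconds since the Unix epoch, the unit Cassandra column timestamps conventionally use. Note that on Java 8, Instant.now() is typically only millisecond-precise. Newer JDKs usually give finer resolution.

```java
import java.time.Instant;

class MicrosClock {
    // Microseconds since the Unix epoch via java.time (Java 8+).
    // Actual resolution depends on the JDK and OS clock.
    static long nowMicros() {
        Instant now = Instant.now();
        return Math.addExact(
                Math.multiplyExact(now.getEpochSecond(), 1_000_000L),
                now.getNano() / 1_000L);
    }

    // Millisecond clock scaled to microseconds; the last three digits are always 0.
    static long nowMicrosFromMillis() {
        return System.currentTimeMillis() * 1_000L;
    }

    public static void main(String[] args) {
        System.out.println("java.time: " + nowMicros());
        System.out.println("millis*1000: " + nowMicrosFromMillis());
    }
}
```

Neither variant guarantees uniqueness across rapid successive calls. If two writes in the same microsecond must be ordered, that needs extra handling on top of this.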

> > As for moving the clone(..) into ColumnFamilyRecordWriter.write(..),
> > won't this hurt performance?
> 
> The size of the queue is computed at runtime from
> ColumnFamilyOutputFormat.QUEUE_SIZE, defaulting to 32 *
> Runtime.getRuntime().availableProcessors(),
> so the queue is not too large and I'd say performance shouldn't suffer.

This is only the default.
I'm running with 80000. Testing has given this the best throughput for me
when processing 25+ million rows...
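To make the "only the default" point concrete, here is a small sketch of a configured queue size with the 32-per-processor fallback. Assumptions: java.util.Properties stands in for the Hadoop job configuration, and QUEUE_SIZE_KEY is a hypothetical key string, not the actual value of ColumnFamilyOutputFormat.QUEUE_SIZE.

```java
import java.util.Properties;

class QueueSizeConfig {
    // Hypothetical key standing in for ColumnFamilyOutputFormat.QUEUE_SIZE.
    static final String QUEUE_SIZE_KEY = "example.columnfamilyoutputformat.queue.size";

    // Default as described in the thread: 32 slots per available processor.
    static int defaultQueueSize(int processors) {
        return 32 * processors;
    }

    // Use the configured value if present, otherwise fall back to the default.
    static int queueSize(Properties conf, int processors) {
        String configured = conf.getProperty(QUEUE_SIZE_KEY);
        return configured == null
                ? defaultQueueSize(processors)
                : Integer.parseInt(configured.trim());
    }

    public static void main(String[] args) {
        int procs = Runtime.getRuntime().availableProcessors();
        Properties conf = new Properties();
        System.out.println("default:  " + queueSize(conf, procs));
        conf.setProperty(QUEUE_SIZE_KEY, "80000"); // the override mentioned above
        System.out.println("override: " + queueSize(conf, procs));
    }
}
```

On an 8-core box the default is only 256 entries, whereas an override like 80000 keeps far more (and far more live byte[]s) in flight. That is why the cost of cloning scales with the total number of writes, not with the queue size.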

In the end it is still 25+ million .clone(..) calls. 

> The key isn't the only potential live byte[]. You also have names and
> values in all the columns (and supercolumns) for all the mutations.

Now make that over a billion .clone(..) calls... :-(
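A rough sketch of where the multiplication comes from: deep-copying a row means one clone for the key plus two per column (name and value). The Row and Column classes here are hypothetical, simplified stand-ins, not Cassandra's Thrift types, and supercolumns are left out. For example, with a hypothetical 20 columns per row, 25 million rows already means 25M × 41 ≈ 1.03 billion clone calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class RowCopier {
    // Hypothetical simplified column: name and value as raw bytes.
    static final class Column {
        final byte[] name;
        final byte[] value;
        Column(byte[] name, byte[] value) { this.name = name; this.value = value; }
    }

    // Hypothetical simplified row: a key plus its columns.
    static final class Row {
        final byte[] key;
        final List<Column> columns;
        Row(byte[] key, List<Column> columns) { this.key = key; this.columns = columns; }
    }

    private long cloneCalls = 0;

    // Every byte[] copy goes through here so the calls can be counted.
    private byte[] copy(byte[] b) {
        cloneCalls++;
        return b.clone();
    }

    // Deep copy: 1 clone for the key + 2 per column (name and value).
    Row deepCopy(Row row) {
        List<Column> cols = new ArrayList<>(row.columns.size());
        for (Column c : row.columns) {
            cols.add(new Column(copy(c.name), copy(c.value)));
        }
        return new Row(copy(row.key), cols);
    }

    long cloneCalls() { return cloneCalls; }

    public static void main(String[] args) {
        List<Column> cols = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            cols.add(new Column(("c" + i).getBytes(StandardCharsets.UTF_8),
                                ("v" + i).getBytes(StandardCharsets.UTF_8)));
        }
        RowCopier copier = new RowCopier();
        copier.deepCopy(new Row("row-1".getBytes(StandardCharsets.UTF_8), cols));
        System.out.println(copier.cloneCalls()); // 1 + 2 * 10 = 21
    }
}
```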

byte[] copies are relatively quick and cheap; still, I am seeing a
degradation in M/R reduce performance when cloning keys.
It's not that you don't have my vote here; I'm just stating my
uncertainty about what the correct API should be.
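For context on why the clone matters at all, here is a minimal simulation. Its assumption is a caller that reuses a single key buffer for every record while the writer only queues references for later sending. Without a clone, every queued entry ends up showing the last key written. BufferReuseDemo and enqueueKeys are hypothetical names, and keys must fit in the 16-byte buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferReuseDemo {
    // Simulates a caller reusing one buffer (assumed) and a writer queueing keys.
    static List<String> enqueueKeys(String[] keys, boolean cloneOnWrite) {
        byte[] buffer = new byte[16]; // shared buffer, overwritten per record
        List<byte[]> queue = new ArrayList<>();
        for (String k : keys) {
            byte[] src = k.getBytes(StandardCharsets.UTF_8);
            Arrays.fill(buffer, (byte) 0);
            System.arraycopy(src, 0, buffer, 0, src.length);
            // Without a clone the queue stores a reference to the same array.
            queue.add(cloneOnWrite ? buffer.clone() : buffer);
        }
        List<String> out = new ArrayList<>();
        for (byte[] b : queue) {
            // trim() strips the trailing zero padding (chars <= U+0020).
            out.add(new String(b, StandardCharsets.UTF_8).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] keys = {"key1", "key2", "key3"};
        System.out.println(enqueueKeys(keys, false)); // [key3, key3, key3]
        System.out.println(enqueueKeys(keys, true));  // [key1, key2, key3]
    }
}
```

This is the API trade-off in a nutshell. Cloning inside write() is always safe but costs a copy for every byte[], even when the caller already hands over fresh arrays. Leaving it to the caller avoids those copies but makes correctness depend on every caller knowing about buffer reuse.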

~mck
