On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote:
> BTW how to get current time in microseconds in Java?
I'm using HFactory.clock() (from hector).

> > As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..)
> > won't this hurt performance?
>
> The size of the queue is computed at runtime:
> ColumnFamilyOutputFormat.QUEUE_SIZE, 32 *
> Runtime.getRuntime().availableProcessors()
> So the queue is not too large so I'd say the performance shouldn't get hurt.

This is only the default. I'm running w/ 80000. Testing has shown this gives
the best throughput for me when processing 25+ million rows...

In the end it is still 25+ million .clone(..) calls.

> The key isn't the only potential live byte[]. You also have names and
> values in all the columns (and supercolumns) for all the mutations.

Now make that over a billion .clone(..) calls... :-(

byte[] copies are relatively quick and cheap, but I am still seeing a
degradation in m/r reduce performance with cloning of keys.

It's not that you don't have my vote here, I'm just stating my uncertainty
on what the correct API should be.

~mck
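
For anyone following along, here is a rough sketch of why the clone(..) matters at all once a bounded queue sits between the caller and the writer. It is not the ColumnFamilyRecordWriter code: KeyCloneSketch, enqueue and firstKey are made-up names, the reused 4-byte buffer is an assumption standing in for a caller that recycles its key bytes, and the capacity only mirrors the 32 * availableProcessors() default quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class KeyCloneSketch {

    // Default capacity as quoted in the thread: 32 * available processors.
    static int defaultQueueSize() {
        return 32 * Runtime.getRuntime().availableProcessors();
    }

    // Hypothetical producer: copies each 4-byte key into ONE reused buffer
    // (assumption: the caller recycles its key bytes) and queues it,
    // either as-is or as a defensive copy.
    static BlockingQueue<byte[]> enqueue(String[] keys, boolean cloneKeys, int capacity) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        byte[] reused = new byte[4];
        for (String key : keys) {
            byte[] src = key.getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(src, 0, reused, 0, reused.length);
            // add() throws if the queue is full; a real writer would block with put().
            queue.add(cloneKeys ? reused.clone() : reused);
        }
        return queue;
    }

    // Decode the key at the head of the queue without removing it.
    static String firstKey(BlockingQueue<byte[]> queue) {
        return new String(queue.peek(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String[] keys = {"row1", "row2", "row3"};
        int capacity = defaultQueueSize();
        // Without clone every queued entry aliases the same array: prints row3
        System.out.println(firstKey(enqueue(keys, false, capacity)));
        // With clone each entry keeps its own bytes: prints row1
        System.out.println(firstKey(enqueue(keys, true, capacity)));
    }
}
```

The copy happens once per queued byte[], so the number of clone(..) calls grows with rows (and with column names and values if those are copied too), which is the cost being weighed above.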