Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Jonathan Ellis
On Tue, Jan 25, 2011 at 12:09 PM, Mick Semb Wever wrote: > Well your key is a mutable Text object, so I can see some possibility > depending on how Hadoop uses these objects. Yes, that's it exactly. We recently fixed a bug in the demo word_count program for this. Now we do ByteBuffer.wrap(Arrays
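
A minimal sketch of the copy-before-wrap fix being described here, assuming a Hadoop Text key (the rest of the word_count change is not shown in this thread):

import java.nio.ByteBuffer;
import java.util.Arrays;
import org.apache.hadoop.io.Text;

class KeyCopy {
    // Hadoop reuses the same Text instance across calls, so the bytes must
    // be copied before the ByteBuffer is queued by ColumnFamilyRecordWriter.
    static ByteBuffer keyFor(Text text) {
        // Arrays.copyOf snapshots only the valid bytes; wrapping
        // text.getBytes() directly would alias a buffer that Hadoop
        // overwrites when it moves to the next key.
        return ByteBuffer.wrap(Arrays.copyOf(text.getBytes(), text.getLength()));
    }
}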

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Mck
On Wed, 2011-01-26 at 12:13 +0100, Patrik Modesto wrote: > BTW how to get current time in microseconds in Java? I'm using HFactory.clock() (from hector). > > As far as moving the clone(..) into ColumnFamilyRecordWriter.write(..) > > won't this hurt performance? > > The size of the queue is comp
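
Hector's clock isn't shown above; as a rough, hedged sketch of obtaining a microsecond-style timestamp without an extra library (Cassandra timestamps are conventionally microseconds since the epoch):

// Test-grade approximation: calls within the same millisecond collide,
// which a real microsecond clock (e.g. hector's) avoids.
long timestamp = System.currentTimeMillis() * 1000L;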

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-26 Thread Patrik Modesto
On Wed, Jan 26, 2011 at 08:58, Mck wrote: >> You are correct that microseconds would be better but for the test it >> doesn't matter that much. > > Have you tried? I'm very new to Cassandra as well, and always uncertain > as to what to expect... IMHO it's a matter of use-case. In my use-case there

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mck
> > is "d.timestamp = System.currentTimeMillis();" ok? > > You are correct that microseconds would be better but for the test it > doesn't matter that much. Have you tried? I'm very new to Cassandra as well, and always uncertain as to what to expect... > ByteBuffer bbKey = ByteBufferUtil.clo
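
For reference, a sketch of the clone idiom that truncated line refers to, assuming org.apache.cassandra.utils.ByteBufferUtil (the surrounding reducer code is not shown here):

import java.nio.ByteBuffer;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.io.Text;

class CloneKey {
    // clone() allocates a fresh buffer, so the queued mutation keeps a
    // stable key even after Hadoop rewrites the Text object's byte[].
    static ByteBuffer bbKey(Text key) {
        return ByteBufferUtil.clone(
                ByteBuffer.wrap(key.getBytes(), 0, key.getLength()));
    }
}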

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Patrik Modesto
On Tue, Jan 25, 2011 at 19:09, Mick Semb Wever wrote: > In fact I have another problem (trying to write an empty byte[], or > something, as a key, which put one whole row out of whack, ((one row in > 25 million...))). > > But I'm debugging along the same code. > > I don't quite understand how the

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mick Semb Wever
On Tue, 2011-01-25 at 14:16 +0100, Patrik Modesto wrote: > The attached file contains the working version with the cloned key in > the reduce() method. My other approach was: > > > context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()), > > Collections.singletonList(getMutation(key))); > > Wh
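
A side-by-side sketch of the two reduce() variants under discussion (key, context, and getMutation() come from the attached job and are assumed here, so this is illustrative only):

// Unsafe: the ByteBuffer aliases Text's internal byte[], which Hadoop
// rewrites for the next key while ColumnFamilyRecordWriter still holds
// the mutation in its send queue.
context.write(ByteBuffer.wrap(key.getBytes(), 0, key.getLength()),
              Collections.singletonList(getMutation(key)));

// Safe: snapshot the bytes first so the queued mutation keeps its key.
byte[] copy = Arrays.copyOf(key.getBytes(), key.getLength());
context.write(ByteBuffer.wrap(copy),
              Collections.singletonList(getMutation(key)));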

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Patrik Modesto
Hi Mick, attached is the very simple MR job that deletes expired URLs from my test Cassandra DB. The keyspace looks like this:
Keyspace: Test:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Replication Factor: 2
  Column Families:
    ColumnFamily: Url2
      Columns sort

Re: [mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Mick Semb Wever
On Tue, 2011-01-25 at 09:37 +0100, Patrik Modesto wrote: > While developing a really simple MR task, I've found that a > combination of Hadoop optimization and the Cassandra > ColumnFamilyRecordWriter queue creates wrong keys to send to > batch_mutate(). I've seen similar behaviour (junk rows being w

[mapreduce] ColumnFamilyRecordWriter hidden reuse

2011-01-25 Thread Patrik Modesto
Hi, I'm playing with Cassandra 0.7.0 and Hadoop, developing simple MapReduce tasks. While developing a really simple MR task, I've found that a combination of Hadoop optimization and the Cassandra ColumnFamilyRecordWriter queue creates wrong keys to send to batch_mutate(). The problem is in the reduce part,
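
A hypothetical skeleton of the reducer pattern this thread converges on, with the key copied before context.write(); the class name, value type, and getMutation() body are assumptions for illustration, not the attached job:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.cassandra.thrift.Mutation;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// ColumnFamilyRecordWriter queues mutations and sends them to
// batch_mutate() asynchronously, while Hadoop reuses the Text key
// instance, so an un-copied key can change before it is flushed.
public class ExpireUrlsReducer
        extends Reducer<Text, Text, ByteBuffer, List<Mutation>> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Copy the key bytes so the queued mutation is unaffected when
        // Hadoop rewrites this Text instance for the next group.
        ByteBuffer rowKey = ByteBuffer.wrap(
                Arrays.copyOf(key.getBytes(), key.getLength()));
        context.write(rowKey, Collections.singletonList(getMutation(key)));
    }

    // Placeholder: the real job builds a Deletion here; details omitted.
    private Mutation getMutation(Text key) {
        return new Mutation();
    }
}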