Erik Forsberg created CASSANDRA-8682:
----------------------------------------

             Summary: BulkRecordWriter ends up streaming with non-unique session IDs on large hadoop cluster
                 Key: CASSANDRA-8682
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8682
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Erik Forsberg
         Attachments: cassandra-1.2-bulkrecordwriter-sessionid.patch

We use BulkOutputFormat extensively to load data from hadoop to Cassandra. We 
are currently running Cassandra 1.2.18, but are planning an upgrade of 
Cassandra to 2.0.X, possibly 2.1.X.

With Cassandra 1.2 we have problems with the streaming session IDs getting 
duplicated when multiple (20+) Java processes start streaming at the same 
time. On the receiving Cassandra node, having the same session ID actually 
correspond to different sending processes would confuse things a lot, leading 
to aborted connections.

This would not happen for every process, but often enough to be a problem in 
a production environment, which also made it a bit tricky to test.

I suspect this has to do with how UUIDs are generated on the sending (Hadoop) 
side. With 20+ processes being started concurrently, the clockSeqAndNode part 
of the uuid1 probably ended up being exactly the same on all 20 processes.

I wrote a patch which I unfortunately never submitted at the time, but it's 
attached to this issue. The patch constructs a UUID from the map or reduce task 
ID, which is guaranteed to be unique per hadoop cluster.
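
To illustrate the idea only (the attached patch is not reproduced here and may 
work differently), a minimal sketch of deriving a session UUID from a task ID 
could look like the following. The class name, the helper sessionIdFor, and the 
example task attempt ID strings are hypothetical; the sketch uses the JDK's 
name-based (type 3) UUID, and it does not check whether the streaming code 
requires a time-based UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class TaskSessionId {
    // Hypothetical helper: maps a Hadoop task attempt ID string to a
    // deterministic, name-based (type 3) UUID. Distinct task IDs yield
    // distinct UUIDs unless their MD5 digests collide.
    static UUID sessionIdFor(String taskAttemptId) {
        return UUID.nameUUIDFromBytes(taskAttemptId.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Made-up task attempt IDs, used purely for illustration.
        UUID first = sessionIdFor("attempt_201501260000_0001_m_000001_0");
        UUID second = sessionIdFor("attempt_201501260000_0001_m_000002_0");
        System.out.println(first);
        System.out.println(second);
        System.out.println("distinct: " + !first.equals(second));
    }
}
```

Because the UUID depends only on the task ID, its uniqueness no longer depends 
on when the sending process happened to start.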

I suspect we're going to face the same issue on Cassandra 2.0 and 2.1, even 
after the rewrite of the streaming subsystem. Please correct me if I'm wrong, 
i.e. if there's something in the new code that will make this a non-issue.

Now the question is how to address this problem. Possible options that I see 
after some code reading:

1. Update patch to apply on 2.0 and 2.1, using same method (generating UUID 
from hadoop task ID)
2. Modify UUIDGen code to use the Java process pid as clockSeq instead of a 
random number. However, getting the pid in Java seems less than simple (and 
remember that this is code that runs on the Hadoop side of things, not inside 
the Cassandra daemon)
3. This patch might help:

{noformat}
diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
index f385744..ae253ab 100644
--- a/src/java/org/apache/cassandra/utils/UUIDGen.java
+++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
@@ -234,7 +234,7 @@ public class UUIDGen
 
     private static long makeClockSeqAndNode()
     {
-        long clock = new Random(System.currentTimeMillis()).nextLong();
+        long clock = new Random().nextLong();
 
         long lsb = 0;
         lsb |= 0x8000000000000000L;                 // variant (2 bits)
{noformat}

..but I don't know why System.currentTimeMillis() is being used as the seed.
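
As a small sketch of why the millisecond seed matters: java.util.Random is 
deterministic for a given seed, so two processes that reach the line above in 
the same millisecond would draw the same "random" clock value. The class name, 
the helper clockFromSeed, and the fixed timestamp below are made up for the 
demonstration; the sketch does not establish how well the no-argument 
constructor separates JVMs running on different hosts.

```java
import java.util.Random;

class SeedCollisionDemo {
    // Mirrors the pre-patch line: the generator is seeded with a millisecond
    // timestamp, and the first long it produces is used as the clock value.
    static long clockFromSeed(long millis) {
        return new Random(millis).nextLong();
    }

    public static void main(String[] args) {
        // Arbitrary timestamp standing in for System.currentTimeMillis()
        // observed by two processes started in the same millisecond.
        long sameMillis = 1422230400000L;

        long processA = clockFromSeed(sameMillis);
        long processB = clockFromSeed(sameMillis);
        System.out.println(processA == processB); // true: same seed, same value

        // The no-argument constructor picks its own seed, which the JDK
        // documents as very likely to differ between invocations.
        long patchedA = new Random().nextLong();
        long patchedB = new Random().nextLong();
        System.out.println(patchedA + " " + patchedB);
    }
}
```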

Opinions?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
