Enis Soztutar created HBASE-12449:
-------------------------------------

             Summary: Use the max timestamp of current or old cell's timestamp 
in HRegion.append()
                 Key: HBASE-12449
                 URL: https://issues.apache.org/jira/browse/HBASE-12449
             Project: HBase
          Issue Type: Bug
            Reporter: Enis Soztutar
            Assignee: Enis Soztutar
             Fix For: 2.0.0, 0.98.8, 0.99.2


We have observed an issue in SLES clusters where the system timestamp regularly 
goes back in time. This happens frequently enough to cause test failures when 
LTT is used with updater. 

Everytime an mutation is performed, the updater creates a string in the form 
"#column:mutation_type" and appends it to the column "mutate_info". 

It seems that when the test fails, it is always the case that the mutate_info 
information for that particular column reported is not there in the column 
mutate_info. However, according to the MultiThreadedUpdater source code, if a 
row gets updated, all the columns will be mutated. So if a row contains 15 
columns, all 15 should appear in mutate_info. 

When the test fails though, we get an exception like: 
{code}
2014-11-02 04:31:12,018 ERROR [HBaseReaderThread_7] util.MultiThreadedAction: 
Error checking data for key [b0485292cde20d8a76cca37410a9f115-23787], column 
family [test_cf], column [8], mutation [null]; value of length 818
{code}

For the same row, the mutate info DOES NOT contain columns 8 (and 9) while it 
should: 
{code}
 test_cf:mutate_info                                 timestamp=1414902651388, 
value=#increment:1#0:0#1:0#10:3#11:0#12:3#13:0#14:0#15:0#16:2#2:3#3:0#4:2#5:3#6:0#7:0
                                         
{code}

Further debugging led to finding the root cause where It seems that on SUSE 
System.currentTimeMillis() can go back in time freely (especially when run in a 
virtualized env like EC2), and actually happens very frequently. 

This is from a debug log that was put in place: 
{code}
2014-11-04 01:16:05,025 INFO  
[B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765025/Put/mvcc=8239/#increment:1
2014-11-04 01:16:05,038 INFO  
[B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765038/Put/mvcc=8255/#increment:1#0:3
2014-11-04 01:16:05,047 INFO  
[B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765047/Put/mvcc=8265/#increment:1#0:3#1:3
2014-11-04 01:16:05,057 INFO  
[B.DefaultRpcServer.handler=27,queue=0,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765056/Put/mvcc=8274/#increment:1#0:3#1:3#10:2
2014-11-04 01:16:05,061 INFO  [B.DefaultRpcServer.handler=6,queue=0,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765061/Put/mvcc=8278/#increment:1#0:3#1:3#10:2#11:0
2014-11-04 01:16:05,070 INFO  
[B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765070/Put/mvcc=8285/#increment:1#0:3#1:3#10:2#11:0#12:3
2014-11-04 01:16:05,076 INFO  [B.DefaultRpcServer.handler=3,queue=0,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765076/Put/mvcc=8289/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0
2014-11-04 01:16:05,084 INFO  [B.DefaultRpcServer.handler=2,queue=2,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765084/Put/mvcc=8293/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0
2014-11-04 01:16:05,090 INFO  [B.DefaultRpcServer.handler=7,queue=1,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765090/Put/mvcc=8297/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0
2014-11-04 01:16:05,097 INFO  [B.DefaultRpcServer.handler=0,queue=0,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765097/Put/mvcc=8301/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0
2014-11-04 01:16:05,100 INFO  
[B.DefaultRpcServer.handler=14,queue=2,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765100/Put/mvcc=8303/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0
2014-11-04 01:16:05,103 INFO  
[B.DefaultRpcServer.handler=11,queue=2,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765103/Put/mvcc=8305/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0
2014-11-04 01:16:05,110 INFO  [B.DefaultRpcServer.handler=2,queue=2,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765110/Put/mvcc=8309/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2
2014-11-04 01:16:05,114 INFO  
[B.DefaultRpcServer.handler=20,queue=2,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765114/Put/mvcc=8312/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2
2014-11-04 01:16:05,119 INFO  
[B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0
2014-11-04 01:16:04,941 INFO  
[B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0
2014-11-04 01:16:04,945 INFO  
[B.DefaultRpcServer.handler=25,queue=1,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764945/Put/mvcc=8324/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#6:0
2014-11-04 01:16:04,988 INFO  
[B.DefaultRpcServer.handler=19,queue=1,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764988/Put/mvcc=8329/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#7:3
2014-11-04 01:16:04,998 INFO  [B.DefaultRpcServer.handler=2,queue=2,port=60020] 
regionserver.MemStore: upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764998/Put/mvcc=8337/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#8:2
2014-11-04 01:16:05,005 INFO  
[B.DefaultRpcServer.handler=17,queue=2,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765005/Put/mvcc=8343/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#9:3
{code}

Notice that we keep appending entries like {{#13:0}} to the end. Up until 
{code}
2014-11-04 01:16:05,119 INFO  
[B.DefaultRpcServer.handler=21,queue=0,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063765119/Put/mvcc=8316/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0
{code}
the timestamps are always increasing. Then the next line has a smaller 
timestamp. It still does the append though:  
{code}
2014-11-04 01:16:04,941 INFO  
[B.DefaultRpcServer.handler=22,queue=1,port=60020] regionserver.MemStore: 
upserting: 
193002e668758ea9762904da1a22337c-1268/test_cf:mutate_info/1415063764941/Put/mvcc=8320/#increment:1#0:3#1:3#10:2#11:0#12:3#13:0#14:0#15:0#16:0#17:0#18:0#2:2#3:2#4:0#5:0
{code}

The timestamp went from 1415063765119 to 1415063764941 (a 178ms difference). 
Since append code always checks for latest value (latest value in hbase sort 
order), then appends to the latest value, it does not look like an HBase level 
issue. You can even notice that the log4j timestamp is going back.

Apart from SUSE timestamp issues (and NTP setup etc), I think our append code 
should ensure that the timestamp of the new key value that is appended should 
not have a lesser timestamp than the old keyvalue. We can think about 
implementing an EnvEdgeManager to do monotonically increment the ts within the 
process, but that may be expensive. 

I think we can ensure to use the max timestamp in append anyway since that will 
ensure that even if system ts goes back, the new kv always sorts before the 
previous value. 

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to