Hi,

After step 4 ( i.e disabling of WAL encryption, removing 
SecureProtobufReader/Writer and restart), read of encrypted WAL fails mainly 
due to EOF exception at Basedecoder. This is not considered as error and these 
WAL are being moved to /oldWALs.

Following is observed in log files:
2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: Splitting hlog: 
hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017,
 length=172
2014-07-30 19:44:29,254 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: DistributedLogReplay = false
2014-07-30 19:44:29,313 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
util.FSHDFSUtils: Recovering lease on dfs file 
hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
2014-07-30 19:44:29,315 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
util.FSHDFSUtils: recoverLease=true, attempt=0 on 
file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
 after 1ms
2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0] 
wal.HLogSplitter: Writer thread 
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-0,5,main]: starting
2014-07-30 19:44:29,429 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1] 
wal.HLogSplitter: Writer thread 
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-1,5,main]: starting
2014-07-30 19:44:29,430 DEBUG [RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2] 
wal.HLogSplitter: Writer thread 
Thread[RS_LOG_REPLAY_OPS-HOST-16:15264-1-Writer-2,5,main]: starting
2014-07-30 19:44:29,591 ERROR [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: 
Premature EOF from inputStream
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: Finishing writing output logs and closing down.
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: Waiting for split writer threads to finish
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: Split writers finished
2014-07-30 19:44:29,592 INFO  [RS_LOG_REPLAY_OPS-HOST-16:15264-1] 
wal.HLogSplitter: Processed 0 edits across 0 regions; log 
file=hdfs://HOST-16:18020/hbase/WALs/HOST-16,15264,1406725441997-splitting/HOST-16%2C15264%2C1406725441997.1406725444017
 is corrupted = false progress failed = false

To fix this, we need to propagate EOF exception to HLogSplitter. Any 
suggestions on the fix?


Regards,
Kiran
__________________________________________________________________________________________________________
This e-mail and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
__________________________________________________________________________________________________________




> -----Original Message-----
> From: Shankar hiremath [mailto:shankar.hirem...@huawei.com]
> Sent: Friday, July 25, 2014 19:38
> To: user@hbase.apache.org
> Subject: HBase file encryption, inconsistencies observed and data loss
> 
> HBase file encryption some inconsistencies observed and data loss
> happens after running the hbck tool,
> the operation steps are as below.    (one thing what I observed is, on
> startup of HMaster if it is not able to process the WAL file, then also
> it moved to /oldWALs)
> 
> Procedure:
> 1. Start the Hbase services (HMaster & region Server) 2. Enable HFile
> encryption and WAL file encryption as below, and perform 'table4-0' put
> operations (100 records added) <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</
> value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
> </value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
> </value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 3. Machine went down, so all process went down
> 
> 4. We disabled the WAL file encryption for performance reason, and keep
> encryption only for Hfile, as below <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</
> value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online
> on XX-XX-XX-XX,60020,1406209023146 at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedNa
> me(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.j
> ava:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2
> .callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcSc
> heduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSche
> duler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler
> .java:111)
> at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the
> configuration (as original) 7. Kill/Stop the Region Server, revert all
> the configurations as original, as below <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</
> value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader
> </value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter
> </value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online
> on XX-XX-XX-XX,60020,1406209023146 at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedNa
> me(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionSer
> ver.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.j
> ava:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2
> .callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcSc
> heduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcSche
> duler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler
> .java:111)
> at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below ./hbase hbck -
> details .........................
> Summary:
> table1-0 is okay.
> Number of regions: 0
> Deployed on:
> table2-0 is okay.
> Number of regions: 0
> Deployed on:
> table3-0 is okay.
> Number of regions: 0
> Deployed on:
> table4-0 is okay.
> Number of regions: 0
> Deployed on:
> table5-0 is okay.
> Number of regions: 0
> Deployed on:
> table6-0 is okay.
> Number of regions: 0
> Deployed on:
> table7-0 is okay.
> Number of regions: 0
> Deployed on:
> table8-0 is okay.
> Number of regions: 0
> Deployed on:
> table9-0 is okay.
> Number of regions: 0
> Deployed on:
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 0
> Deployed on:
> hbase:namespace is okay.
> Number of regions: 0
> Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
> session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
> packet:: clientPath:null serverPath:null finished:false header:: 6,-11
> replyHeader:: 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread
> for session 0x1475d1611611bcf : Unable to read additional data from
> server sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1475d1611611bcf closed shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
> session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
> packet:: clientPath:null serverPath:null finished:false header:: 7,-11
> replyHeader:: 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread
> for session 0x2475d15f7b31b73 : Unable to read additional data from
> server sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down 10. Fix the assignments as below ./hbase hbck -
> fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146 hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
> session: 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
> client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
> packet:: clientPath:null serverPath:null finished:false header:: 6,-11
> replyHeader:: 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
> Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread
> for session 0x3475d1605321be9 : Unable to read additional data from
> server sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened,
> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
> 
> 
> 
> [X]
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address
> is listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure,
> reproduction, or dissemination) by persons other than the intended
> recipient(s) is prohibited. If you receive this e-mail in error, please
> notify the sender by phone or email immediately and delete it!
> [X]
> 
> 
> 
> 

Reply via email to