[ https://issues.apache.org/jira/browse/HBASE-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073303#comment-14073303 ]
Anoop Sam John commented on HBASE-11584:
----------------------------------------

In Step#4, instead of removing these configs, can you set them this way and check once?
{code}
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
  <name>hbase.regionserver.hlog.writer.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
  <name>hbase.regionserver.wal.encryption</name>
  <value>false</value>
</property>
{code}
WAL encryption will still be disabled for new writes.

> HBase file encryption, consistences observed and data loss
> ----------------------------------------------------------
>
>                 Key: HBASE-11584
>                 URL: https://issues.apache.org/jira/browse/HBASE-11584
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, HFile
>   Affects Versions: 0.98.3
>         Environment: SuSE 11 SP3
>            Reporter: shankarlingayya
>            Priority: Critical
>
> With HBase file encryption, some inconsistencies are observed and data loss
> happens after running the hbck tool.
> The operation steps are as below.
> Procedure:
> 1. Start the HBase services (HMaster & Region Server)
> 2. Enable HFile encryption and WAL file encryption as below, and perform
> 'table4-0' put operations (100 records added)
> <property>
>   <name>hbase.crypto.keyprovider</name>
>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>   <name>hbase.crypto.keyprovider.parameters</name>
>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>   <name>hbase.crypto.master.key.name</name>
>   <value>hdfs</value>
> </property>
> <property>
>   <name>hfile.format.version</name>
>   <value>3</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.reader.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>   <name>hbase.regionserver.wal.encryption</name>
>   <value>true</value>
> </property>
>
> 3. Machine went down, so all processes went down
> 4. We disabled WAL file encryption for performance reasons and kept
> encryption only for HFiles, as below
> <property>
>   <name>hbase.crypto.keyprovider</name>
>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>   <name>hbase.crypto.keyprovider.parameters</name>
>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>   <name>hbase.crypto.master.key.name</name>
>   <value>hdfs</value>
> </property>
> <property>
>   <name>hfile.format.version</name>
>   <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>     at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert the configuration
> (to the original)
> 7. Kill/Stop the Region Server, revert all the configurations to the original,
> as below
> <property>
>   <name>hbase.crypto.keyprovider</name>
>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>   <name>hbase.crypto.keyprovider.parameters</name>
>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>   <name>hbase.crypto.master.key.name</name>
>   <value>hdfs</value>
> </property>
> <property>
>   <name>hfile.format.version</name>
>   <value>3</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.reader.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>   <name>hbase.regionserver.hlog.writer.impl</name>
>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>   <name>hbase.regionserver.wal.encryption</name>
>   <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>     at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>     at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below
> ./hbase hbck -details
> .........................
> Summary:
> table1-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table2-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table3-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table4-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table5-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table6-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table7-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table8-0 is okay.
>     Number of regions: 0
>     Deployed on:
> table9-0 is okay.
>     Number of regions: 0
>     Deployed on:
> hbase:meta is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
>     Number of regions: 0
>     Deployed on:
> hbase:namespace is okay.
>     Number of regions: 0
>     Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader:: 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 10. Fix the assignments as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened:
> WALs, oldWALs & /hbase/data/default/table4-0/ do not have any data

--
This message was sent by Atlassian JIRA
(v6.2#6252)