Yes, by the way, the config was changed in between the crash and the restart;
that was step #4 in the procedure. The WAL encryption config was changed to
false. That by itself is fine, but the reader can not be changed, because we
do not find the reader by looking at the WAL file metadata to see whether the
file is encrypted or not. WAL reading has always worked this way: the user has
to configure the correct reader. So I am not sure whether any code change is
needed or not. Once WAL encryption has been used, even after changing it back
to off the reader should continue to be SecureProtobufLogReader (at least
till all existing WALs are replayed).
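
A quick way to check for that, as a rough sketch (paths assume the default
/hbase root used elsewhere in this thread; adjust for your hbase.rootdir):

  # Any WAL listed here that was written while encryption was on still needs
  # SecureProtobufLogReader configured before it can be replayed.
  hdfs dfs -ls -R /hbase/WALs /hbase/oldWALs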

And files being moved to oldWALs but not to the corrupt folder is something to
be checked. Any chance you could take a look there and put up a patch, Shankar?
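
One way to sanity-check an individual file under oldWALs, assuming the 0.98
bin/hbase script exposes the WAL pretty-printer as the 'hlog' command and that
it picks up the reader impl from hbase-site.xml (so run it on a node that still
has the SecureProtobufLogReader and key provider settings):

  # <wal-file> is a placeholder; point it at one of the moved files.
  ./hbase hlog hdfs://hacluster/hbase/oldWALs/<wal-file>

If that prints the edits, the file is readable with the configured reader; if
it throws, the secure reader/key config is still needed for replay.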

Anoop

On Sunday, July 27, 2014, Andrew Purtell <andrew.purt...@gmail.com> wrote:
> So the regionserver configuration was changed after it crashed but before
> it was restarted?
>
> The impression given by the initial report is that simply using encrypted
> WALs will cause data loss. That's not the case as I have confirmed. There
> could be an edge case somewhere but the original reporter has left out
> important detail about how to reproduce the problem. The below is not
> written in clear language either, so I'm not following along. I'd be happy
> to help look at this more once clear steps for reproducing the problem are
> available. Otherwise since you're talking with Shankar somehow offline
> already I'll leave you to it Anoop.
>
>> Also, when the file can not be read, the fact that it is not moved under
>> the corrupt logs folder is concerning. Need to look at that.
>
> Agreed.
>
>
>> On Jul 27, 2014, at 1:07 AM, Anoop John <anoop.hb...@gmail.com> wrote:
>>
>> As per Shankar, he can get things working with the below configs:
>>
>> <property>
>>        <name>hbase.regionserver.hlog.reader.impl</name>
>>        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> </property>
>> <property>
>>        <name>hbase.regionserver.hlog.writer.impl</name>
>>        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> </property>
>> <property>
>>        <name>hbase.regionserver.wal.encryption</name>
>>        <value>false</value>
>> </property>
>>
>> Once the RS crash happened, the config was kept the above way. See that
>> WAL encryption is disabled now. Still, note that the reader is
>> SecureProtobufLogReader. The existing WAL files were written with
>> encryption and only SecureProtobufLogReader can read them. If that is not
>> configured, the default reader, ProtobufLogReader, can not read them back
>> correctly. So this is the issue that Shankar faced.
>>
>> Also, when the file can not be read, the fact that it is not moved under
>> the corrupt logs folder is concerning. Need to look at that.
>>
>> -Anoop-
>>
>> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <andrew.purt...@gmail.com>
>> wrote:
>>
>>> My attempt to reproduce this issue:
>>>
>>> 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev
>>> box.
>>>
>>> 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also
>>> on this dev box.
>>>
>>> 3. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication
>>> to 1. Set up a keystore and enabled WAL encryption.
>>>
>>> 4. Created a test table.
>>>
>>> 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>>>
>>> 6. Used the shell to count the number of records in the test table.
>>> Count = 1000 rows
>>>
>>> 7. kill -9 the regionserver process.
>>>
>>> 8. Started a new regionserver process. Observed log splitting and replay
>>> in the regionserver log, no errors.
>>>
>>> 9. Used the shell to count the number of records in the test table.
>>> Count = 1000 rows
>>>
>>> Tried this a few times.
>>>
>>> Shankar, can you try running through the above and let us know if the
>>> outcome is different?
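
For reference, a rough shell transcript of the steps above (the YCSB invocation,
workload file, and table/column family names are assumptions, not the exact
commands used; adjust to your setup):

  echo "create 'usertable', 'family'" | bin/hbase shell
  # load ~1000 rows; the 'hbase' binding name and columnfamily property are assumptions
  ycsb load hbase -P workloads/workloada -p columnfamily=family -p recordcount=1000
  echo "count 'usertable'" | bin/hbase shell     # expect 1000
  kill -9 <regionserver-pid>                     # simulate the crash
  bin/hbase-daemon.sh start regionserver         # watch splitting/replay in the RS log
  echo "count 'usertable'" | bin/hbase shell     # expect 1000 again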
>>>
>>>
>>>
>>> On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <andrew.purt...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the detail. So to summarize:
>>>>
>>>> 0. HBase 0.98.3 and HDFS 2.4.1
>>>>
>>>> 1. All data before failure has not yet been flushed so only exists in
>>>> the WAL files.
>>>>
>>>> 2. During distributed splitting, the WAL has either not been written out
>>>> or is unreadable:
>>>>
>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
>>>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
>>>> Premature EOF from inputStream
>>>>
>>>> 3. This file is still moved to oldWALs even though splitting failed.
>>>>
>>>> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
>>>> recovery in your scenario.
>>>>
>>>> See https://issues.apache.org/jira/browse/HBASE-11595
>>>>
>>>>
>>>>
>>>>
>>>> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <shankar.hirem...@huawei.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Andrew,
>>>>
>>>>
>>>> Please find the details
>>>>
>>>>
>>>> HBase 0.98.3 & Hadoop 2.4.1
>>>>
>>>> HBase root file system is on HDFS
>>>>
>>>>
>>>> On the HMaster side there is no failure or error message in the log file
>>>>
>>>> On the Region Server side the below error message is reported:
>>>>
>>>>
>>>> Region Server Log:
>>>>
>>>> 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)]
>>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet::
>>>> clientPath:null serverPath:null finished:false header:: 172,4
>>>> replyHeader:: 172,4294988825,0 request:: '/hbase/table/hbase:acl,F
>>>> response:: #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>>>>
>>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15] wal.HLogSplitter:
>>>> Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>>>>
>>>>
>>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder:
>>>> Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
>>>>
>>>> 2014-07-26 19:29:16,161 INFO [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter:
>>>> Finishing writing output logs and closing down.
>>>> 2014-07-26 19:29:16,161 INFO [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter:
>>>> Waiting for split writer threads to finish
>>>> 2014-07-26 19:29:16,161 INFO [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter:
>>>> Split writers finished
>>>> 2014-07-26 19:29:16,162 INFO [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter:
>>>> Processed 0 edits across 0 regions; log
>>>> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
>>>> is corrupted = false progress failed = false
>>>>
>>>> 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)]
>>>> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>>>>
>>>>
>>>>
>>>> When I query the table, the data that was in the WAL files (before the
>>>> RegionServer machine went down) is not returned.
>>>>
>>>> One more thing I observed: even when the WAL file is not successfully
>>>> processed, it is still moved to the /oldWALs folder.
>>>>
>>>> So when I revert the below 3 configurations on the Region Server side and
>>>> restart, since the WAL has already been moved to the /oldWALs folder, it
>>>> does not get processed.
>>>>
>>>>
>>>> <property>
>>>>
>>>>   <name>hbase.regionserver.hlog.reader.impl</name>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>>  <value>true</value>
>>>>
>>>> </property>
>>>> -------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> And one more scenario I tried (Anoop suggested): with the below
>>>> configuration (instead of deleting the below 3 config parameters, keep
>>>> all of them but set only 'hbase.regionserver.wal.encryption=false'), the
>>>> encrypted WAL file is processed successfully, and querying the table
>>>> returns the WAL data (from before the RegionServer machine went down)
>>>> correctly.
>>>>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.reader.impl</name>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.hlog.writer.impl</name>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>>  <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>>  <value>false</value>
>>>>
>>>> </property>
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>> -Shankar
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: andrew.purt...@gmail.com [mailto:andrew.purt...@gmail.com] On Behalf Of Andrew Purtell
>>>>
>>>> Sent: 26 July 2014 AM 02:21
>>>>
>>>> To: user@hbase.apache.org
>>>>
>>>> Subject: Re: HBase file encryption, inconsistencies observed and data loss
>>>>
>>>>
>>>> Encryption (or the lack of it) doesn't explain missing HFiles.
>>>>
>>>>
>>>> Most likely, if you are having a problem with encryption, it will manifest
>>>> as follows: HFiles will be present; however, you will find many
>>>> IOExceptions in the regionserver logs as they attempt to open the HFiles
>>>> but fail because the data is unreadable.
>>>>
>>>> We should start by looking at more basic issues: what could explain the
>>>> total disappearance of HFiles?
>>>>
>>>> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
>>>> the local filesystem (fs URL starts with file://)?
>>>>
>>>> In your email you provide only exceptions printed by the client. What kind
>>>> of exceptions appear in the regionserver logs? Or in the master log?
>>>>
>>>> If the logs are large, your best bet is to pastebin them and then send the
>>>> URL to the paste in your response.
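
A couple of quick checks along those lines (the table path follows the default
/hbase layout that appears later in this thread; the regionserver log path is
an assumption for your install):

  # Are the HFiles physically present under the table directory?
  hdfs dfs -ls -R /hbase/data/default/table4-0
  # Any read/decrypt failures while opening them? (log path is an assumption)
  grep -iE "ioexception|crypto|encryption" /var/log/hbase/hbase-*-regionserver-*.log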
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath
>>>> <shankar.hirem...@huawei.com> wrote:
>>>>
>>>>
>>>> HBase file encryption: some inconsistencies observed, and data loss
>>>> happens after running the hbck tool. The operation steps are as below.
>>>> (One thing I observed is that on startup of HMaster, if it is not able to
>>>> process a WAL file, the file is still moved to /oldWALs.)
>>>>
>>>>
>>>> Procedure:
>>>>
>>>> 1. Start the HBase services (HMaster & Region Server)
>>>>
>>>> 2. Enable HFile encryption and WAL file encryption as below, and perform
>>>> 'table4-0' put operations (100 records added)
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>> <value>true</value>
>>>>
>>>> </property>
>>>>
>>>> 3. Machine went down, so all processes went down
>>>>
>>>> 4. We disabled the WAL file encryption for performance reasons and kept
>>>> encryption only for HFiles, as below
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> 5. Start the Region Server and query the 'table4-0' data
>>>>
>>>> hbase(main):003:0> count 'table4-0'
>>>>
>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
>>>> XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> 6. Not able to read the data, so we decided to revert the configuration
>>>> back to the original
>>>>
>>>> 7. Kill/Stop the Region Server, revert all the configurations to the
>>>> original, as below
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider</name>
>>>>
>>>> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.keyprovider.parameters</name>
>>>>
>>>> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.crypto.master.key.name</name>
>>>>
>>>> <value>hdfs</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hfile.format.version</name>
>>>>
>>>> <value>3</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.reader.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.hlog.writer.impl</name>
>>>>
>>>>
>>>> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>>>>
>>>> </property>
>>>>
>>>> <property>
>>>>
>>>> <name>hbase.regionserver.wal.encryption</name>
>>>>
>>>> <value>true</value>
>>>>
>>>> </property>
>>>>
>>>> 8. Start the Region Server and perform the 'table4-0' query
>>>>
>>>> hbase(main):003:0> count 'table4-0'
>>>>
>>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
>>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
>>>> XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>>>> at java.lang.Thread.run(Thread.java:662)
>>>>
>>>> 9. Run hbase hbck to repair, as below: ./hbase hbck -details
>>>>
>>>> .........................
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:acl is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> hbase:namespace is okay.
>>>>
>>>> Number of regions: 0
>>>>
>>>> Deployed on:
>>>>
>>>> 22 inconsistencies detected.
>>>>
>>>> Status: INCONSISTENT
>>>>
>>>> 2014-07-24 19:13:05,532 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing master protocol: MasterService
>>>> 2014-07-24 19:13:05,533 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing zookeeper sessionid=0x1475d1611611bcf
>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
>>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session:
>>>> 0x1475d1611611bcf
>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null
>>>> finished:false header:: 6,-11 replyHeader:: 6,4295102074,0 request:: null response:: null
>>>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for
>>>> session: 0x1475d1611611bcf
>>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> An exception was thrown while closing send thread for session 0x1475d1611611bcf :
>>>> Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server
>>>> has closed socket
>>>> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>>>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
>>>>
>>>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>>>>
>>>> 10. Fix the assignments as below
>>>>
>>>> ./hbase hbck -fixAssignments
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:acl is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:namespace is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> 0 inconsistencies detected.
>>>>
>>>> Status: OK
>>>>
>>>> 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing master protocol: MasterService
>>>> 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing zookeeper sessionid=0x2475d15f7b31b73
>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
>>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session:
>>>> 0x2475d15f7b31b73
>>>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null
>>>> finished:false header:: 7,-11 replyHeader:: 7,4295102377,0 request:: null response:: null
>>>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for
>>>> session: 0x2475d15f7b31b73
>>>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> An exception was thrown while closing send thread for session 0x2475d15f7b31b73 :
>>>> Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server
>>>> has closed socket
>>>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
>>>> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>>>>
>>>> 11. Fix the assignments and meta as below
>>>>
>>>> ./hbase hbck -fixAssignments -fixMeta
>>>>
>>>> Summary:
>>>>
>>>> table1-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table2-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table3-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table4-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table5-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table6-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table7-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table8-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> table9-0 is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:meta is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:acl is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> hbase:namespace is okay.
>>>>
>>>> Number of regions: 1
>>>>
>>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
>>>>
>>>> 0 inconsistencies detected.
>>>>
>>>> Status: OK
>>>>
>>>> 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing master protocol: MasterService
>>>> 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation:
>>>> Closing zookeeper sessionid=0x3475d1605321be9
>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
>>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session:
>>>> 0x3475d1605321be9
>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null
>>>> finished:false header:: 6,-11 replyHeader:: 6,4295102397,0 request:: null response:: null
>>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for
>>>> session: 0x3475d1605321be9
>>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn:
>>>> An exception was thrown while closing send thread for session 0x3475d1605321be9 :
>>>> Unable to read additional data from server sessionid 0x3475d1605321be9, likely server
>>>> has closed socket
>>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
>>>> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>>>>
>>>> hbase(main):006:0> count 'table4-0'
>>>>
>>>> 0 row(s) in 0.0200 seconds
>>>>
>>>> => 0
>>>>
>>>> hbase(main):007:0>
>>>>
>>>> Complete data loss happened: WALs, oldWALs & /hbase/data/default/table4-0/
>>>> do not have any data.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Best regards,
>>>>
>>>>
>>>> - Andy
>>>>
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>> (via Tom White)
>>>
>
