Hi Jean,

This is the scenario I am talking about. Let me know if everything is OK with this region chain.

Regionname: RAW,GpsgQ,1393477054705.defb006868d8191e76a2ae7e9d203419.
StartKey:   GpsgQ
Endkey:     G7stL

Regionname: RAW,G7stL,1393477054705.0d123f246312f937e930fc76fc8d4b9c.
StartKey:   G7stL
Endkey:     HCuv

Regionname: RAW,HCuv,1393490697926.d1ea022c6fd534aaf89139ca0726cce5.
StartKey:   HCuv
Endkey:     HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.

Regionname: RAW,HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.,1393484971849.a71d77c695705d07a121340c07a1cc0c.
StartKey:   HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3.,1378197588060.bcae64eb2788208ab723eb2ad4f5925f.
Endkey:     eE08

The size in HDFS of the region hashes "d1ea022c6fd534aaf89139ca0726cce5 d1ea022c6fd534aaf89139ca0726cce5" is 600KB.

Also, initially we thought it was human error: that someone might have deleted the HDFS directories under some regions. But, surprisingly, only some columns in a column family for a row in the region are lost. If someone had deleted an entire directory, then the entire column family for the rows in those regions should be lost, since HBase keeps store files for each column family. We also have rows with millions of columns in the table, and some of those columns are present while others are lost. This happened in some regions only, not across all the regions of the table. All other tables in the cluster are good.

On Thu, Feb 27, 2014 at 9:40 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Hi Kiran,
>
> 2 things.
>
> 1) Is there any reason for you to use a so old HBase version? any chance to
> migrate to a more recent one? 0.94.17 is out.
> 2) What do you mean by "I have never seen a wierd start key and end key
> like this"? I don't see anything wrong with what you described. What you
> keys look like? Can you go a get with key beeing "K3.xyz,138798010000.xyp"?
>
> JM
>
>
> 2014-02-27 10:55 GMT-05:00 kiran <kiran.sarvabho...@gmail.com>:
>
> > Adding to that there are many regions with 0MB size and have CF's as
> > specified in the table...
> >
> > On Thu, Feb 27, 2014 at 9:23 PM, kiran <kiran.sarvabho...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > We have been experiencing severe data loss issues from few hours. There
> > > are some wierd things going on in the cluster. We were unable to locate
> > > the data even in hdfs
> > >
> > > Hbase version 0.94.1
> > >
> > > Here is the wierd things that are going on:
> > >
> > > 1) Table which was once 1TB has now become 170GB with many of the
> > > regions which we once 7gb are now becoming few MB's. We are no clue
> > > what is happening at all
> > >
> > > 2) Table is splitting (or what ever) (100 regions have become 200
> > > regions) and ours is constantregionsplitpolicy with region size 20gb.
> > > I don't know why it is even spltting
> > >
> > > 3) HDFS namenode dump size which we periodically backup is decreasing
> > >
> > > 4) And there is a region chain with start keys and end keys as, I can't
> > > copy paste the exact thing. For example
> > >
> > > K1.xxx K2.xyz
> > > K2.xyz K3.xyz,138798010000.xyp
> > > K3.xyz,138798010000.xyp K4.xyq
> > >
> > > I have never seen a wierd start key and end key like this. We also
> > > suspect a failed split of a region around 20GB. We looked at logs many
> > > times but unable to get any sense out of it. Please help us out and we
> > > can't afford data loss.
> > >
> > > Yesterday, There was an cluster crash of root region but we thought we
> > > sucessfully restored that.But things did n't go that way.... There was
> > > a consitent data loss after that.
> > >
> > >
> > > --
> > > Thank you
> > > Kiran Sarvabhotla
> > >
> > > -----Even a correct decision is wrong when it is taken late
> > >
> >
> >
> > --
> > Thank you
> > Kiran Sarvabhotla
> >
> > -----Even a correct decision is wrong when it is taken late
> >
>

--
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late
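
P.S. One way to sanity-check the region chain listed at the top of this message is to confirm that each region's end key equals the next region's start key, and to flag any boundary key that itself looks like a region name. Below is a minimal Python sketch of that check, with the keys hard-coded from the listing above. The helper names (`find_chain_breaks`, `looks_like_region_name`) and the regular expression are illustrative assumptions, not part of any HBase tool, and the check says nothing about why data is missing.

```python
import re

# Boundary key copied from the region listing in this message.
SPLIT_KEY = (
    "HCuv,1377721814384.87a47003ddb6f0e18a1b735c89bf8ac3."
    ",1378197588060.bcae64eb2788208ab723eb2ad4f5925f."
)

# (region name, start key, end key), in the order listed in this message.
REGIONS = [
    ("RAW,GpsgQ,1393477054705.defb006868d8191e76a2ae7e9d203419.", "GpsgQ", "G7stL"),
    ("RAW,G7stL,1393477054705.0d123f246312f937e930fc76fc8d4b9c.", "G7stL", "HCuv"),
    ("RAW,HCuv,1393490697926.d1ea022c6fd534aaf89139ca0726cce5.", "HCuv", SPLIT_KEY),
    ("RAW," + SPLIT_KEY + ",1393484971849.a71d77c695705d07a121340c07a1cc0c.",
     SPLIT_KEY, "eE08"),
]


def find_chain_breaks(regions):
    """Return (previous region, next region) pairs whose keys do not meet."""
    breaks = []
    for prev, nxt in zip(regions, regions[1:]):
        if prev[2] != nxt[1]:  # previous end key must equal next start key
            breaks.append((prev[0], nxt[0]))
    return breaks


def looks_like_region_name(key):
    """Heuristic (assumption): the key embeds ',<13-digit ts>.<32 hex chars>.'"""
    return re.search(r",\d{13}\.[0-9a-f]{32}\.", key) is not None


if __name__ == "__main__":
    print("chain breaks:", find_chain_breaks(REGIONS))
    for name, start, end in REGIONS:
        for key in (start, end):
            if looks_like_region_name(key):
                print("suspicious boundary key:", key)
```

With the four regions as listed, the chain itself is contiguous; what stands out is that the boundary between the last two regions has the shape of a region name rather than an ordinary row key.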