In my view, these are the ways you could lose data:

1. Some tuples share the same row key + column family + column qualifier. When
you load that data into HBase, those tuples land in the same cell and may
exceed the column family's predefined maximum number of versions, in which
case the older values are lost.

2. As Ted mentioned, your import may include some Deletes. Do you generate
tombstones (delete markers) in your bulk load?
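To make the two cases concrete, here is a toy, in-memory model of one column family. It is only a sketch, not HBase code, and the class and method names are hypothetical. It assumes each cell keeps at most `max_versions` visible versions and that a column-level delete marker hides every version whose timestamp is less than or equal to the marker's timestamp, which is how HBase treats a delete of all versions of a column.

```python
class ToyColumnFamily:
    """Toy model of one HBase column family (illustration only).

    Assumed simplifications:
    - a cell is identified by (row, qualifier) and returns at most
      `max_versions` visible versions, newest timestamp first;
    - delete() records a column-level tombstone that hides every version
      with timestamp <= the tombstone's timestamp.
    """

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = {}       # (row, qualifier) -> {timestamp: value}
        self.tombstones = {}  # (row, qualifier) -> highest delete timestamp

    def put(self, row, qualifier, ts, value):
        # A put with the same (row, qualifier, ts) overwrites the earlier value.
        self.cells.setdefault((row, qualifier), {})[ts] = value

    def delete(self, row, qualifier, ts):
        key = (row, qualifier)
        self.tombstones[key] = max(ts, self.tombstones.get(key, ts))

    def get(self, row, qualifier):
        """Return the visible (timestamp, value) pairs, newest first."""
        key = (row, qualifier)
        masked_up_to = self.tombstones.get(key)
        versions = sorted(self.cells.get(key, {}).items(), reverse=True)
        visible = [(ts, v) for ts, v in versions
                   if masked_up_to is None or ts > masked_up_to]
        # Versions beyond max_versions are not returned.
        return visible[: self.max_versions]


cf = ToyColumnFamily(max_versions=1)
cf.put("row1", "data", 100, "v1")
cf.put("row1", "data", 200, "v2")    # same row + family + qualifier
print(cf.get("row1", "data"))         # [(200, 'v2')]: v1 is no longer visible
cf.delete("row1", "data", 300)        # tombstone at ts=300
cf.put("row1", "data", 250, "v3")     # written later, but with an older ts
print(cf.get("row1", "data"))         # []: v3 is hidden by the tombstone
```

Note the last step: a put whose timestamp is older than an existing tombstone stays invisible even though it was written afterwards, so an import that replays explicit timestamps can make whole rows disappear from a scan.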

By the way, can you show us the schema of your imported data, e.g. whether it
contains duplicates and how your row keys are designed?
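One quick way to answer the duplicate question is to check the source data before importing it. The sketch below assumes the input can be read as `(row_key, family, qualifier, value)` tuples; that layout and the `duplicate_report` helper are hypothetical. Since `count` in the HBase shell counts rows, `distinct_rows` is the most `count` can report after importing into an empty table with no deletes.

```python
from collections import Counter


def duplicate_report(tuples):
    """Summarize how many input tuples collapse onto the same row or cell.

    `tuples` is an iterable of (row_key, family, qualifier, value);
    this layout is an assumption about the import source.
    """
    tuples = list(tuples)
    rows = {row for row, _, _, _ in tuples}
    cells = Counter((row, fam, qual) for row, fam, qual, _ in tuples)
    return {
        "tuples": len(tuples),
        # Upper bound for `count 'table'` on an empty table with no deletes.
        "distinct_rows": len(rows),
        "distinct_cells": len(cells),
        # Cells that receive more than one value and so depend on VERSIONS.
        "cells_written_more_than_once": sum(1 for n in cells.values() if n > 1),
    }


# Hypothetical sample rows; family/qualifier mirror the log ("wrapstar:data").
sample = [
    ("row1", "wrapstar", "data", "a"),
    ("row1", "wrapstar", "data", "b"),
    ("row2", "wrapstar", "data", "c"),
]
print(duplicate_report(sample))
# {'tuples': 3, 'distinct_rows': 2, 'distinct_cells': 2, 'cells_written_more_than_once': 1}
```

If `tuples` is much larger than `distinct_rows`, a lower row count than expected is explained by the key design rather than by lost writes.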

Regards,

Yong


On Wed, Jul 24, 2013 at 3:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Which HBase release are you using?
>
> Is it possible that the import included Deletes?
>
> Cheers
>
> On Tue, Jul 23, 2013 at 5:23 PM, Huangmao (Homer) Quan <luj...@gmail.com> wrote:
>
> > Hi hbase users,
> >
> > We hit an issue when importing data through Thrift (Perl).
> >
> > We found that there is less data than expected.
> >
> > When we scanned the table, we got:
> > ERROR: java.lang.RuntimeException:
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=7, exceptions:
> > Tue Jul 23 23:01:41 UTC 2013,
> > org.apache.hadoop.hbase.client.ScannerCallable@180f9720,
> > java.io.IOException: java.io.IOException: Could not iterate
> > StoreFileScanner[HFileScanner for reader
> > reader=file:/tmp/hbase-hbase/hbase/skg/d13644aae91d7ee9a8fdde461e8ec217/wrapstar/51a2e5871b7a4af8a2d9d17ed0c14031,
> > compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=false]
> > [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
> > [cacheBloomsOnWrite=false] [cacheEvictOnClose=false]
> > [cacheCompressed=false], firstKey="Laughing"Larry
> > Berger-nm5619461/wrapstar:data/1374615644669/Put, lastKey=Jordan-Patrick
> > Marcantonio-nm0545093/wrapstar:data/1374616499993/Put, avgKeyLen=47,
> > avgValueLen=652, entries=156586, length=111099401, cur=George
> > McGovern-nm0569566/wrapstar:data/1374616538067/Put/vlen=17162/ts=0]
> >
> >
> > Even weirder, when I monitored the row count during the import, I found
> > that it sometimes decreased sharply (lots of data missing):
> >
> > hbase(main):003:0> count 'skgtwo'
> > .............
> > *134453 row(s)* in 7.5510 seconds
> >
> > hbase(main):004:0> count 'skgtwo'
> > ...................
> > *88970 row(s)* in 7.5380 seconds
> >
> > Any suggestions are appreciated.
> >
> > Cheers
> >
> > Huangmao (Homer) Quan
> > Email:   luj...@gmail.com
> > Google Voice: +1 (530) 903-8125
> > Facebook: http://www.facebook.com/homerquan
> > Linkedin: http://www.linkedin.com/in/homerquan
> >
>
