Re: Problems with scan after lot of Puts

Ondřej Stašek Fri, 01 Jun 2012 01:08:02 -0700

Hello J-D,

I'm currently tied to 0.90.6-cdh3u4. And this 1-row skip seems to be the result of some strange RS restart. My test job has now been running for several hours without error. I'll try to investigate it further and come up with some result.


Regards

  Ondrej Stasek

On 31.5.2012 19:45, Jean-Daniel Cryans wrote:

There's a concurrent thread on the mailing list that refers to
atomicity issues in 0.90 and issues with scans, may I suggest you run
the test on 0.92.1 or 0.94.0? I did my testing on 0.94 and didn't get
any issues after fixing the scanner.

J-D

On Thu, May 31, 2012 at 3:05 AM, Ondřej Stašek
<ondrej.sta...@firma.seznam.cz> wrote:

Hello J-D,

Thanks for the reply. I've modified my code to use scanner copies,
table.getScanner(new Scan(scan)), and ran it again. Even after that I got an
error:

12/05/31 10:42:39 INFO hbase.TestPutScan: Run 5 put 1000000 rows
12/05/31 10:44:09 INFO hbase.TestPutScan: Run 5 scan + del every 10th row
12/05/31 10:44:33 ERROR hbase.TestPutScan: Expected value: value 0402040 0000005, got: value 0402041 0000004

It seems that 1 row was skipped during scan. Strange.

I'll keep testing.

  Ondrej Stasek


On 30.5.2012 21:05, Jean-Daniel Cryans wrote:

There you go:

12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting at row=testtable,,00000000000000 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,test_row_0496107,1338404055995.e9c7a4ca97eb2be372445af4d3772031. is sv4r25s44:62023
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Removed testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. for tableName=testtable from cache because of test_row_0012550
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. is sv4r25s44:62023
12/05/30 18:57:47 INFO hbase.TestPutScan: Run 5 scan
12/05/30 18:57:47 ERROR hbase.TestPutScan: Expected value: value 0000001 0000005, got: value 0496107 0000005

That's a split so the ClientScanner did a reset on the start row. So
I'm going to fix your code and see if I can get anything else.

J-D

On Wed, May 30, 2012 at 11:56 AM, Jean-Daniel Cryans
<jdcry...@apache.org> wrote:

I'm running it here, but I just remembered about this issue:

"HTable.ClientScanner needs to clone the Scan object"
https://issues.apache.org/jira/browse/HBASE-4891

And since you are reusing that Scan object, you could definitely hit this
issue.
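
To illustrate why a reused Scan object can go wrong, here is a minimal, self-contained sketch. It does not use HBase at all: FakeScan, scanAll and relocateAfter are hypothetical names made up for this example, and the "scanner writes its resume position back into the caller's scan object" behaviour is an assumption modeled on the issue title above, not a copy of the real ClientScanner code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ScanReuseSketch {

    /** Hypothetical stand-in for a scan spec; NOT org.apache.hadoop.hbase.client.Scan. */
    static final class FakeScan {
        String startRow = "";                       // empty string = start of table
        FakeScan() {}
        FakeScan(FakeScan other) { this.startRow = other.startRow; }  // copy constructor
    }

    /**
     * Reads every row >= scan.startRow. After relocateAfter rows it simulates a
     * region change by storing the resume position in the caller's scan object
     * (assumed side effect, for illustration only).
     */
    static List<String> scanAll(TreeMap<String, String> table, FakeScan scan, int relocateAfter) {
        List<String> seen = new ArrayList<>();
        for (String row : table.tailMap(scan.startRow, true).keySet()) {
            seen.add(row);
            if (seen.size() == relocateAfter) {
                scan.startRow = row;                // mutates the object the caller passed in
            }
        }
        return seen;
    }

    static TreeMap<String, String> sampleTable(int rows) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= rows; i++) {
            table.put(String.format("test_row_%07d", i), "value");
        }
        return table;
    }

    /** Scans twice and returns the first row seen by the second scan. */
    static String firstRowOfSecondScan(boolean copyScan) {
        TreeMap<String, String> table = sampleTable(10);
        FakeScan shared = new FakeScan();
        scanAll(table, copyScan ? new FakeScan(shared) : shared, 5);
        return scanAll(table, copyScan ? new FakeScan(shared) : shared, 5).get(0);
    }

    public static void main(String[] args) {
        System.out.println("reused scan object: " + firstRowOfSecondScan(false));
        System.out.println("copied scan object: " + firstRowOfSecondScan(true));
    }
}
```

With the shared object, the second scan starts at the row where the first one recorded its resume position; with a fresh copy per scan, it starts at the first row again.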

J-D

On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
<ondrej.sta...@firma.seznam.cz> wrote:

Here it is:

http://pastebin.com/0AgsQjur


On 29.5.2012 22:44, Jean-Daniel Cryans wrote:

Care to share that TestPutScan? Just attach it in a pastebin

Thx,

J-D

On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek
<ondrej.sta...@firma.seznam.cz> wrote:

My program writes changes to an HBase table by issuing lots of Puts
(autoFlush turned off, flush at the end) and afterwards uses a
ResultScanner over the whole table to read all rows and act upon them.
My problem is that on several occasions the scan does not return the
expected rows. Either the scan does not start at the beginning of the
table, or somewhere during the scan I get old data (not the data
written by the preceding Puts).

I have even written a simple test application to simulate this behavior:
1. write 1M simple numbered rows to a table
2. scan through table to test output, delete every 10th row
3. scan again after delete
4. repeat until error found

Sample output:

12/05/29 00:32:12 INFO hbase.TestPutScan: Run 342 put 1000000 rows
12/05/29 00:32:35 INFO hbase.TestPutScan: Run 342 scan + del every 10th row
12/05/29 00:33:29 INFO hbase.TestPutScan: Run 342 scan
12/05/29 00:33:29 ERROR hbase.TestPutScan: Expected value: value 0000001 0000342, got: value 0281999 0000342

This means that the program expected to get the first row but got the 281999th.
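
The put / scan / delete / scan loop described above can be sketched as follows. This is not the TestPutScan program from the pastebin: it runs against an in-memory TreeMap instead of HBase, and the names rowKey, value and runOnce, the "value <row> <run>" layout (inferred from the log lines) and the choice of deleting rows whose number is divisible by 10 are all assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

final class PutScanLoopSketch {

    static String rowKey(int i) { return String.format("test_row_%07d", i); }

    // Assumed value layout: "value <row number> <run number>".
    static String value(int i, int run) { return String.format("value %07d %07d", i, run); }

    /** Performs one run; returns null on success or an error message on the first mismatch. */
    static String runOnce(TreeMap<String, String> table, int rows, int run) {
        // 1. Write the numbered rows.
        for (int i = 1; i <= rows; i++) {
            table.put(rowKey(i), value(i, run));
        }

        // 2. Scan in key order, verify each value, delete every 10th row.
        int i = 1;
        for (Iterator<Map.Entry<String, String>> it = table.entrySet().iterator(); it.hasNext(); i++) {
            Map.Entry<String, String> entry = it.next();
            String expected = value(i, run);
            if (!expected.equals(entry.getValue())) {
                return "Expected value: " + expected + ", got: " + entry.getValue();
            }
            if (i % 10 == 0) {
                it.remove();
            }
        }

        // 3. Scan again: the deleted rows must be gone, the rest unchanged.
        Iterator<String> values = table.values().iterator();
        for (i = 1; i <= rows; i++) {
            if (i % 10 == 0) {
                continue;                           // this row was deleted in step 2
            }
            String expected = value(i, run);
            String got = values.hasNext() ? values.next() : "<end of table>";
            if (!expected.equals(got)) {
                return "Expected value: " + expected + ", got: " + got;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        // 4. Repeat until an error is found (bounded here so the sketch terminates).
        for (int run = 1; run <= 3; run++) {
            String error = runOnce(table, 1000, run);
            if (error != null) {
                System.out.println("Run " + run + " failed: " + error);
                return;
            }
        }
        System.out.println("No error found");
    }
}
```

Against a plain in-memory map the loop never fails; the value of such a test lies in running the same checks against a store where regions can move or restart underneath the scanner.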

This test ran on a "minicluster" of 2 regionservers running Cloudera's
cdh3u4 distribution.

Today I got 3 errors like that, and from the RS's log it seems that at
the same time the HBase balancer issued a reassign command for this
table's region (the table has only 1 region).

Any pointers on what to check or what to send you to help resolve this
issue?

Regards

Ondrej Stasek

--
Ondřej Stašek
Senior programmer
Seznam.cz, a.s.
Nádražní 159/21
370 01 České Budějovice 6

tel.: +420 386 325 467
gsm: +420 603 857 602
icq: 164660005
ondrej.sta...@firma.seznam.cz
http://www.seznam.cz
