Finally managed to reproduce it with the CDH distribution. (So far I had been testing with HBase 1.1 as distributed with MapR, which does not have this bug.)
This is essentially an HBase bug, HBASE-13262 [1], which has been fixed in
1.0.1 and 1.1.0. Please update your HBase distribution.

[1] https://issues.apache.org/jira/browse/HBASE-13262

On Thu, Mar 17, 2016 at 3:19 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:

> Aditya,
>
> When we were exchanging the emails, you mentioned to me that you
> discovered another issue in the case where the table is split into multiple
> regions and the first region returned to the client did not have any rows.
> I think this issue is related to the issue that I’m seeing. Have you
> opened a JIRA for this issue? Have you investigated/fixed this issue?
>
> Thanks
> Kumiko
>
> *From:* Aditya [mailto:adityakish...@gmail.com]
> *Sent:* Thursday, March 17, 2016 3:02 PM
> *To:* Kumiko Yada <kumiko.y...@ds-iq.com>
> *Cc:* user@drill.apache.org; d...@drill.apache.org;
> altekruseja...@gmail.com; Ki Kang <ki.k...@ds-iq.com>;
> Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> *Subject:* Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I have tried to reproduce this locally with the Apache 1.x release but have
> failed so far.
>
> From my mail exchange with Kevin on another thread, it appears that the
> HBase scanner stops returning rows after a while, which seems odd.
>
> Probably it is unique to the CDH distribution. I am planning to set up a
> single-node CDH cluster to see if I can reproduce it there.
>
> On Thu, Mar 17, 2016 at 2:56 PM, Kumiko Yada <kumiko.y...@ds-iq.com>
> wrote:
>
> Hello,
>
> I provided all information that was requested; however, I haven't heard
> back anything since February 24.
>
> Is anyone taking a look at this? Are there any workarounds?
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Aditya [mailto:adityakish...@gmail.com]
> Sent: Friday, February 19, 2016 12:48 PM
> To: user <user@drill.apache.org>
> Cc: altekruseja...@gmail.com; Ki Kang <ki.k...@ds-iq.com>;
> Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Hi Kumiko,
>
> I apologize for not chiming in until now, considering that if there is a
> bug here it is most probably put in by me :)
>
> I've assigned the JIRA to myself and am going to take a look.
>
> Would it be possible for you to either attach to the JIRA or send me
> privately the Drill query profiles from both the correct and the incorrect
> executions?
>
> Regards,
> aditya...
>
> On Fri, Feb 19, 2016 at 12:34 PM, Kumiko Yada <kumiko.y...@ds-iq.com>
> wrote:
>
> > Hello,
> >
> > Does anyone have any update on this issue,
> > https://issues.apache.org/jira/browse/DRILL-4271? Is there any plan
> > for this to be investigated/fixed?
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Kumiko Yada [mailto:kumiko.y...@ds-iq.com]
> > Sent: Thursday, January 14, 2016 3:44 PM
> > To: user@drill.apache.org; altekruseja...@gmail.com
> > Subject: RE: Drill query does not return all results from HBase
> >
> > The query time was very short on the one with the incorrect result.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekruseja...@gmail.com]
> > Sent: Thursday, January 14, 2016 1:25 PM
> > To: user <user@drill.apache.org>
> > Subject: Fwd: Drill query does not return all results from HBase
> >
> > Thanks for the update, I'm forwarding your message back to the list.
> >
> > Just to confirm, was the query time longer on the one with the
> > incorrect result?
> > In the incorrect case I think we are just misreading
> > the HBase metadata during our optimization to return row counts
> > without reading any data. This should be really fast, and noticeably
> > different than running a complete query, even with a small dataset, as
> > we have to read in your table and run an aggregation over it.
> >
> > This would just be a final confirmation of where the issue is
> > occurring. I will hopefully have time soon to get this fixed, but I'm
> > wrapping up some other things right now.
> >
> > ---------- Forwarded message ----------
> > From: Kumiko Yada <kumiko.y...@ds-iq.com>
> > Date: Thu, Jan 14, 2016 at 12:53 PM
> > Subject: RE: Drill query does not return all results from HBase
> > To: Jason Altekruse <altekruseja...@gmail.com>
> >
> > Jason,
> >
> > I’m sorry. My testing was incorrect last night. I’m not sure what I
> > did differently; however, your guess was correct. When I did the
> > one-column count, the row count was correct. Here are the additional
> > testing results.
> >
> > My company has invested in using Drill, and it’s very important
> > for us that this is fixed. Let me know if I can do anything to help get
> > this issue fixed. I really appreciate that you are looking into the issue!
> >
> > Hbase table (1 column family, 5 columns, 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> > *Hbase table (1 column family, 6 columns, 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 6724 rows)*
> >   1 column count - row count is correct
> >
> > *Hbase table (2 column families, 6 columns in each column family,
> > 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 3362 rows)*
> >   1 column count - row count is correct
> >
> > Hbase table (2 column families, 2 columns in each column family,
> > 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> > *Hbase table (2 column families, 4 columns in one column family and 2
> > columns in the other column family, 10000000 rows)*
> >   *COUNT(*) - row count is incorrect (returned 6723 rows)*
> >   1 column count - row count is correct
> >
> > Hbase table (2 column families, 1 column in one column family and 3
> > columns in the other column family, 10000000 rows)
> >   COUNT(*) - row count is correct
> >   1 column count - row count is correct
> >
> > Thanks
> > Kumiko
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:28 PM
> > *To:* 'Jason Altekruse' <altekruseja...@gmail.com>
> > *Cc:* Ki Kang <ki.k...@ds-iq.com>;
> > Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> > I also ran the query to display only 1 column with no limit to try to
> > force a full scan, but the result was the same, just 10000 rows
> > selected. With the same table (contains 6 columns), I ran the query
> > to display the row_key, and it displayed all records, 10,000,000 rows.
> >
> > -Kumiko
> >
> > *From:* Kumiko Yada
> > *Sent:* Wednesday, January 13, 2016 7:24 PM
> > *To:* 'Jason Altekruse' <altekruseja...@gmail.com>
> > *Cc:* Ki Kang <ki.k...@ds-iq.com>;
> > Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> > *Subject:* RE: Drill query does not return all results from HBase
> >
> > Jason
> >
> > I ran the query to display only 1 column for 100000 rows, and it only
> > returned 10000 rows.
> >
> > -Kumiko
> >
> > *From:* Jason Altekruse [mailto:altekruseja...@gmail.com
> > <altekruseja...@gmail.com>]
> > *Sent:* Wednesday, January 13, 2016 6:39 PM
> > *To:* Kumiko Yada <kumiko.y...@ds-iq.com>
> > *Cc:* Ki Kang <ki.k...@ds-iq.com>;
> > Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> > *Subject:* Re: Drill query does not return all results from HBase
> >
> > I know in a number of cases we have special optimizer rules that try
> > to skip reading the dataset altogether if we have metadata for the
> > number of rows and all that is requested is a count(*). I assume that
> > this is the case with HBase, and this may be where we aren't doing
> > something correctly.
> > Can you try to run a 'sum', or other aggregate query on one of the
> > columns to see if a full scan of the data is operating correctly?
> >
> > On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <kumiko.y...@ds-iq.com>
> > wrote:
> >
> > Thank you, Jason!
> >
> > Let me know if you need any help on this. I will be glad to help with
> > the repro and/or test the fix.
> >
> > Thanks
> > Kumiko
> >
> > -----Original Message-----
> > From: Jason Altekruse [mailto:altekruseja...@gmail.com]
> > Sent: Wednesday, January 13, 2016 6:24 PM
> > To: user <user@drill.apache.org>
> > Cc: Aditya Kishore <adityakish...@gmail.com>;
> > Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> > Subject: Re: Drill query does not return all results from HBase
> >
> > Thanks for filing the issue.
> > I haven't worked much with HBase, but
> > this is a critical wrong-results issue, so I will be taking a look at
> > this soon if no one else raises their hand.
> >
> > On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <kumiko.y...@ds-iq.com>
> > wrote:
> >
> > > I opened the bug on this. Drill is returning the correct rows
> > > when the HBase table contains 5 or fewer columns, but not 6 or more
> > > columns.
> > >
> > > https://issues.apache.org/jira/browse/DRILL-4271
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Kumiko Yada [mailto:kumiko.y...@ds-iq.com]
> > > Sent: Wednesday, January 13, 2016 4:52 PM
> > > To: user@drill.apache.org
> > > Cc: Aditya Kishore <adityakish...@gmail.com>;
> > > Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> > > Subject: RE: Drill query does not return all results from HBase
> > >
> > > We are using HBase 1.0.0 and CDH 5.4. I found out that the correct
> > > row count is returned when the HBase table contains only 1 column
> > > family, 1 column, but the incorrect row count is returned when the
> > > HBase table contains 1 column family, 6 columns.
> > >
> > > This looks like a Drill issue. Has anyone found any workaround?
> > >
> > > Thanks
> > > Kumiko
> > >
> > > -----Original Message-----
> > > From: Abhishek Girish [mailto:abhishek.gir...@gmail.com]
> > > Sent: Tuesday, January 12, 2016 6:51 PM
> > > To: user <user@drill.apache.org>
> > > Cc: Aditya Kishore <adityakish...@gmail.com>
> > > Subject: Re: Drill query does not return all results from HBase
> > >
> > > Well, the major version didn't change if I remember it right, hence
> > > I did not share the info in my previous mail. I'm on HBase 1.1.1 right
> > > now and don't see the issue. Also, I am on a MapR setup, which might
> > > not be comparable with their CDH setups.
> > >
> > > On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse
> > > <altekruseja...@gmail.com> wrote:
> > >
> > > > Abhishek,
> > > >
> > > > What version of HBase did you have the problem with, and what
> > > > version did you upgrade to that solved the problem? I assume this
> > > > would be useful information to compare your setup with Kevin's and
> > > > Kumiko's.
> > > >
> > > > - Jason
> > > >
> > > > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish
> > > > <abhishek.gir...@gmail.com> wrote:
> > > >
> > > > > I hit a very similar issue recently. Via HBase shell, I was able
> > > > > to fetch all records, whereas I was only able to see a small
> > > > > subset of records when queried from Drill. Each time I inserted
> > > > > 1000 records, only about 50 of those would show up.
> > > > >
> > > > > Although I could repro' the problem consistently, it was
> > > > > resolved once I updated my Hadoop setup. My guess is that it was
> > > > > an HBase bug which got resolved. Strange as it seems, it
> > > > > might not have to do with Drill itself.
> > > > >
> > > > > -Abhishek
> > > > >
> > > > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse
> > > > > <altekruseja...@gmail.com> wrote:
> > > > >
> > > > > > I'm not sure why this is happening; we have tests in our
> > > > > > automated suite that I believe run some pretty large queries
> > > > > > against HBase and verify the results.
> > > > > >
> > > > > > Aditya, do you have some time available to try to reproduce
> > > > > > this and diagnose the problem?
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada
> > > > > > <kumiko.y...@ds-iq.com> wrote:
> > > > > >
> > > > > > > I'm having the same issue. Is there any workaround for this?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Kumiko
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Kevin Verhoeven [mailto:kevin.verhoe...@ds-iq.com]
> > > > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > > > To: user@drill.apache.org
> > > > > > > Subject: Drill query does not return all results from HBase
> > > > > > >
> > > > > > > We have a problem where a Drill query against HBase does not
> > > > > > > return all results. The following query should return over
> > > > > > > 100,000 rows, but we only get about 1,030 back.
> > > > > > >
> > > > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800
> > > > > > >
> > > > > > > If we scan directly using the hbase shell we see over
> > > > > > > 100,000 rows, but the same Drill query returns only a
> > > > > > > fraction of the expected results. We have also run a count
> > > > > > > against the table and Drill returns the same 1,030 number,
> > > > > > > which is far less than expected. What could be going wrong?
> > > > > > >
> > > > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3
> > > > > > > (HBase 1.0). We run HBase on six RegionServers; the table
> > > > > > > has about 1.3 billion rows.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Kevin
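
Note for readers of this thread: the reply at the top attributes the missing rows to HBASE-13262 and names 1.0.1 and 1.1.0 as the fixed releases. A rough sketch of checking a version string against those two releases follows. The function names and the "1.0.0-cdh5.4.3" string format are assumptions made for illustration, not something taken from the thread or from any HBase API. Vendor builds may backport fixes, so treat the result as a hint about upstream releases only.

```python
import re


def parse_version(version):
    """Return the leading (major, minor, patch) of a version string.

    Anything after the numeric prefix (for example a vendor suffix) is
    ignored; a missing patch component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def predates_fix(version):
    """Compare a version with the fix releases named in the thread.

    Returns True for 1.0.x releases before 1.0.1, False for 1.0.1 and
    for 1.1.0 or later, and None for pre-1.0 releases, which the thread
    does not cover.
    """
    parsed = parse_version(version)
    if parsed < (1, 0, 0):
        return None  # not discussed in the thread
    if parsed[:2] == (1, 0):
        return parsed < (1, 0, 1)
    return False  # 1.1.0 and later


if __name__ == "__main__":
    # Hypothetical version strings, shown only to illustrate the check.
    for v in ("1.0.0-cdh5.4.3", "1.0.1", "1.1.1"):
        print(v, predates_fix(v))
```

With the versions reported in this thread, an upstream 1.0.0 would be flagged as predating the fix, while the 1.1.1 MapR setup would not.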