Hello,

Does anyone have an update on this issue,
https://issues.apache.org/jira/browse/DRILL-4271? Are there any plans to
investigate or fix it?

Thanks
Kumiko

-----Original Message-----
From: Kumiko Yada [mailto:kumiko.y...@ds-iq.com] 
Sent: Thursday, January 14, 2016 3:44 PM
To: user@drill.apache.org; altekruseja...@gmail.com
Subject: RE: Drill query does not return all results from HBase

The query time was very short for the query that returned the incorrect result.

Thanks
Kumiko

-----Original Message-----
From: Jason Altekruse [mailto:altekruseja...@gmail.com]
Sent: Thursday, January 14, 2016 1:25 PM
To: user <user@drill.apache.org>
Subject: Fwd: Drill query does not return all results from HBase

Thanks for the update, I'm forwarding your message back to the list.

Just to confirm, was the query time longer on the one with the incorrect
result? In the incorrect case, I think we are just misreading the HBase metadata
during the optimization that returns row counts without reading any data. That
path should be really fast and noticeably different from running a complete
query, which, even with a small dataset, has to read in your table and run an
aggregation over it.
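One way to confirm where the difference comes from, offered as a sketch rather than a verified procedure: Drill's `EXPLAIN PLAN FOR` returns the query plan without executing the query, so the plan for `COUNT(*)` can be compared with the plan for a count over a single column. The table name comes from Kevin's original report; `cf1` and `col1` are hypothetical placeholders for a real column family and column.

```sql
-- Hypothetical names: `cf1` and `col1` stand in for a real column family and column.

-- Plan for the COUNT(*) form, which returned too few rows.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM `hbase`.`customer_staged`;

-- Plan for a single-column count, which returned the correct number of rows.
EXPLAIN PLAN FOR
SELECT COUNT(t.`cf1`.`col1`) FROM `hbase`.`customer_staged` t;
```

Any difference in how the HBase scan appears in the two plans (for example, which columns it projects) would be a starting point; exactly what Drill 1.2 prints for an HBase scan is not assumed here. `COUNT` on a column counts only rows where that column is present, so the column chosen should exist in every row.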

This would just be a final confirmation of where the issue is occurring. I will
hopefully have time soon to get this fixed, but I'm wrapping up some other
things right now.


---------- Forwarded message ----------
From: Kumiko Yada <kumiko.y...@ds-iq.com>
Date: Thu, Jan 14, 2016 at 12:53 PM
Subject: RE: Drill query does not return all results from HBase
To: Jason Altekruse <altekruseja...@gmail.com>


Jason,



I’m sorry. My testing last night was incorrect. I’m not sure what I did
differently; however, your guess was correct. When I did the one-column count,
the row count was correct. Here are the additional test results.



My company has invested in using Drill, and it’s very important for us that
this is fixed. Let me know if I can do anything to help get this issue fixed. I
really appreciate that you are looking into this issue!

HBase tables tested (10,000,000 rows each):

Column families | Columns per family | COUNT(*)                        | 1-column count
1               | 5                  | correct                         | correct
1               | 6                  | INCORRECT (returned 6,724 rows) | correct
2               | 6 + 6              | INCORRECT (returned 3,362 rows) | correct
2               | 2 + 2              | correct                         | correct
2               | 4 + 2              | INCORRECT (returned 6,723 rows) | correct
2               | 1 + 3              | correct                         | correct



Thanks

Kumiko



From: Kumiko Yada
Sent: Wednesday, January 13, 2016 7:28 PM
To: 'Jason Altekruse' <altekruseja...@gmail.com>
Cc: Ki Kang <ki.k...@ds-iq.com>; Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
Subject: RE: Drill query does not return all results from HBase



I also ran the query selecting only 1 column with no limit, to try to force a
full scan, but the result was the same: just 10,000 rows selected. With the same
table (which contains 6 columns), I ran the query selecting the row_key, and it
displayed all 10,000,000 records.



-Kumiko



From: Kumiko Yada
Sent: Wednesday, January 13, 2016 7:24 PM
To: 'Jason Altekruse' <altekruseja...@gmail.com>
Cc: Ki Kang <ki.k...@ds-iq.com>; Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
Subject: RE: Drill query does not return all results from HBase



Jason



I ran the query selecting only 1 column for 100,000 rows, and it returned only
10,000 rows.



-Kumiko



From: Jason Altekruse [mailto:altekruseja...@gmail.com]
Sent: Wednesday, January 13, 2016 6:39 PM
To: Kumiko Yada <kumiko.y...@ds-iq.com>
Cc: Ki Kang <ki.k...@ds-iq.com>; Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase



I know that in a number of cases we have special optimizer rules that try to
skip reading the dataset altogether if we have metadata for the number of rows
and all that is requested is a count(*). I assume this is the case with HBase,
and this may be where we aren't doing something correctly.
Can you try running a 'sum' or other aggregate query on one of the columns to
see whether a full scan of the data is operating correctly?
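A sketch of the aggregate check suggested above, under stated assumptions: HBase cells reach Drill as binary values, so this example decodes one column with `CONVERT_FROM(..., 'UTF8')` and casts it to an integer before summing, which assumes the values were written as UTF-8 text digits. The table name comes from Kevin's report; `cf1` and `col1` are hypothetical placeholders for a real column family and column.

```sql
-- Hypothetical names: `cf1`/`col1` stand in for a real column family and column,
-- assumed to be present in every row and to hold integers stored as UTF-8 text.
SELECT COUNT(t.`cf1`.`col1`)                                  AS rows_with_col1,
       SUM(CAST(CONVERT_FROM(t.`cf1`.`col1`, 'UTF8') AS INT)) AS sum_col1
FROM `hbase`.`customer_staged` t;
```

Because these aggregates depend on the cell values, Drill has to read the table instead of answering from row-count metadata. If `rows_with_col1` matches the count from the HBase shell while `COUNT(*)` does not, the full scan is working and the problem is likely confined to the `COUNT(*)` path.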



On Wed, Jan 13, 2016 at 6:27 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:

Thank you, Jason!

Let me know if you need any help with this. I will be glad to help reproduce the
issue and/or test the fix.

Thanks
Kumiko

-----Original Message-----
From: Jason Altekruse [mailto:altekruseja...@gmail.com]
Sent: Wednesday, January 13, 2016 6:24 PM
To: user <user@drill.apache.org>
Cc: Aditya Kishore <adityakish...@gmail.com>; Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
Subject: Re: Drill query does not return all results from HBase

Thanks for filing the issue. I haven't worked much with HBase, but this is a
critical wrong-results issue, so I will be taking a look at it soon if no one
else raises their hand.

On Wed, Jan 13, 2016 at 6:20 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:

> I opened a bug for this. Drill returns the correct rows when the HBase
> table contains 5 or fewer columns, but not when it contains 6 or more.
>
> https://issues.apache.org/jira/browse/DRILL-4271
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Kumiko Yada [mailto:kumiko.y...@ds-iq.com]
> Sent: Wednesday, January 13, 2016 4:52 PM
> To: user@drill.apache.org
> Cc: Aditya Kishore <adityakish...@gmail.com>; Kevin Verhoeven <kevin.verhoe...@ds-iq.com>
> Subject: RE: Drill query does not return all results from HBase
>
> We are using HBase 1.0.0 and CDH 5.4. I found that the correct row count
> is returned when the HBase table contains only 1 column family and 1
> column, but an incorrect row count is returned when the HBase table
> contains 1 column family and 6 columns.
>
> This looks like a Drill issue. Has anyone found a workaround?
>
> Thanks
> Kumiko
>
> -----Original Message-----
> From: Abhishek Girish [mailto:abhishek.gir...@gmail.com]
> Sent: Tuesday, January 12, 2016 6:51 PM
> To: user <user@drill.apache.org>
> Cc: Aditya Kishore <adityakish...@gmail.com>
> Subject: Re: Drill query does not return all results from HBase
>
> Well, the major version didn't change, if I remember right, which is why I
> didn't share that information in my previous mail. I'm on HBase 1.1.1 right
> now and don't see the issue. Also, I am on a MapR setup, which might not be
> comparable with their CDH setups.
>
> On Tue, Jan 12, 2016 at 5:50 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
>
> > Abhishek,
> >
> > What version of HBase did you have the problem with, and what version
> > did you upgrade to that solved the problem? I assume this would be
> > useful information for comparing your setup with Kevin's and Kumiko's.
> >
> > - Jason
> >
> > On Tue, Jan 12, 2016 at 10:41 AM, Abhishek Girish <abhishek.gir...@gmail.com> wrote:
> >
> > > I hit a very similar issue recently. Via the HBase shell, I was able to
> > > fetch all records, whereas I could only see a small subset of records
> > > when querying from Drill. Each time I inserted 1000 records, only about
> > > 50 of them would show up.
> > >
> > > Although I could reproduce the problem consistently, it was resolved
> > > once I updated my Hadoop setup. My guess is that it was an HBase bug
> > > that has since been fixed. Strange as it seems, it might not have
> > > anything to do with Drill itself.
> > >
> > > -Abhishek
> > >
> > > On Tue, Jan 12, 2016 at 7:52 AM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > >
> > > > I'm not sure why this is happening; we have tests in our automated
> > > > suite that I believe run some pretty large queries against HBase and
> > > > verify the results.
> > > >
> > > > Aditya, do you have some time available to try to reproduce this
> > > > and diagnose the problem?
> > > >
> > > > On Wed, Jan 6, 2016 at 2:03 PM, Kumiko Yada <kumiko.y...@ds-iq.com> wrote:
> > > >
> > > > > I'm having the same issue.  Is there any workaround for this?
> > > > >
> > > > > Thanks
> > > > > Kumiko
> > > > >
> > > > > -----Original Message-----
> > > > > From: Kevin Verhoeven [mailto:kevin.verhoe...@ds-iq.com]
> > > > > Sent: Monday, December 21, 2015 10:37 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Drill query does not return all results from HBase
> > > > >
> > > > > We have a problem where a Drill query against HBase does not return
> > > > > all results. The following query should return over 100,000 rows,
> > > > > but we only get about 1,030 back.
> > > > >
> > > > > SELECT row_key FROM `hbase`.`customer_staged` WHERE customer_number = 800
> > > > >
> > > > > If we scan directly using the HBase shell, we see over 100,000 rows,
> > > > > but the same Drill query returns only a fraction of the expected
> > > > > results. We have also run a count against the table, and Drill returns
> > > > > the same 1,030, which is far fewer than expected. What could be going
> > > > > wrong?
> > > > >
> > > > > We are running Drill 1.2 on Ubuntu 14.04 against CDH 5.4.3 (HBase 1.0).
> > > > > We run HBase on six RegionServers, and the table has about 1.3 billion
> > > > > rows.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Kevin
> > > > >
> > > > >
> > > >
> > >
> >
>
