Mary, You can just call Statement.setMaxRows(10) to emulate what SQuirrel is doing. Thanks, James
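James's `Statement.setMaxRows(n)` suggestion simply caps how many rows a statement hands back, which is the same cap SQuirreL applies behind the scenes. For anyone following along without a JDBC client, here is a rough analogue using Python's DB-API (sqlite3 standing in for Phoenix; the table name is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(26)])

def query_with_max_rows(sql, max_rows):
    # Rough analogue of JDBC's Statement.setMaxRows(max_rows): the query
    # runs as normal, but everything past the first max_rows rows of the
    # result set is silently dropped.
    cur.execute(sql)
    return cur.fetchmany(max_rows)

print(len(query_with_max_rows("SELECT n FROM t", 10)))  # 10 of the 26 rows
```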
On Tue, Feb 24, 2015 at 9:09 AM, Matthew Johnson <[email protected]> wrote: > Hi Maryann, > > > > I have two environments, one with just some dummy data, and one with real > data in it. The dummy data one gave me results when I queried and the real > data didn’t, but turns out that is just because the dummy data was much > more coherent (because it was manually created for a specific test) so the > RHS table always matches the LHS and therefore the join gives me results. > > > > I have attached a script that demonstrates my problem (create 2 Phoenix > tables, insert some rows, and run a query using a join). When I run this on > my cluster, I consistently see the issue I am having. If I set the LIMIT in > Squirrel to 10, I get no results, but if I set it to 26, I get a result > (since I have 26 rows in the RHS table and I am intentionally querying for > the last one). > > > > Please give it a go and see if it reproduces for you – are you using > Squirrel? If so, what version? > > > > Thanks! > > Matt > > > > > > *From:* Maryann Xue [mailto:[email protected]] > *Sent:* 24 February 2015 16:41 > > *To:* [email protected] > *Subject:* Re: Inner Join not returning any results in Phoenix > > > > Hi Matt, > > > > I just noticed these lines in your very first message: > > > > *PS* Something that may or may not be of note: In the environments I am > using: > > WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* > > FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar* > > > > What do you mean by WORKING and FAILING? > > > > I still cannot reproduce the bug here. Could you please post DDLs you used > for related tables? > > > > > > Thanks, > > Maryann > > > > > > > > On Tue, Feb 24, 2015 at 11:27 AM, James Taylor <[email protected]> > wrote: > > FYI, SQuirrel sets the max rows to return as 100. You can change this in > the tool, though. > > > > On Tuesday, February 24, 2015, Maryann Xue <[email protected]> wrote: > > Thanks a lot, Matt, for the reply! 
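Matt's repro can be imitated outside Phoenix. In the sketch below (Python's sqlite3; table and column names are invented stand-ins for the attached script's tables), the row limit is pushed into the right-hand table before the join runs, which is what the thread suspects Phoenix is doing: a limit of 10 then returns nothing, while a limit of 26 finds the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE lhs (myId TEXT)")
cur.execute("CREATE TABLE rhs (myId TEXT)")
# 26 rows in the RHS; the single LHS row deliberately matches the *last* one.
cur.executemany("INSERT INTO rhs VALUES (?)", [("id%02d" % i,) for i in range(26)])
cur.execute("INSERT INTO lhs VALUES ('id25')")

def join_count(rhs_limit):
    # Simulate a row limit applied to the RHS *before* the join, rather
    # than to the final joined result.
    cur.execute(
        "SELECT count(*) FROM lhs "
        "INNER JOIN (SELECT myId FROM rhs LIMIT ?) r ON lhs.myId = r.myId",
        (rhs_limit,),
    )
    return cur.fetchone()[0]

print(join_count(10))  # 0 -- the matching RHS row was cut off by the limit
print(join_count(26))  # 1 -- the whole RHS is visible, so the join matches
```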
Very helpful. "*SERVER FILTER BY > PageFilter 100*" does look like a bug here. I will try again to reproduce > it. > > > > > > Thanks, > > Maryann > > > > On Tue, Feb 24, 2015 at 6:07 AM, Matthew Johnson <[email protected]> > wrote: > > Hi Maryann, > > > > Thanks for that - I will schedule an update to the latest version of > Phoenix then for later this week (and try out the merge-join hints). > > > > In the meantime, here are my explain plans: > > > > *JOIN WITH NO SQUIRREL LIMIT* > > > > *PLAN* > > *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* > > * PARALLEL INNER-JOIN TABLE 0* > > * CLIENT 3-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2* > > > > *JOIN WITH SQUIRREL LIMIT 100* > > > > *PLAN* > > *CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1* > > *CLIENT 100 ROW LIMIT* > > * PARALLEL INNER-JOIN TABLE 0* > > * CLIENT 3-CHUNK SERIAL 1-WAY FULL SCAN OVER mytable2* > > * SERVER FILTER BY PageFilter 100* > > * SERVER 100 ROW LIMIT* > > * CLIENT 100 ROW LIMIT* > > > > > > I’m not really sure how to read that, but it does seem to suggest that > ‘mytable2’ is being limited to 100 – thoughts? > > > > Cheers, > > Matt > > > > *From:* Maryann Xue [mailto:[email protected] <[email protected]>] > > *Sent:* 23 February 2015 18:10 > > > *To:* [email protected] > *Subject:* Re: Inner Join not returning any results in Phoenix > > > > Hi Matt, > > > > Yes, the upgrade is as easy as that. I believe things will work fine with > existing tables. > > I tried with a similar query but didn't see that it was a Phoenix bug. So > could you please try the following explain statement and see the execution > plan: > > > > EXPLAIN *SELECT * FROM "mytable1" hc* > > *INNER JOIN “mytable2” bs* > > *On hc."myId" = bs.”myId”* > > > > > > Thanks, > > Maryann > > > > > > On Fri, Feb 20, 2015 at 1:09 PM, Matthew Johnson <[email protected]> > wrote: > > Hi Maryann, > > > > That’s a good point – I am using 4.2.2, so if that feature is 4.3+ then > that would explain why it’s not working.
Is upgrading versions of Phoenix > as simple as removing the previous jar from HBase lib folder and dropping > the new Phoenix jar in (and restarting HBase)? Will all the existing > Phoenix tables and views be backwards-compatible and work with the new > version? > > > > Cheers, > > Matt > > > > > > *From:* Maryann Xue [mailto:[email protected] <[email protected]>] > > *Sent:* 20 February 2015 17:46 > > > *To:* [email protected] > *Subject:* Re: Inner Join not returning any results in Phoenix > > > > Which version of Phoenix are you using, Matt? This feature is only > available in the latest releases of 4.3/3.3. > > > > On Fri, Feb 20, 2015 at 12:11 PM, Matthew Johnson <[email protected]> > wrote: > > Hi Maryann, > > > > Unfortunately my two tables are roughly the same size (~500k), but I have > tested a different join where one table is ~500k and the other is ~20k and > putting the larger one first is definitely far more performant. I believe > you are right about running out of memory, I can see this repeated a few > times in the region server logs followed by what appears to be a restart or > disconnect: > > > > *[JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host > machine (eg GC): pause of approximately 1083ms* > > > > I have been looking at the Phoenix page on joins ( > http://phoenix.apache.org/joins.html) and it mentions using Sort-Merge > joins for large tables by using a hint. I have tried this though with no > success: > > > > *SELECT /*+ USE_SORT_MERGE_JOIN*/ count(*) FROM “mytable1” hc* > > *INNER JOIN “mytable2” bs* > > *On hc.”myId” = bs.”myId”* > > > > Am I putting the hint in the wrong place? Does it need to go next to the > JOIN rather than the SELECT? > > > > I will try increasing the memory available to the Region Servers as well > to see if that helps. > > > > Thanks! 
> > Matt > > > > > > *From:* Maryann Xue [mailto:[email protected] <[email protected]>] > > *Sent:* 20 February 2015 16:28 > *To:* [email protected] > > > *Subject:* Re: Inner Join not returning any results in Phoenix > > > > Hi Matt, > > > > The error you got with "Limit Rows" off might be related to insufficient > memory on region servers for one of your tables. Which is the larger table > between table1 and table2? You might want to try putting the larger table > as the first table in your join query and see if it works. > > > > And I will quickly check if the LIMIT problem is a Phoenix bug and will > keep you posted. > > > > > > Thanks, > > Maryann > > > > > > On Fri, Feb 20, 2015 at 11:14 AM, Matthew Johnson <[email protected]> > wrote: > > Hi Abe, > > > > Glad to hear I’m not alone! Will try and figure out exactly what’s > happening and maybe raise a Jira :-) > > > > > > @Constantin – I have tried with and without the “Limit Rows” – but without > it, and without any indexes, the query runs for a while (about 10 minutes?) > and then throws an error: > > > > *Error: Encountered exception in sub plan [0] execution.* > > > > Which I’m guessing is either HBase or Zookeeper timeout. The weird thing > is that in standard SQL databases (eg Oracle, MySQL etc) then the “Limit > Rows” does not affect any aggregate functions like ‘count’, because the > actual number of result rows for a count is just 1 row (the count itself). > But in HBase it seems that the Row Limit, as Abe mentioned, is applied to > one of the table BEFORE it does the join, so it affects the results of the > ‘count’ function. 
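Maryann's advice above (put the larger table first) and the USE_SORT_MERGE_JOIN hint tried earlier in the thread both come down to how the join uses memory. Here is a rough sketch in Python with invented data, assuming (as the thread's symptoms suggest) that the default hash join materialises the right-hand table in memory, while a sort-merge join streams both sorted inputs:

```python
def hash_join(probe, build, key):
    # Build an in-memory hash table from the build (right-hand) side, then
    # stream the probe side past it. Memory grows with the build side, so
    # the smaller table belongs there.
    index = {}
    for row in build:
        index.setdefault(row[key], []).append(row)
    return [(p, b) for p in probe for b in index.get(p[key], [])]

def sort_merge_join(left, right, key):
    # Sort both inputs on the join key and advance two cursors in step.
    # Neither side is held in a hash table, so this tolerates two large
    # tables at the cost of the sorts.
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j0 = j  # emit every right row sharing this key value
            while j < len(right) and right[j][key] == lk:
                out.append((left[i], right[j]))
                j += 1
            i += 1
            if i < len(left) and left[i][key] == lk:
                j = j0  # next left row has the same key: replay matches
    return out

big = [{"myId": i} for i in range(1000)]
small = [{"myId": i} for i in range(0, 1000, 100)]  # 10 rows
# Only the 10-row table is held in memory by the hash join; both joins agree.
assert len(hash_join(big, small, "myId")) == 10
assert len(sort_merge_join(big, small, "myId")) == 10
```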
> > > > When I try to create my indexes so I am able to do the join without Row > Limit, I get the following error: > > > > ERROR 1029 (42Y88): Mutable secondary indexes must have the > hbase.regionserver.wal.codec property set to > org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec in the > hbase-sites.xml of every region server > > > > Which I am happy to do (will have to wait until outside of business hours > though), but I am curious, will this have any impact on the rest of my > cluster and could it have any unforeseen consequences? > > > > Thanks again for the input! > > > > Cheers, > > Matt > > > > > > *From:* Ciureanu, Constantin (GfK) [mailto:[email protected] > <[email protected]>] > *Sent:* 20 February 2015 15:48 > > > *To:* [email protected] > *Subject:* RE: Inner Join not returning any results in Phoenix > > > > Hello Matt, > > > > http://codingclues.eu/2008/the-squirrel-100-rows-problem/ > > > > Can you please test again after unchecking “Contents- Limit rows” and “SQL > – Limit rows”? > > > > [image: SQL tab] > > > > P.S. Off-topic – it’s as funny as this “problem” (not possible to send an > email for more than 500 miles away J > http://www.ibiblio.org/harris/500milemail.html ) > > > > Regards, > > Constantin > > > > *From:* Abe Weinograd [mailto:[email protected] <[email protected]>] > *Sent:* Friday, February 20, 2015 4:18 PM > *To:* user > *Subject:* Re: Inner Join not returning any results in Phoenix > > > > Matt, > > > > I have seen this same issue. When passing a LIMIT to a query with joins > (most query tools do it implicitly), Phoenix seems to apply that to the > table on the right of the join I believe. I hadn't had a chance to play > with it more and file a JIRA, but what you are describing is consistent > with what I have seen. 
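For reference, the property named in that error goes into hbase-site.xml on each region server; the name and value below are taken verbatim from the error message, and a region server restart is needed for it to take effect:

```xml
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
```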
> > > > Abe > > > > On Fri, Feb 20, 2015 at 10:04 AM, Matthew Johnson <[email protected]> > wrote: > > Hi Constantin, > > > > Many thanks for your reply – the quotes were both of the same type (double > quotes for table and column names, single quotes for string literals), it > is just my email client that formatted them weirdly, sorry! > > > > I have discovered what I believe is an important piece of the puzzle to my > problem. I am using Squirrel SQL as my JDBC client for Phoenix, and it has > a “Limit Rows” feature. When I try and count the number of rows in a single > table: > > > > *select count(*) from “mytable1”* > > > > I get the expected number of results (eg 20,000). But when I join two > tables together, it seems that the “Limit Rows” from Squirrel is somehow > being applied before the join is performed, and if “Limit Rows” is set to > 100 I get 100 results or less. If the inner join is quite sparse (eg 20,000 > rows in a table but only 100 of these will join with a second table) then I > believe it tries to join the first 100 it finds and returns no results. In > my experience of Oracle or MySQL, joins are done entirely on server side > and then you just get back the number of rows you limited, rather than what > appears to be happening which is the row limit is applied to the first > table before the join is attempted with the second table. Is that how > Phoenix works? > > > > I have also discovered that I get different results (with “Limit Rows” > turned on) depending on which order I join the tables: > > > > *SELECT count(*) FROM “mytable1” hc* > > *INNER JOIN “mytable2” bs* > > *On hc.”myId” = bs.”myId”* > > > > Gives me a very different number of results than: > > > > *SELECT count(*) FROM “mytable2” bs* > > *INNER JOIN “mytable1” hc* > > *On hc.”myId” = bs.”myId”* > > > > > > Unfortunately I cannot test whether I get the same number of results with > “Limit Rows” turned off because my query times out! 
So I am now looking at > creating secondary indexes on the “myId” column in both tables to see if I > am able to do this join quicker. Does a join like this use a lot of memory > on server side? Is something likely to be running out of resources? > > > > Many thanks again for your time. > > > > Cheers, > > Matt > > > > > > *From:* Ciureanu, Constantin (GfK) [mailto:[email protected] > <[email protected]>] > *Sent:* 20 February 2015 14:40 > *To:* [email protected] > *Subject:* RE: Inner Join not returning any results in Phoenix > > > > Hi Matthew, > > > > Is it working without the quotes “ / *"* ? (I see you are using 2 > types of quotes, weird) > > I guess that’s not needed, and probably causing troubles. I don’t have > to use quotes anyway. > > > > Alternatively check the types of data in those 2 tables (if the field > types are not the same in both tables, the join will not work). > > > > Good luck, > > Constantin > > > > *From:* Matthew Johnson [mailto:[email protected] > <[email protected]>] > *Sent:* Friday, February 20, 2015 12:54 PM > *To:* [email protected] > *Subject:* Inner Join not returning any results in Phoenix > > > > Hi guys, > > > > I’m a little bit stuck with doing an Inner Join with Phoenix. I set up one > environment, created tables in HBase, and then created views (rather than > tables) in Phoenix, and am able to query as expected (when I join my two > tables I see results). I’ve just promoted to another environment, with the > exact same setup, but my Inner Join returns no results! > > > > I run the following two individual queries: > > > > *SELECT * FROM "mytable1" hc* > > *where hc."myId" = 'XS0'* > > > > *SELECT * FROM "mytable2" bs* > > *where bs."myId" = 'XS0'* > > > > And both of these queries give results. But when I run: > > > > *SELECT * FROM "mytable1" hc* > > *INNER JOIN “mytable2” bs* > > *On hc."myId" = bs.”myId”* > > > > I get no results. 
I also get no results if I try: > > > > *SELECT * FROM "mytable1" hc* > > *where hc."myId" in (select distinct “myId” from “mytable2”)* > > > > I have checked in HBase shell and can see the “myId” value is as expected > (XS0 in both tables). I am not sure if there are any logs that I can look > at to get some insight? > > > > Many thanks in advance for any suggestions! > > > > Cheers, > > Matt > > > > *PS* Something that may or may not be of note: In the environments I am > using: > > WORKING: *hbase-0.98.8-hadoop2 / phoenix-4.2.2-server.jar* > > FAILING: *hbase-0.98.9-hadoop2* / *phoenix-4.2.2-server.jar*
