Update:
Just in case anyone hits this issue in future with the AWS managed
HBase, the fix is to use a very specific version of the driver
For thick client:
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.1-HBase-1.4</version>
</dependency>
For thin client:
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-queryserver-client</artifactId>
<version>4.14.1-HBase-1.4</version>
</dependency>
This required support help from AWS as this driver version is not
mentioned on the official Apache Phoenix download page
Regards
Simon
On Sun, 2019-11-17 at 21:03 +0000, Simon Mottram wrote:
> Phoenix Version: 4.14.2-HBase-1.4
> HBase Version: AWS EMR
> Release
> label:emr-5.24.1
> Hadoop distribution:Amazon
> Applications:Phoenix 4.14.1,
> Hue 4.4.0, HBase 1.4.9
>
>
> Having an issue where ANY(ARRAY[]) stops the query returning any
> results when used in a 'AND' conjunction
>
> e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))
>
> Always returns zero results, if i change to disjunction (OR) it works
> fine but obviously isn't what's wanted here. Excuse the long post
> but
> I wanted to be as clear as possible.
>
> It's quite possible I have misunderstood the way ANY() works but...
>
> Here's a simple query that returns a correct number of results:
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
> FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> 'TRIAL00015')
> ORDER BY OBSERVATIONDATE DESC
> LIMIT 10
> OFFSET 0
>
> So there's definitely records where biomaterial name and trialname
> have
> these values
>
> If I change it to
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
> FROM DEV_OAPI.OBSERVATION
> WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
> ORDER BY OBSERVATIONDATE DESC
> LIMIT 10
> OFFSET 0
>
> I get valid results
>
> So the ANY(ARRAY[]) function works
>
> Here's the explain which looks very odd to more, but it works
>
> PLAN
>
>
>
> CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN
> OVER
> DEV_OAPI.OBSERVATION
>
>
>
> SERVER FILTER BY
> org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015'], TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015'], 1]]]]
> SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
> DESC]
> C
> CLIENT MERGE
> SORT
>
> CLIENT LIMIT 10
>
> So far so good, BUT.
>
> However if I combine the ARRAY expression with any expression using
> AND
> I get zero results, even tho as above both sides of the conjunction
> return true.
>
> e.g.
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
> FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
> ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
> ORDER BY OBSERVATIONDATE DESC
> LIMIT 10
> OFFSET 0
>
> Explain (newlines added):
> SERVER FILTER BY
> (BIOMATERIALNAME = 'SCION00424'
> AND
> org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'],
> TRIALNAME =
> org.apache.phoenix.expression.function.ArrayElemRefExpression
> [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])
>
> Just out of interest I tried with strings only in the array check
>
> SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
> FROM DEV_OAPI.OBSERVATION
> WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
> ANY(ARRAY['TRIAL00015']))
> ORDER BY OBSERVATIONDATE DESC
> LIMIT 10
> OFFSET 0
>
> This works fine (in a kind of unhelpful way)
>
> I have tested using the thick client:
> <!-- HBASE THICK CLIENT DEPS -->
> <dependency>
> <groupId>org.apache.phoenix</groupId>
> <artifactId>phoenix-core</artifactId>
> <version>4.14.2-HBase-1.4</version>
> </dependency>
>
> and the thin client
>
> <dependency>
> <groupId>org.apache.phoenix</groupId>
> <artifactId>phoenix-queryserver-client</artifactId>
> <version>4.14.2-HBase-1.4</version>
> </dependency>
>
> I've tried using braces and re-ordering but any query of the form:
> <anyclause> AND field = ANY(ARRAY['value1'...])
> Returns zero results regardless of values
>
> We can't change version as we are using the Amazon AWS EMR managed
> stack and no other phoenix libraries work.
>
> Thanks for taking the time to read this far!
>
> Cheers
>
> Simon