Phoenix Version: 4.14.2-HBase-1.4 HBase Version: AWS EMR Release label:emr-5.24.1 Hadoop distribution:Amazon Applications:Phoenix 4.14.1, Hue 4.4.0, HBase 1.4.9
Having an issue where ANY(ARRAY[]) stops the query returning any results when used in a 'AND' conjunction e.g (fielda = 'a') AND (fieldb = any(array['a','b','c])) Always returns zero results, if i change to disjunction (OR) it works fine but obviously isn't what's wanted here. Excuse the long post but I wanted to be as clear as possible. It's quite possible I have misunderstood the way ANY() works but... Here's a simple query that returns a correct number of results: SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME FROM DEV_OAPI.OBSERVATION WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME = 'TRIAL00015') ORDER BY OBSERVATIONDATE DESC LIMIT 10 OFFSET 0 So there's definitely records where biomaterial name and trialname have these values If I change it to SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME FROM DEV_OAPI.OBSERVATION WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE'])) ORDER BY OBSERVATIONDATE DESC LIMIT 10 OFFSET 0 I get valid results So the ANY(ARRAY[]) function works Here's the explain which looks very odd to more, but it works PLAN CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN OVER DEV_OAPI.OBSERVATION SERVER FILTER BY org.apache.phoenix.expression.function.ArrayAnyComparisonExpression [children=[ARRAY['TRIAL00015'], TRIALNAME = org.apache.phoenix.expression.function.ArrayElemRefExpression [children=[ARRAY['TRIAL00015'], 1]]]] SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE DESC] C CLIENT MERGE SORT CLIENT LIMIT 10 So far so good, BUT. However if I combine the ARRAY expression with any expression using AND I get zero results, even tho as above both sides of the conjunction return true. e.g. SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME FROM DEV_OAPI.OBSERVATION WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE'])) ORDER BY OBSERVATIONDATE DESC LIMIT 10 OFFSET 0 Explain (newlines added): SERVER FILTER BY (BIOMATERIALNAME = 'SCION00424' AND org.apache.phoenix.expression.function.ArrayAnyComparisonExpression [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], TRIALNAME = org.apache.phoenix.expression.function.ArrayElemRefExpression [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]]) Just out of interest I tried with strings only in the array check SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME FROM DEV_OAPI.OBSERVATION WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' = ANY(ARRAY['TRIAL00015'])) ORDER BY OBSERVATIONDATE DESC LIMIT 10 OFFSET 0 This works fine (in a kind of unhelpful way) I have tested using the thick client: <!-- HBASE THICK CLIENT DEPS --> <dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-core</artifactId> <version>4.14.2-HBase-1.4</version> </dependency> and the thin client <dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-queryserver-client</artifactId> <version>4.14.2-HBase-1.4</version> </dependency> I've tried using braces and re-ordering but any query of the form: <anyclause> AND field = ANY(ARRAY['value1'...]) Returns zero results regardless of values We can't change version as we are using the Amazon AWS EMR managed stack and no other phoenix libraries work. Thanks for taking the time to read this far! Cheers Simon