That error sounds like a bug in Phoenix.

Maybe you could try with a newer version of Phoenix? Asking over on user@phoenix might net a better result.

On 9/27/21 11:47 PM, Simon Mottram wrote:
Forgot to mention this is only an issue for LAST_VALUE (so far!)

This works fine

  SELECT
"BIOMATERIAL_NAME",
AVG("PLANT_FRUIT_COUNT")
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
  OFFSET 0;
________________________________
From: Simon Mottram <simon.mott...@cucumber.co.nz>
Sent: 28 September 2021 4:34 PM
To: user@hbase.apache.org <user@hbase.apache.org>
Subject: Major problem for us with Phoenix joins with certain aggregations

Hi

Got my fingers crossed that there's a work around for this as this really is a 
big problem for us

We are using:

Amazon EMR

Release label:emr-6.1.0
Hadoop distribution:Amazon
Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0

Thin Client version:
phoenix-5.0.0-HBase-2.0-thin-client.jar

We get the following error when doing an aggregation where

   1.  A JOIN is empty
   2.  The column is INTEGER or DATETIME

Remote driver error: IllegalArgumentException: offset (25) + length (4) exceed 
the capacity of the array: 25

The query that breaks is:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
ON DOCID = OBSERVATION_VALUE_ID
   AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
  OFFSET 0;

I can refactor this using EXIST but get same error, presumably the driver knows 
to treat them the same:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
  OFFSET 0;

If we remove the external reference we get no error, regardless of whether 
there are any hits or not

-- these all work
There are no hits for this query

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
BIOMATERIAL_TYPE = 'aardvark'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
  OFFSET 0;

Lots of hits for this query:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
  OFFSET 0;





Reply via email to