No worries. Thanks for confirming!

On 10/10/21 1:43 PM, Simon Mottram wrote:
Hi

Thanks for the reply, I posted here by mistake and wasn't sure how to delete.  
It's indeed a problem with phoenix

Sorry to waste your time

Cheers

S




________________________________
From: Josh Elser <els...@apache.org>
Sent: Saturday, 9 October 2021 3:25 am
To: user@hbase.apache.org
Subject: Re: Major problem for us with Phoenix joins with certain aggregations

That error sounds like a bug in Phoenix.

Maybe you could try with a newer version of Phoenix? Asking over on
user@phoenix might net a better result.

On 9/27/21 11:47 PM, Simon Mottram wrote:
Forgot to mention this is only an issue for LAST_VALUE (so far!)

This works fine

   SELECT
"BIOMATERIAL_NAME",
AVG("PLANT_FRUIT_COUNT")
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
   OFFSET 0;
________________________________
From: Simon Mottram <simon.mott...@cucumber.co.nz>
Sent: 28 September 2021 4:34 PM
To: user@hbase.apache.org <user@hbase.apache.org>
Subject: Major problem for us with Phoenix joins with certain aggregations

Hi

Got my fingers crossed that there's a work around for this as this really is a 
big problem for us

We are using:

Amazon EMR

Release label:emr-6.1.0
Hadoop distribution:Amazon
Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0

Thin Client version:
phoenix-5.0.0-HBase-2.0-thin-client.jar

We get the following error when doing an aggregation where

    1.  A JOIN is empty
    2.  The column is INTEGER or DATETIME

Remote driver error: IllegalArgumentException: offset (25) + length (4) exceed 
the capacity of the array: 25

The query that breaks is:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
ON DOCID = OBSERVATION_VALUE_ID
    AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
   OFFSET 0;

I can refactor this using EXIST but get same error, presumably the driver knows 
to treat them the same:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
   OFFSET 0;

If we remove the external reference we get no error, regardless of whether 
there are any hits or not

-- these all work
There are no hits for this query

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
BIOMATERIAL_TYPE = 'aardvark'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
   OFFSET 0;

Lots of hits for this query:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
   OFFSET 0;






Reply via email to