Hi

Thanks for the reply, I posted here by mistake and wasn't sure how to delete.  
It's indeed a problem with phoenix

Sorry to waste your time

Cheers

S




________________________________
From: Josh Elser <[email protected]>
Sent: Saturday, 9 October 2021 3:25 am
To: [email protected]
Subject: Re: Major problem for us with Phoenix joins with certain aggregations

That error sounds like a bug in Phoenix.

Maybe you could try with a newer version of Phoenix? Asking over on
user@phoenix might net a better result.

On 9/27/21 11:47 PM, Simon Mottram wrote:
> Forgot to mention this is only an issue for LAST_VALUE (so far!)
>
> This works fine
>
>   SELECT
> "BIOMATERIAL_NAME",
> AVG("PLANT_FRUIT_COUNT")
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> EXISTS (
> SELECT
> DOCID
> FROM
> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> WHERE
> DOCID = OBSERVATION_VALUE_ID
> AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
>   OFFSET 0;
> ________________________________
> From: Simon Mottram <[email protected]>
> Sent: 28 September 2021 4:34 PM
> To: [email protected] <[email protected]>
> Subject: Major problem for us with Phoenix joins with certain aggregations
>
> Hi
>
> Got my fingers crossed that there's a work around for this as this really is 
> a big problem for us
>
> We are using:
>
> Amazon EMR
>
> Release label:emr-6.1.0
> Hadoop distribution:Amazon
> Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
>
> Thin Client version:
> phoenix-5.0.0-HBase-2.0-thin-client.jar
>
> We get the following error when doing an aggregation where
>
>    1.  A JOIN is empty
>    2.  The column is INTEGER or DATETIME
>
> Remote driver error: IllegalArgumentException: offset (25) + length (4) 
> exceed the capacity of the array: 25
>
> The query that breaks is:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> ON DOCID = OBSERVATION_VALUE_ID
>    AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
>   OFFSET 0;
>
> I can refactor this using EXIST but get same error, presumably the driver 
> knows to treat them the same:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> EXISTS (
> SELECT
> DOCID
> FROM
> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> WHERE
> DOCID = OBSERVATION_VALUE_ID
> AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
>   OFFSET 0;
>
> If we remove the external reference we get no error, regardless of whether 
> there are any hits or not
>
> -- these all work
> There are no hits for this query
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> BIOMATERIAL_TYPE = 'aardvark'
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
>   OFFSET 0;
>
> Lots of hits for this query:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
>   OFFSET 0;
>
>
>
>
>

Reply via email to