Hi Thanks for the reply, I posted here by mistake and wasn't sure how to delete. It's indeed a problem with phoenix
Sorry to waste your time Cheers S ________________________________ From: Josh Elser <[email protected]> Sent: Saturday, 9 October 2021 3:25 am To: [email protected] Subject: Re: Major problem for us with Phoenix joins with certain aggregations That error sounds like a bug in Phoenix. Maybe you could try with a newer version of Phoenix? Asking over on user@phoenix might net a better result. On 9/27/21 11:47 PM, Simon Mottram wrote: > Forgot to mention this is only an issue for LAST_VALUE (so far!) > > This works fine > > SELECT > "BIOMATERIAL_NAME", > AVG("PLANT_FRUIT_COUNT") > FROM > VARIABLE_VALUES_QA.OBSERVATION > WHERE > EXISTS ( > SELECT > DOCID > FROM > VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > WHERE > DOCID = OBSERVATION_VALUE_ID > AND TAGNAME = 'TRIAL_ID' > AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') > GROUP BY > "BIOMATERIAL_NAME" > LIMIT 10 > OFFSET 0; > ________________________________ > From: Simon Mottram <[email protected]> > Sent: 28 September 2021 4:34 PM > To: [email protected] <[email protected]> > Subject: Major problem for us with Phoenix joins with certain aggregations > > Hi > > Got my fingers crossed that there's a work around for this as this really is > a big problem for us > > We are using: > > Amazon EMR > > Release label:emr-6.1.0 > Hadoop distribution:Amazon > Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0 > > Thin Client version: > phoenix-5.0.0-HBase-2.0-thin-client.jar > > We get the following error when doing an aggregation where > > 1. A JOIN is empty > 2. The column is INTEGER or DATETIME > > Remote driver error: IllegalArgumentException: offset (25) + length (4) > exceed the capacity of the array: 25 > > The query that breaks is: > > SELECT > "BIOMATERIAL_NAME", > FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > FROM > VARIABLE_VALUES_QA.OBSERVATION > JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > ON DOCID = OBSERVATION_VALUE_ID > AND TAGNAME = 'TRIAL_ID' > AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe' > GROUP BY > "BIOMATERIAL_NAME" > LIMIT 10 > OFFSET 0; > > I can refactor this using EXIST but get same error, presumably the driver > knows to treat them the same: > > SELECT > "BIOMATERIAL_NAME", > FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > FROM > VARIABLE_VALUES_QA.OBSERVATION > WHERE > EXISTS ( > SELECT > DOCID > FROM > VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX > WHERE > DOCID = OBSERVATION_VALUE_ID > AND TAGNAME = 'TRIAL_ID' > AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe') > GROUP BY > "BIOMATERIAL_NAME" > LIMIT 10 > OFFSET 0; > > If we remove the external reference we get no error, regardless of whether > there are any hits or not > > -- these all work > There are no hits for this query > > SELECT > "BIOMATERIAL_NAME", > FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > FROM > VARIABLE_VALUES_QA.OBSERVATION > WHERE > BIOMATERIAL_TYPE = 'aardvark' > GROUP BY > "BIOMATERIAL_NAME" > LIMIT 10 > OFFSET 0; > > Lots of hits for this query: > > SELECT > "BIOMATERIAL_NAME", > FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP ( > ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count" > FROM > VARIABLE_VALUES_QA.OBSERVATION > GROUP BY > "BIOMATERIAL_NAME" > LIMIT 10 > OFFSET 0; > > > > >
