Im encountering issue with usages of nth_value combined with order by. (reproducing cmd's will be found at the end of the post): Im using phoenix version 4.11 with hbase 1.3.1, on EMR cluster
> select * from mytable; +----------+--------+-------+--------+---------------+--------------+----------+ | COUNTRY | MYKEY | YEAR | MONTH | SITE | MAIN_DOMAIN | MYVALUE | +----------+--------+-------+--------+---------------+--------------+----------+ | 123 | aaaaa | 16 | 2 | ido.com/sdsd | ido.com | 12000.0 | | 123 | aaaaa | 16 | 1 | ido.com/sdsd | ido.com | 1000.0 | | 123 | bbb | 16 | 3 | ido.com/sdsd | ido.com | 10.0 | +----------+--------+-------+--------+---------------+--------------+----------+ > SELECT SUM(myvalue1) AS myvalue2, nth_value(myvalue1, 1) within group > (order by month desc) AS last_myvalue2, nth_value(myvalue1, 2) within > group (order by month desc) AS prev_myvalue2 FROM (SELECT /*+ SKIP SCAN */ > main_domain, site, month, SUM(myvalue) AS myvalue1 FROM mytable GROUP BY > main_domain, site, month) GROUP BY main_domain, site; +-----------+----------------+----------------+ | MYVALUE2 | LAST_MYVALUE2 | PREV_MYVALUE2 | +-----------+----------------+----------------+ | 13010.0 | 10.0 | 12000.0 | +-----------+----------------+----------------+ > SELECT SUM(myvalue1) AS myvalue2, nth_value(myvalue1, 1) within group > (order by month desc) AS last_myvalue2, nth_value(myvalue1, 2) within > group (order by month desc) AS prev_myvalue2 FROM (SELECT /*+ SKIP SCAN */ > main_domain, site, month, SUM(myvalue) AS myvalue1 FROM mytable GROUP BY > main_domain, site, month) GROUP BY main_domain, site *order by myvalue2;* +-----------+----------------+----------------+ | MYVALUE2 | LAST_MYVALUE2 | PREV_MYVALUE2 | +-----------+----------------+----------------+ | 13010.0 | null | null | +-----------+----------------+----------------+ note that the only difference is the order by part. If it worth additional query: > SELECT SUM(myvalue1) AS myvalue2, nth_value(myvalue1, 0) within group > (order by month desc) AS last_myvalue2, nth_value(myvalue1, 1) within > group (order by month desc) AS prev_myvalue2 FROM (SELECT /*+ SKIP SCAN */ > main_domain, site, month, SUM(myvalue) AS myvalue1 FROM mytable GROUP BY > main_domain, site, month) GROUP BY main_domain, site order by myvalue2; +-----------+----------------+----------------+ | MYVALUE2 | LAST_MYVALUE2 | PREV_MYVALUE2 | +-----------+----------------+----------------+ | 13010.0 | 10.0 | null | +-----------+----------------+----------------+ I have also encountered weird close issue with first_value (when used instead of nth_value(...,1)) to reproduce: > CREATE TABLE mytable ( country INTEGER NOT NULL, mykey VARCHAR NOT NULL, > year INTEGER NOT NULL, month INTEGER NOT NULL, cf.site VARCHAR, > cf.main_domain VARCHAR, cf.myvalue DOUBLE CONSTRAINT mykeys_pk PRIMARY KEY > (country, mykey, year, month)) SALT_BUCKETS = 20; > UPSERT INTO mytable (country, mykey, year, month, site, main_domain, > myvalue) values (123, 'aaaaa', 16, 1, 'ido.com/sdsd', 'ido.com', 1000.0); > UPSERT INTO mytable (country, mykey, year, month, site, main_domain, > myvalue) values (123, 'aaaaa', 16, 2, 'ido.com/sdsd', 'ido.com', 12000.0); > UPSERT INTO mytable (country, mykey, year, month, site, main_domain, > myvalue) values (123, 'bbb', 16, 3, 'ido.com/sdsd', 'ido.com', 10.0); > SELECT SUM(myvalue1) AS myvalue2, nth_value(myvalue1, 1) within group (order by month desc) AS last_myvalue2, nth_value(myvalue1,2) within group (order by month desc) AS prev_myvalue2 FROM (SELECT /*+ SKIP SCAN */ main_domain, site, month, SUM(myvalue) AS myvalue1 FROM mytable GROUP BY main_domain, site, month) GROUP BY main_domain, site; - I have tried to remove salt - it hasn't fixed the issue -- Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
