Oh I'm sorry, I forgot to attach them. DDL of my table:

create table se_dwh.homes_usage_hour (
    hu_ts time not null,
    hu_ho_id integer not null,
    hu_stream_id integer not null,
    hu_usage double
    constraint pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id)
);
Phoenix: 4.3.0

Thanks,
Marek

2015-04-06 22:25 GMT+02:00 James Taylor <[email protected]>:
> Hi Marek,
> How did you input the data and what does your CREATE TABLE/VIEW
> statement look like? What version of Phoenix and HBase are you using?
> Thanks,
> James
>
> On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote:
>
>> Hi James - sure, here is the result of your query (limited to 5 rows):
>>
>> +--------------------------+----------+--------------+---------------------+
>> | TO_CHAR(HU_TS)           | HU_HO_ID | HU_STREAM_ID | HU_USAGE            |
>> +--------------------------+----------+--------------+---------------------+
>> | 2015-03-19 00:59:59.000  | 4720     | 0            | 0.287               |
>> | 2015-03-19 00:59:59.000  | 6854     | 0            | 3.6189999999999998  |
>> | 2015-03-19 01:59:59.000  | 4720     | 0            | 0.323               |
>> | 2015-03-19 01:59:59.000  | 6854     | 0            | 2.556               |
>> | 2015-03-19 02:59:59.000  | 4720     | 0            | 0.37                |
>> +--------------------------+----------+--------------+---------------------+
>>
>> Ok - I will try to prepare a sample csv with DDL/DML.
>>
>> Thanks again,
>> Marek
>>
>> 2015-04-06 22:06 GMT+02:00 James Taylor <[email protected]>:
>>
>>> Hi Marek,
>>> Thanks for the additional information. If you could answer my earlier
>>> questions, that would be helpful.
>>>
>>> If you can't repro with a simple test case, then how about attaching a
>>> csv dump of some of your data (the smallest amount of data that repros
>>> the issue) to a JIRA along with the CREATE TABLE statement and the
>>> query?
>>>
>>> Thanks,
>>> James
>>>
>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
>>> <[email protected]> wrote:
>>> > Hi Guys - thanks for your messages.
>>> >
>>> > I did another round of testing and found that if I use the to_char
>>> > function instead of trunc, I'm getting the expected result:
>>> >
>>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>>> > where id in (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>> >
>>> > +--------------+------+--------+---------------------+
>>> > | TO_CHAR(DT)  | ID   | STREAM | SUM(USAGE)          |
>>> > +--------------+------+--------+---------------------+
>>> > | 2015-03-19   | 4720 | 0      | 8.405999999999999   |
>>> > | 2015-03-19   | 6854 | 0      | 28.339000000000006  |
>>> > +--------------+------+--------+---------------------+
>>> >
>>> > The same query with trunc returns 48 rows (24 rows per id):
>>> >
>>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
>>> > where id in (4720,6854) group by trunc(dt,'DAY'),stream,id;
>>> >
>>> > e.g.:
>>> > ....
>>> > | 2015-03-19 00:00:00.000  | 4720 | 0      | 0.406                |
>>> > | 2015-03-19 00:00:00.000  | 6854 | 0      | 0.9490000000000001   |
>>> > | 2015-03-19 00:00:00.000  | 4720 | 0      | 0.332                |
>>> > | 2015-03-19 00:00:00.000  | 6854 | 0      | 2.738                |
>>> > | 2015-03-19 00:00:00.000  | 4720 | 0      | 0.33499999999999996  |
>>> > | 2015-03-19 00:00:00.000  | 6854 | 0      | 2.421                |
>>> > | 2015-03-19 00:00:00.000  | 4720 | 0      | 0.45599999999999996  |
>>> > | 2015-03-19 00:00:00.000  | 6854 | 0      | 2.167                |
>>> > +--------------------------+------+--------+----------------------+
>>> >
>>> > So for some reason grouping by trunc is not working... but using
>>> > to_char is. This applies to a table loaded from a csv file using psql.
>>> >
>>> > When I tried to create a sample table and populate it with upserts,
>>> > everything worked as expected in both cases:
>>> >
>>> > create table usage (
>>> >     dt time not null,
>>> >     id integer not null,
>>> >     stream integer not null,
>>> >     usage double
>>> >     constraint pk PRIMARY KEY (dt, id, stream)
>>> > );
>>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>>> >
>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY')
>>> > as day,id,stream,sum(usage) from usage where id in (100,200)
>>> > group by trunc(dt,'DAY'),stream,id;
>>> >
>>> > +--------------------------+------+--------+-------------+
>>> > | DAY                      | ID   | STREAM | SUM(USAGE)  |
>>> > +--------------------------+------+--------+-------------+
>>> > | 2015-04-01 00:00:00.000  | 100  | 0      | 5.0         |
>>> > | 2015-04-01 00:00:00.000  | 200  | 0      | 10.0        |
>>> > +--------------------------+------+--------+-------------+
>>> >
>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>>> > where id in (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>> >
>>> > +--------------+------+--------+-------------+
>>> > | TO_CHAR(DT)  | ID   | STREAM | SUM(USAGE)  |
>>> > +--------------+------+--------+-------------+
>>> > | 2015-04-01   | 100  | 0      | 5.0         |
>>> > | 2015-04-01   | 200  | 0      | 10.0        |
>>> > +--------------+------+--------+-------------+
>>> > 2 rows selected (1.49 seconds)
>>> >
>>> > Shall I open a jira for that?
>>> >
>>> > Regards,
>>> > Marek
>>> >
>>> > 2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]>:
>>> >>
>>> >> Hi Marek,
>>> >> How did you input the data and what does your CREATE TABLE/VIEW
>>> >> statement look like? What version of Phoenix and HBase are you using?
>>> >>
>>> >> Also, would you mind running the following query and letting us
>>> >> know the output?
>>> >>
>>> >> select to_char(hu_ts,'yyyy-MM-dd
>>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>>> >> from se_dwh.homes_usage_hour
>>> >> where hu_ho_id in (4720,6854);
>>> >>
>>> >> Thanks,
>>> >> James
>>> >>
>>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]>
>>> >> wrote:
>>> >> > That certainly looks like a bug. Would it be possible to make a
>>> >> > small reproducible test case and, if possible, log this in the
>>> >> > Phoenix JIRA (https://issues.apache.org/jira/browse/PHOENIX)?
>>> >> >
>>> >> > - Gabriel
>>> >> >
>>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>>> >> > <[email protected]> wrote:
>>> >> >>
>>> >> >> Hi All,
>>> >> >> I came across a weird situation while running a query with group by.
>>> >> >> I executed 2 queries:
>>> >> >>
>>> >> >> 1)
>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>>> >> >> from se_dwh.homes_usage_hour where hu_ho_id in (4720,6854)
>>> >> >> group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>> >> >>
>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
>>> >> >> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>>> >> >>
>>> >> >> 2)
>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>>> >> >> from se_dwh.homes_usage_hour where hu_ho_id in (4720)
>>> >> >> group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>> >> >>
>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
>>> >> >>
>>> >> >> The only difference is that in the first case I included 2 ids
>>> >> >> (4720,6854) and in the other one only 1 (4720).
>>> >> >> The result for hu_ho_id=4720 should be the same in both cases, but
>>> >> >> it isn't (0.45 vs 8.4). The second result (8.4) is correct.
>>> >> >>
>>> >> >> Am I doing something wrong?
>>> >> >>
>>> >> >> Regards,
>>> >> >> Marek
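[Editor's note: the aggregation the thread expects from TRUNC(dt,'DAY') can be sketched outside Phoenix. The Python snippet below is illustrative only, not Phoenix code; it reuses the four sample rows from Marek's UPSERT repro above and shows the correct behaviour, namely exactly one output row per (day, id, stream), matching the 5.0/10.0 sums he reported.]

```python
# Sketch of what GROUP BY trunc(dt,'DAY'),stream,id should compute:
# truncate each timestamp to midnight of its day, then sum usage per key.
from collections import defaultdict
from datetime import datetime

# (timestamp, id, stream, usage) -- the rows from the UPSERT repro
rows = [
    ("2015-04-01 10:00:00", 100, 0, 2.0),
    ("2015-04-01 11:00:00", 100, 0, 3.0),
    ("2015-04-01 12:00:00", 200, 0, 4.0),
    ("2015-04-01 13:00:00", 200, 0, 6.0),
]

def trunc_day(ts: str) -> str:
    """Truncate a 'yyyy-MM-dd HH:mm:ss' timestamp to midnight of its day,
    analogous to Phoenix's TRUNC(dt, 'DAY')."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return dt.replace(hour=0, minute=0, second=0).strftime("%Y-%m-%d %H:%M:%S")

# Aggregate: one bucket per (truncated day, id, stream)
totals = defaultdict(float)
for ts, id_, stream, usage in rows:
    totals[(trunc_day(ts), id_, stream)] += usage

for key in sorted(totals):
    print(key, totals[key])
# Expect exactly one row per (day, id, stream):
#   ('2015-04-01 00:00:00', 100, 0) 5.0
#   ('2015-04-01 00:00:00', 200, 0) 10.0
```

The bug in the thread is that the csv-loaded table yields 24 separate rows per id instead of one merged bucket per day, as if the truncated values were not compared equal during grouping.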
