Hi Marek, How did you input the data and what does your CREATE TABLE/VIEW statement look like? What version of Phoenix and HBase are you using? Thanks, James
On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote: > Hi James - sure here is the result of your query (limited to 5 rows): > > > +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ > | TO_CHAR(HU_TS) | HU_HO_ID > | HU_STREAM_ID | > HU_USAGE | > > +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ > | 2015-03-19 00:59:59.000 | 4720 > | 0 | 0.287 > | > | 2015-03-19 00:59:59.000 | 6854 > | 0 | 3.6189999999999998 > | > | 2015-03-19 01:59:59.000 | 4720 > | 0 | 0.323 > | > | 2015-03-19 01:59:59.000 | 6854 > | 0 | 2.556 > | > | 2015-03-19 02:59:59.000 | 4720 > | 0 | 0.37 > | > > +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ > > Ok - I will try to prepare a sample csv with DDL/DML. > > Thanks again, > Marek > > > 2015-04-06 22:06 GMT+02:00 James Taylor <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>>: > >> Hi Marek, >> Thanks for the additional information. If you could answer my earlier >> questions, that would be helpful. >> >> If you can't repro with a simple test case, then how about attaching a >> csv dump of some of your data (the smallest amount of data that repros >> the issue) to a JIRA along with the CREATE TABLE statement and the >> query? >> >> Thanks, >> James >> >> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka >> <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote: >> > Hi Guys - thanks for your messages. >> > >> > I did another round of testing I found that if I use to_char instead of >> > trunc function I'm getting the expected result: >> > >> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where >> id in >> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id; >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | TO_CHAR(DT) | ID >> > | STREAM | SUM(USAGE) >> > | >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | 2015-03-19 | 4720 >> > | 0 | 8.405999999999999 >> > | >> > | 2015-03-19 | 6854 >> > | 0 | 28.339000000000006 >> > | >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > >> > the same query with trunc returns 48 rows (24 rows per id) >> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id >> in >> > (4720,6854) group by trunc(dt,'DAY'),stream,id; >> > e.g.: >> > .... >> > | 2015-03-19 00:00:00.000 | 4720 | 0 >> > | 0.406 | >> > | 2015-03-19 00:00:00.000 | 6854 | 0 >> > | 0.9490000000000001 | >> > | 2015-03-19 00:00:00.000 | 4720 | 0 >> > | 0.332 | >> > | 2015-03-19 00:00:00.000 | 6854 | 0 >> > | 2.738 | >> > | 2015-03-19 00:00:00.000 | 4720 | 0 >> > | 0.33499999999999996 | >> > | 2015-03-19 00:00:00.000 | 6854 | 0 >> > | 2.421 | >> > | 2015-03-19 00:00:00.000 | 4720 | 0 >> > | 0.45599999999999996 | >> > | 2015-03-19 00:00:00.000 | 6854 | 0 >> > | 2.167 | >> > >> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > >> > so for some reason grouping by trunc is not working...but using to_char >> is. >> > This applies for table loaded using psql from csv file. >> > >> > When tried to create a sample table and populate it with upserts >> everything >> > worked as expected in both cases: >> > >> > create table usage (dt time not null ,id integer not null,stream >> integer not >> > null, usage double constraint pk PRIMARY KEY(dt,id,stream )); >> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0); >> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0); >> > >> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0); >> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0); >> > >> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') >> as >> > day,id,stream,sum(usage) from usage where id in (100,200) group by >> > trunc(dt,'DAY'),stream,id; >> > >> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | DAY | ID | >> > STREAM | SUM(USAGE) | >> > >> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | 2015-04-01 00:00:00.000 | 100 | 0 >> > | 5.0 | >> > | 2015-04-01 00:00:00.000 | 200 | 0 >> > | 10.0 | >> > >> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > >> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select >> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in >> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id; >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | TO_CHAR(DT) | ID >> > | STREAM | SUM(USAGE) >> > | >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > | 2015-04-01 | 100 >> > | 0 | 5.0 >> > | >> > | 2015-04-01 | 200 >> > | 0 | 10.0 >> > | >> > >> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+ >> > 2 rows selected (1.49 seconds) >> > >> > >> > Shall I open a jira for that? >> > >> > Regards, >> > Marek >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > 2015-04-06 20:16 GMT+02:00 James Taylor <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>>: >> >> >> >> Hi Marek, >> >> How did you input the data and what does your CREATE TABLE/VIEW >> >> statement look like? What version of Phoenix and HBase are you using? >> >> >> >> Also, would you mind running the following query and letting us know >> the >> >> output? >> >> >> >> select to_char(hu_ts,'yyyy-MM-dd >> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage >> >> from se_dwh.homes_usage_hour >> >> where hu_ho_id in (4720,6854); >> >> >> >> Thanks, >> >> James >> >> >> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> >> wrote: >> >> > That certainly looks like a bug. Would it be possible to make a small >> >> > reproducible test case and if possible, log this in the Phoenix JIRA >> >> > (https://issues.apache.org/jira/browse/PHOENIX) ? >> >> > >> >> > - Gabriel >> >> > >> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka >> >> > <[email protected] >> <javascript:_e(%7B%7D,'cvml','[email protected]');>> >> >> > wrote: >> >> >> >> >> >> Hi All, >> >> >> I came across a weird situation while running a query with group by. >> >> >> I executed 2 queries: >> >> >> >> >> >> 1) >> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from >> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by >> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id; >> >> >> >> >> >> | 2015-03-19 00:00:00.000 | 4720 >> | >> >> >> 0 >> >> >> | 0.45599999999999996 | >> >> >> | 2015-03-19 00:00:00.000 | 6854 >> | >> >> >> 0 >> >> >> | 2.167 | >> >> >> >> >> >> >> >> >> 2) >> >> >> >> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from >> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by >> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id; >> >> >> >> >> >> | 2015-03-19 00:00:00.000 | 4720 >> | >> >> >> 0 >> >> >> | 8.405999999999999 | >> >> >> >> >> >> >> >> >> The only difference is that in the first case I included 2 >> >> >> ids(4720,6854) >> >> >> and in the other one only 1 (4720). >> >> >> The result for hu_ho_id=4720 should be the same in both case but it >> >> >> isn't >> >> >> (0.45 vs 8.4). >> >> >> The second result(8.4) is correct. >> >> >> >> >> >> Am I doing something wrong? >> >> >> >> >> >> Regards, >> >> >> Marek >> >> >> >> >> > >> > >> > >> > >
