Re: group by problem

Marek Wiewiorka Mon, 06 Apr 2015 12:51:43 -0700

Hi Guys - thanks for your messages.

I did another round of testing I found that if I use to_char instead of
trunc function I'm getting the expected result:


select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
(4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|               TO_CHAR(DT)                |                    ID
           |                  STREAM                  |
 SUM(USAGE)                |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-03-19                               | 4720
          | 0                                        | 8.405999999999999
                     |
| 2015-03-19                               | 6854
          | 0                                        | 28.339000000000006
                    |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

the same query with trunc returns 48 rows (24 rows per id)
select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id in
(4720,6854) group by trunc(dt,'DAY'),stream,id;
e.g.:
....
| 2015-03-19 00:00:00.000 | 4720                                     | 0
                                     | 0.406
     |
| 2015-03-19 00:00:00.000 | 6854                                     | 0
                                     | 0.9490000000000001
    |
| 2015-03-19 00:00:00.000 | 4720                                     | 0
                                     | 0.332
     |
| 2015-03-19 00:00:00.000 | 6854                                     | 0
                                     | 2.738
     |
| 2015-03-19 00:00:00.000 | 4720                                     | 0
                                     | 0.33499999999999996
     |
| 2015-03-19 00:00:00.000 | 6854                                     | 0
                                     | 2.421
     |
| 2015-03-19 00:00:00.000 | 4720                                     | 0
                                     | 0.45599999999999996
     |
| 2015-03-19 00:00:00.000 | 6854                                     | 0
                                     | 2.167
     |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

so for some reason grouping by trunc is not working...but using to_char is.
This applies for table loaded using psql from csv file.

When tried to create a sample table and populate it with upserts everything
worked as expected in both cases:

create table usage (dt time not null ,id integer not null,stream integer
not null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);

UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);

0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
day,id,stream,sum(usage) from usage where id in (100,200) group by
trunc(dt,'DAY'),stream,id;
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|           DAY           |                    ID                    |
             STREAM                  |                SUM(USAGE)
     |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-04-01 00:00:00.000 | 100                                      | 0
                                     | 5.0
     |
| 2015-04-01 00:00:00.000 | 200                                      | 0
                                     | 10.0
    |
+-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+

0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
(100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
|               TO_CHAR(DT)                |                    ID
           |                  STREAM                  |
 SUM(USAGE)                |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
| 2015-04-01                               | 100
           | 0                                        | 5.0
                     |
| 2015-04-01                               | 200
           | 0                                        | 10.0
                      |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
2 rows selected (1.49 seconds)


Shall I open a jira for that?

Regards,
Marek











2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]>:

> Hi Marek,
> How did you input the data and what does your CREATE TABLE/VIEW
> statement look like? What version of Phoenix and HBase are you using?
>
> Also, would you mind running the following query and letting us know the
> output?
>
> select to_char(hu_ts,'yyyy-MM-dd
> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
> from se_dwh.homes_usage_hour
> where hu_ho_id in (4720,6854);
>
> Thanks,
> James
>
> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]>
> wrote:
> > That certainly looks like a bug. Would it be possible to make a small
> > reproducible test case and if possible, log this in the Phoenix JIRA
> > (https://issues.apache.org/jira/browse/PHOENIX) ?
> >
> > - Gabriel
> >
> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka <
> [email protected]>
> > wrote:
> >>
> >> Hi All,
> >> I came across a weird situation while running a query with group by.
> >> I executed 2 queries:
> >>
> >> 1)
> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >>
> >> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> >> | 0.45599999999999996                      |
> >> | 2015-03-19 00:00:00.000 | 6854                                     | 0
> >> | 2.167                                    |
> >>
> >>
> >> 2)
> >>
> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
> >>
> >> | 2015-03-19 00:00:00.000 | 4720                                     | 0
> >> | 8.405999999999999                        |
> >>
> >>
> >> The only difference is that in the first case I included 2
> ids(4720,6854)
> >> and in the other one only 1 (4720).
> >> The result for hu_ho_id=4720 should be the same in both case but it
> isn't
> >> (0.45 vs 8.4).
> >> The second result(8.4) is correct.
> >>
> >> Am I doing something wrong?
> >>
> >> Regards,
> >> Marek
> >>
> >
>

Re: group by problem

Reply via email to