Re: group by problem

James Taylor Mon, 06 Apr 2015 13:25:20 -0700

Hi Marek,
How did you input the data and what does your CREATE TABLE/VIEW
statement look like? What version of Phoenix and HBase are you using?
Thanks,
James


On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote:

> Hi James - sure here is the result of your query (limited to 5 rows):
>
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> |              TO_CHAR(HU_TS)              |                 HU_HO_ID
>             |               HU_STREAM_ID               |
> HU_USAGE                 |
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
> | 2015-03-19 00:59:59.000                  | 4720
>             | 0                                        | 0.287
>                        |
> | 2015-03-19 00:59:59.000                  | 6854
>             | 0                                        | 3.6189999999999998
>                       |
> | 2015-03-19 01:59:59.000                  | 4720
>             | 0                                        | 0.323
>                        |
> | 2015-03-19 01:59:59.000                  | 6854
>             | 0                                        | 2.556
>                        |
> | 2015-03-19 02:59:59.000                  | 4720
>             | 0                                        | 0.37
>                       |
>
> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>
> Ok - I will try to prepare a sample csv with DDL/DML.
>
> Thanks again,
> Marek
>
>
> 2015-04-06 22:06 GMT+02:00 James Taylor <[email protected]
> <javascript:_e(%7B%7D,'cvml','[email protected]');>>:
>
>> Hi Marek,
>> Thanks for the additional information. If you could answer my earlier
>> questions, that would be helpful.
>>
>> If you can't repro with a simple test case, then how about attaching a
>> csv dump of some of your data (the smallest amount of data that repros
>> the issue) to a JIRA along with the CREATE TABLE statement and the
>> query?
>>
>> Thanks,
>> James
>>
>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
>> <[email protected]
>> <javascript:_e(%7B%7D,'cvml','[email protected]');>> wrote:
>> > Hi Guys - thanks for your messages.
>> >
>> > I did another round of testing I found that if I use to_char instead of
>> > trunc function I'm getting the expected result:
>> >
>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where
>> id in
>> > (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |               TO_CHAR(DT)                |                    ID
>> > |                  STREAM                  |                SUM(USAGE)
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-03-19                               | 4720
>> > | 0                                        | 8.405999999999999
>> > |
>> > | 2015-03-19                               | 6854
>> > | 0                                        | 28.339000000000006
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > the same query with trunc returns 48 rows (24 rows per id)
>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage where id
>> in
>> > (4720,6854) group by trunc(dt,'DAY'),stream,id;
>> > e.g.:
>> > ....
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.406                                    |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 0.9490000000000001                       |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.332                                    |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.738                                    |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.33499999999999996                      |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.421                                    |
>> > | 2015-03-19 00:00:00.000 | 4720                                     | 0
>> > | 0.45599999999999996                      |
>> > | 2015-03-19 00:00:00.000 | 6854                                     | 0
>> > | 2.167                                    |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > so for some reason grouping by trunc is not working...but using to_char
>> is.
>> > This applies for table loaded using psql from csv file.
>> >
>> > When tried to create a sample table and populate it with upserts
>> everything
>> > worked as expected in both cases:
>> >
>> > create table usage (dt time not null ,id integer not null,stream
>> integer not
>> > null, usage double constraint pk PRIMARY KEY(dt,id,stream ));
>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>> >
>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>> >
>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY')
>> as
>> > day,id,stream,sum(usage) from usage where id in (100,200) group by
>> > trunc(dt,'DAY'),stream,id;
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |           DAY           |                    ID                    |
>> > STREAM                  |                SUM(USAGE)                |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-04-01 00:00:00.000 | 100                                      | 0
>> > | 5.0                                      |
>> > | 2015-04-01 00:00:00.000 | 200                                      | 0
>> > | 10.0                                     |
>> >
>> +-------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> >
>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
>> > (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > |               TO_CHAR(DT)                |                    ID
>> > |                  STREAM                  |                SUM(USAGE)
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > | 2015-04-01                               | 100
>> > | 0                                        | 5.0
>> > |
>> > | 2015-04-01                               | 200
>> > | 0                                        | 10.0
>> > |
>> >
>> +------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+
>> > 2 rows selected (1.49 seconds)
>> >
>> >
>> > Shall I open a jira for that?
>> >
>> > Regards,
>> > Marek
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > 2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]
>> <javascript:_e(%7B%7D,'cvml','[email protected]');>>:
>> >>
>> >> Hi Marek,
>> >> How did you input the data and what does your CREATE TABLE/VIEW
>> >> statement look like? What version of Phoenix and HBase are you using?
>> >>
>> >> Also, would you mind running the following query and letting us know
>> the
>> >> output?
>> >>
>> >> select to_char(hu_ts,'yyyy-MM-dd
>> >> HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>> >> from se_dwh.homes_usage_hour
>> >> where hu_ho_id in (4720,6854);
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]
>> <javascript:_e(%7B%7D,'cvml','[email protected]');>>
>> >> wrote:
>> >> > That certainly looks like a bug. Would it be possible to make a small
>> >> > reproducible test case and if possible, log this in the Phoenix JIRA
>> >> > (https://issues.apache.org/jira/browse/PHOENIX) ?
>> >> >
>> >> > - Gabriel
>> >> >
>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>> >> > <[email protected]
>> <javascript:_e(%7B%7D,'cvml','[email protected]');>>
>> >> > wrote:
>> >> >>
>> >> >> Hi All,
>> >> >> I came across a weird situation while running a query with group by.
>> >> >> I executed 2 queries:
>> >> >>
>> >> >> 1)
>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >> >>
>> >> >> | 2015-03-19 00:00:00.000 | 4720
>>  |
>> >> >> 0
>> >> >> | 0.45599999999999996                      |
>> >> >> | 2015-03-19 00:00:00.000 | 6854
>>  |
>> >> >> 0
>> >> >> | 2.167                                    |
>> >> >>
>> >> >>
>> >> >> 2)
>> >> >>
>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage) from
>> >> >> se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>> >> >> trunc(hu_ts,'DAY',1),hu_ho_id ,hu_stream_id;
>> >> >>
>> >> >> | 2015-03-19 00:00:00.000 | 4720
>>  |
>> >> >> 0
>> >> >> | 8.405999999999999                        |
>> >> >>
>> >> >>
>> >> >> The only difference is that in the first case I included 2
>> >> >> ids(4720,6854)
>> >> >> and in the other one only 1 (4720).
>> >> >> The result for hu_ho_id=4720 should be the same in both case but it
>> >> >> isn't
>> >> >> (0.45 vs 8.4).
>> >> >> The second result(8.4) is correct.
>> >> >>
>> >> >> Am I doing something wrong?
>> >> >>
>> >> >> Regards,
>> >> >> Marek
>> >> >>
>> >> >
>> >
>> >
>>
>
>

Re: group by problem

Reply via email to