Thanks.
Looks like using this query
select to_char(trunc(date - 8.0/24.0,'DAY') , 'yyyy-MM-dd') as day,
count(1) from events where date > to_date('2013-02-01 PST','yyyy-MM-dd Z')
group by day
will achieve aggregation by day in pst zone.
On Tue, Feb 25, 2014 at 2:44 PM, James Taylor <[email protected]>wrote:
> Run this if you want to see the difference between doing and not doing
> TRUNC:
> select to_char(trunc(date,'DAY'),'yyyy-MM-dd hh'),
> to_char(date,'yyyy-MM-dd hh') from bi.events limit 1;
>
> Sqlline is formatting the date if you don't do a TO_CHAR on it, not
> Phoenix. I still don't understand why it matters how it's printed out. The
> underlying date is not changing.
>
>
> On Tue, Feb 25, 2014 at 1:15 PM, Sean Huo <[email protected]> wrote:
>
>> here is what i get
>>
>> select trunc(date,'DAY'), to_char(date,'yyyy-MM-dd hh'),date from
>> bi.events limit 1;
>>
>> *+----------------------+---------------+--------------+*
>>
>> *| **FLOOR(TO_DATE(DATE))** | **TO_CHAR(DATE)** | ** DATE ** |*
>>
>> *+----------------------+---------------+--------------+*
>>
>> *| *2013-07-14 * | *2013-07-15 07* | *2013-07-15 00:01:02.346*
>> |*
>>
>> *+----------------------+---------------+--------------+*
>>
>>
>> looks like to_char is formatting using gmt, where date is in pst.
>>
>>
>> On Tue, Feb 25, 2014 at 1:10 PM, James Taylor <[email protected]>wrote:
>>
>>> I'm not sure how sqlline prints date fields by default. Can you do a
>>> TO_CHAR(date) with a format string instead for both dates?
>>>
>>> The TRUNC function just truncates the date to a day boundary when you
>>> specify the 'DAY' argument.
>>>
>>>
>>> On Tue, Feb 25, 2014 at 12:59 PM, Sean Huo <[email protected]> wrote:
>>>
>>>> Hi James,
>>>>
>>>> I still don't understand the semantics of trunc function.
>>>>
>>>> Here is what is displayed on sqlline
>>>>
>>>> select trunc(date,'DAY'), date from events limit 1;
>>>>
>>>> *+----------------------+--------------+*
>>>>
>>>> *| **FLOOR(TO_DATE(DATE))** | ** DATE ** |*
>>>>
>>>> *+----------------------+--------------+*
>>>>
>>>> *| *2013-07-14 * | *2013-07-15 00:01:02.346* |*
>>>>
>>>> *+----------------------+--------------+*
>>>>
>>>> Can you tell me why it returns '2013-07-14' rather than '2013-07-15'
>>>>
>>>> Thanks
>>>> Sean
>>>>
>>>>
>>>> On Tue, Feb 25, 2014 at 12:18 PM, James Taylor
>>>> <[email protected]>wrote:
>>>>
>>>>> You'll get much better performance using the TRUNC function. See
>>>>> org.apache.phoenix.end2end.ProductMetricTest for some examples.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> On Tue, Feb 25, 2014 at 12:11 PM, Sean Huo <[email protected]>wrote:
>>>>>
>>>>>> My usecase is simple. I have a event table that has timestamp as part
>>>>>> of the key. I want to do a event count per day
>>>>>>
>>>>>> I could do this in phoenix
>>>>>>
>>>>>> select to_char(ts,'yyyy-MM-dd') day, count(1) from events group by
>>>>>> day;
>>>>>>
>>>>>> it returns event count per day per GMT time.
>>>>>>
>>>>>> Now if I am interested in doing a event count per PST timezone,
>>>>>>
>>>>>> That query doesn't work any more.
>>>>>>
>>>>>> The round about way to do this is to execute a query per day like this
>>>>>>
>>>>>> select count(1) from events where ts between to_date('2014-02-20
>>>>>> PST','yyyy-MM-dd Z') and to_date('2014-02-21 13 PST','yyyy-MM-dd Z')
>>>>>>
>>>>>> I will look into trunc function. There is not much documentation and
>>>>>> usage on the function.
>>>>>>
>>>>>> Thanks
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 25, 2014 at 11:53 AM, James Taylor <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Would it be possible to get a bit more info on your use case?
>>>>>>> Usually showing a date/time using a different timezone is a end-user
>>>>>>> display issue. How does this impact your group by? Grouping by a
>>>>>>> date/time
>>>>>>> will be the same regardless of the timezone you use to format your date.
>>>>>>>
>>>>>>> Do you know about our TRUNC and ROUND functions?
>>>>>>> http://phoenix.incubator.apache.org/language/functions.html#/truncate
>>>>>>>
>>>>>>> This is typically a good way to "bucketize" a date when you do a
>>>>>>> group by, like this:
>>>>>>> SELECT count(*) FROM t GROUP BY TRUNC(my_date,'DAY')
>>>>>>>
>>>>>>> You can use date arithmetic if you wanted to "shift" all the dates
>>>>>>> based on a timezone offset, like this (shifting 8 hours forward):
>>>>>>> SELECT count(*) FROM t GROUP BY TRUNC(my_date + 8.0/24.0,'DAY')
>>>>>>>
>>>>>>> HTH,
>>>>>>> James
>>>>>>>
>>>>>>> On Tue, Feb 25, 2014 at 11:41 AM, Sean Huo <[email protected]>wrote:
>>>>>>>
>>>>>>>> Well, I can not use to_date function since it expects a string
>>>>>>>> input while I have a timestamp.
>>>>>>>> Also doing is in java is not a solution since I want to do a group
>>>>>>>> by on the timestamp in a customized timezone.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 25, 2014 at 11:25 AM, James Taylor <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Have you tried using the TO_DATE in conjunction with the TO_CHAR,
>>>>>>>>> where you specify a different timezone in the TO_DATE format_arg?
>>>>>>>>>
>>>>>>>>> Another option is to do this in Java. When you do a
>>>>>>>>> resultSet.getDate("MY_DATE_COL"), you can do whatever you want with
>>>>>>>>> the
>>>>>>>>> Date you get back.
>>>>>>>>>
>>>>>>>>> We're definitely open to taking contributions for new built-in
>>>>>>>>> functions. They're pretty easy to add. Just follow this guide:
>>>>>>>>> http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html
>>>>>>>>>
>>>>>>>>> Adding more date manipulation functions would be much appreciated.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> James
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 25, 2014 at 11:16 AM, Sean Huo
>>>>>>>>> <[email protected]>wrote:
>>>>>>>>>
>>>>>>>>>> Well, to be frankly, the example on the to_char udf is wrong
>>>>>>>>>>
>>>>>>>>>> TO_CHAR(myDate, '2001-02-03 04:05:06')
>>>>>>>>>> does not produce the right result and is misleading.
>>>>>>>>>>
>>>>>>>>>> This function does not give one the ability to format the date in
>>>>>>>>>> a customized timezone.
>>>>>>>>>> ALl it does is to allow timezone to be included in the output,
>>>>>>>>>> but it is is always GMT.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 25, 2014 at 11:04 AM, James Taylor <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://phoenix.incubator.apache.org/language/functions.html#/to_charwith
>>>>>>>>>>> a formatString argument.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 25, 2014 at 10:58 AM, Sean Huo <[email protected]
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It seems that to_char udf always produces timestamp/date string
>>>>>>>>>>>> in GMT.
>>>>>>>>>>>> Is there a function that allows users to pass in a timezone
>>>>>>>>>>>> string so
>>>>>>>>>>>> that timestamp can be displayed accordingly?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>>
>>>>>>>>>>>> Sean
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>