Hi,

I would like to add something in this partition discussion:

I found out that Lens doesn't allow me to use the "update fact" command to
change the partition update periods. For example, i have the "<update_period
>DAILY</update_period>" tag in one of my fact table definition and i now
decide to have the
MONTHLY partition too, but i couldn't update my fact table to accept that.
Even after i dropped the entire fact table
and tried to create it again with both "DAILY" and "MONTHLY" as the update
periods i still couldn't add the MONTHLY partition.
I think some of the meta data still got hanging around and not being
deleted after i dropped the table.
Is this a bug at Lens?


Thanks,
Dayou Jiang

On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <t...@linkedin.com> wrote:

> Thanks for answering the question! It is helpful.
>
> On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <amareshw...@gmail.com>
> wrote:
>
>> We have a mapping time dimension columns to its partition columns,
>> because system's like HIVE would require partition column to be separate
>> column than table columns.
>>
>> So, if tables already contain order_time as a column, and the data is
>> partitioned by order_time - then partition column name has to be a separate
>> column - which is defined through the mapping.
>>
>> Thanks
>>
>> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pro...@apache.org>
>> wrote:
>>
>>> "value" is required in a "property" tag. A property tag is name and
>>> value. This special property defines that whenever queried on order_time
>>> time dimension, try to answer with facts that have "ot" as
>>> partition column. If not, it falls back to delivery_time time dimension.
>>> There again, the mapping is checked, which is "dt", so facts that have "dt"
>>> as partition column are preferred. This can happen in a chain until a valid
>>> combination is found which can answer the query.
>>>
>>>
>>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <t...@linkedin.com> wrote:
>>>
>>>> So, the field 'value' can be used to define both alias and real values?
>>>>
>>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pro...@apache.org>
>>>> wrote:
>>>>
>>>>> Yes, that is correct.
>>>>>
>>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <t...@linkedin.com> wrote:
>>>>>
>>>>>> Thanks for the explanation. I got the idea of tentative partitions.
>>>>>>
>>>>>> Is the property
>>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />*
>>>>>>  referring to the dim_attribute
>>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
>>>>>>
>>>>>> Is the value 'ot' an alias of the partition column name?
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pro...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> The idea of partitions is that there is a stream of data that keeps
>>>>>>> flowing in, each line of which has delivery time and order time both. 
>>>>>>> The
>>>>>>> data is partitioned and stored in partitions later and facts are 
>>>>>>> created on
>>>>>>> top of that partitioned data. So if data is partitioned in order time 
>>>>>>> and
>>>>>>> the time range queried is also order time, then we can pick the 
>>>>>>> partitions
>>>>>>> directly. But if the data is partitioned on delivery time, as is the 
>>>>>>> case
>>>>>>> for this fact, and the query is on order time, then you pick some 
>>>>>>> tentative
>>>>>>> partitions of delivery time, but you still have to filter over order 
>>>>>>> time.
>>>>>>> Here the purpose of picking partitions is to reduce the amount of data
>>>>>>> read. When you pick delivery time partitions, it doesn't guarantee what
>>>>>>> order times data has, it only provides a tentative range. Hence, the
>>>>>>> partitions are picked based on the relation of time dimensions, but the
>>>>>>> extra filter still has to be supplied.
>>>>>>>
>>>>>>> Hope it's clear.
>>>>>>>
>>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pro...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> That's the original time range asked in the cube query.
>>>>>>>>
>>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <t...@linkedin.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Rajat,
>>>>>>>>>
>>>>>>>>> Thanks for the explain. I have a clear understanding about dim
>>>>>>>>> relation now, however, how does the following condition generated:
>>>>>>>>>
>>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and 
>>>>>>>>> ((sales.order_time) < '2015-04-13 04:00:00')))
>>>>>>>>>
>>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not 
>>>>>>>>> have any relationships with properties or other dim attributes (The 
>>>>>>>>> property cube.timedim.partition.order_time does, but is it the same 
>>>>>>>>> thing?), I wonder how the query could pick up this column.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <
>>>>>>>>> pro...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> I've created a jira for this: https://issues.apache.or
>>>>>>>>>> g/jira/browse/LENS-1269
>>>>>>>>>>
>>>>>>>>>> You should be able to make change in your data locally and see
>>>>>>>>>> the non-empty results:
>>>>>>>>>>
>>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len
>>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/
>>>>>>>>>> examples/data/sales_aggr_fact2_local3/sales-aggr-fact2-
>>>>>>>>>> local3.data
>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0
>>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <
>>>>>>>>>> pro...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> More details on how it converts to the fallback range:
>>>>>>>>>>>
>>>>>>>>>>> Range asked:
>>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>
>>>>>>>>>>> Timedim Relation :
>>>>>>>>>>>
>>>>>>>>>>> <property name="cube.timedim.relation.order_time" 
>>>>>>>>>>> value="delivery_time+[-20 days,-1 hour]" />
>>>>>>>>>>>
>>>>>>>>>>> <property name="cube.timedim.relation.order_time" 
>>>>>>>>>>> value="delivery_time+[-2 hours,-1hour]" />
>>>>>>>>>>>
>>>>>>>>>>> The cube defines the first relation, the fact defines the second 
>>>>>>>>>>> relation. A fact can optionally override the timedim relation 
>>>>>>>>>>> defined by cube. So the second one is picked :
>>>>>>>>>>>
>>>>>>>>>>> Derivation:
>>>>>>>>>>>
>>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04
>>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2
>>>>>>>>>>> hour
>>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06.
>>>>>>>>>>>
>>>>>>>>>>> Hope it's clearer now.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <
>>>>>>>>>>> pro...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> So it seems right now the query is designed to return 0 rows.
>>>>>>>>>>>> The query is there just to demonstrate the fallback feature.
>>>>>>>>>>>>
>>>>>>>>>>>> Cube query is
>>>>>>>>>>>>
>>>>>>>>>>>> cube select customer_city_name, store_cost from sales where 
>>>>>>>>>>>> time_range_in(order_time,
>>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04')
>>>>>>>>>>>> Driver query comes out to be
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM 
>>>>>>>>>>>> a.local_sales_aggr_fact2 sales join a.local_city_table 
>>>>>>>>>>>> customer_city on sales.customer_city_id = customer_city.id and 
>>>>>>>>>>>> (customer_city.dt = 'latest') WHERE (((((sales.dt) = 
>>>>>>>>>>>> '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and 
>>>>>>>>>>>> ((sales.order_time) >= '2015-04-13 03:00:00') and 
>>>>>>>>>>>> ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY 
>>>>>>>>>>>> (customer_city.name)
>>>>>>>>>>>>
>>>>>>>>>>>> Then I modified the driver query and ran:
>>>>>>>>>>>>
>>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost 
>>>>>>>>>>>> FROM a.local_sales_aggr_fact2 sales join a.local_city_table 
>>>>>>>>>>>> customer_city on sales.customer_city_id = customer_city.id and 
>>>>>>>>>>>> (customer_city.dt = 'latest') WHERE (((((sales.dt) = 
>>>>>>>>>>>> '2015-04-13-04') or ((sales.dt) = '2015-04-13-05'))))
>>>>>>>>>>>>
>>>>>>>>>>>> Found the following results:
>>>>>>>>>>>>
>>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0
>>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0
>>>>>>>>>>>>
>>>>>>>>>>>> It's apparent that both the rows will be filtered out by the
>>>>>>>>>>>> translated query.
>>>>>>>>>>>>
>>>>>>>>>>>> I'll look deeper and replace the example query with something
>>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty 
>>>>>>>>>>>> results.
>>>>>>>>>>>>
>>>>>>>>>>>> Let us know if you have any confusion understanding the feature
>>>>>>>>>>>> itself. I think the video should have covered it.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> *Tao Yan*
>>>>>>>>> Software Engineer
>>>>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 206.250.5345
>>>>>>>>> t...@linkedin.com
>>>>>>>>> https://www.linkedin.com/in/taousc
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Tao Yan*
>>>>>> Software Engineer
>>>>>> Data Analytics Infrastructure Tools and Services
>>>>>>
>>>>>>
>>>>>>
>>>>>> 206.250.5345
>>>>>> t...@linkedin.com
>>>>>> https://www.linkedin.com/in/taousc
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Tao Yan*
>>>> Software Engineer
>>>> Data Analytics Infrastructure Tools and Services
>>>>
>>>>
>>>>
>>>> 206.250.5345
>>>> t...@linkedin.com
>>>> https://www.linkedin.com/in/taousc
>>>>
>>>
>>
>
>
> --
>
> *Tao Yan*
> Software Engineer
> Data Analytics Infrastructure Tools and Services
>
>
>
> 206.250.5345
> t...@linkedin.com
> https://www.linkedin.com/in/taousc
>

Reply via email to