Hi, I would like to add something in this partition discussion:
I found out that Lens doesn't allow me to use the "update fact" command to change the partition update periods. For example, i have the "<update_period >DAILY</update_period>" tag in one of my fact table definition and i now decide to have the MONTHLY partition too, but i couldn't update my fact table to accept that. Even after i dropped the entire fact table and tried to create it again with both "DAILY" and "MONTHLY" as the update periods i still couldn't add the MONTHLY partition. I think some of the meta data still got hanging around and not being deleted after i dropped the table. Is this a bug at Lens? Thanks, Dayou Jiang On Wed, Aug 10, 2016 at 11:36 AM, Tao Yan <t...@linkedin.com> wrote: > Thanks for answering the question! It is helpful. > > On Wed, Aug 10, 2016 at 1:15 AM, amareshwarisr . <amareshw...@gmail.com> > wrote: > >> We have a mapping time dimension columns to its partition columns, >> because system's like HIVE would require partition column to be separate >> column than table columns. >> >> So, if tables already contain order_time as a column, and the data is >> partitioned by order_time - then partition column name has to be a separate >> column - which is defined through the mapping. >> >> Thanks >> >> On Wed, Aug 10, 2016 at 1:40 PM, Rajat Khandelwal <pro...@apache.org> >> wrote: >> >>> "value" is required in a "property" tag. A property tag is name and >>> value. This special property defines that whenever queried on order_time >>> time dimension, try to answer with facts that have "ot" as >>> partition column. If not, it falls back to delivery_time time dimension. >>> There again, the mapping is checked, which is "dt", so facts that have "dt" >>> as partition column are preferred. This can happen in a chain until a valid >>> combination is found which can answer the query. >>> >>> >>> On Wed, Aug 10, 2016 at 12:13 AM Tao Yan <t...@linkedin.com> wrote: >>> >>>> So, the field 'value' can be used to define both alias and real values? >>>> >>>> On Tue, Aug 9, 2016 at 11:40 AM, Rajat Khandelwal <pro...@apache.org> >>>> wrote: >>>> >>>>> Yes, that is correct. >>>>> >>>>> On Wed, Aug 10, 2016, 00:09 Tao Yan <t...@linkedin.com> wrote: >>>>> >>>>>> Thanks for the explanation. I got the idea of tentative partitions. >>>>>> >>>>>> Is the property >>>>>> *<property name="cube.timedim.partition.order_time" value="ot" />* >>>>>> referring to the dim_attribute >>>>>> *<dim_attribute name="order_time" _type="TIMESTAMP" /> *? >>>>>> >>>>>> Is the value 'ot' an alias of the partition column name? >>>>>> >>>>>> >>>>>> On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pro...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> The idea of partitions is that there is a stream of data that keeps >>>>>>> flowing in, each line of which has delivery time and order time both. >>>>>>> The >>>>>>> data is partitioned and stored in partitions later and facts are >>>>>>> created on >>>>>>> top of that partitioned data. So if data is partitioned in order time >>>>>>> and >>>>>>> the time range queried is also order time, then we can pick the >>>>>>> partitions >>>>>>> directly. But if the data is partitioned on delivery time, as is the >>>>>>> case >>>>>>> for this fact, and the query is on order time, then you pick some >>>>>>> tentative >>>>>>> partitions of delivery time, but you still have to filter over order >>>>>>> time. >>>>>>> Here the purpose of picking partitions is to reduce the amount of data >>>>>>> read. When you pick delivery time partitions, it doesn't guarantee what >>>>>>> order times data has, it only provides a tentative range. Hence, the >>>>>>> partitions are picked based on the relation of time dimensions, but the >>>>>>> extra filter still has to be supplied. >>>>>>> >>>>>>> Hope it's clear. >>>>>>> >>>>>>> On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pro...@apache.org> >>>>>>> wrote: >>>>>>> >>>>>>>> That's the original time range asked in the cube query. >>>>>>>> >>>>>>>> On Tue, Aug 9, 2016, 01:46 Tao Yan <t...@linkedin.com> wrote: >>>>>>>> >>>>>>>>> Hi Rajat, >>>>>>>>> >>>>>>>>> Thanks for the explain. I have a clear understanding about dim >>>>>>>>> relation now, however, how does the following condition generated: >>>>>>>>> >>>>>>>>> and ((sales.order_time) >= '2015-04-13 03:00:00') and >>>>>>>>> ((sales.order_time) < '2015-04-13 04:00:00'))) >>>>>>>>> >>>>>>>>> order_time is a dim attributes defined in sales cube, and it does not >>>>>>>>> have any relationships with properties or other dim attributes (The >>>>>>>>> property cube.timedim.partition.order_time does, but is it the same >>>>>>>>> thing?), I wonder how the query could pick up this column. >>>>>>>>> >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal < >>>>>>>>> pro...@apache.org> wrote: >>>>>>>>> >>>>>>>>>> I've created a jira for this: https://issues.apache.or >>>>>>>>>> g/jira/browse/LENS-1269 >>>>>>>>>> >>>>>>>>>> You should be able to make change in your data locally and see >>>>>>>>>> the non-empty results: >>>>>>>>>> >>>>>>>>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache-len >>>>>>>>>> s-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT-bin/client/ >>>>>>>>>> examples/data/sales_aggr_fact2_local3/sales-aggr-fact2- >>>>>>>>>> local3.data >>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0 >>>>>>>>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal < >>>>>>>>>> pro...@apache.org> wrote: >>>>>>>>>> >>>>>>>>>>> More details on how it converts to the fallback range: >>>>>>>>>>> >>>>>>>>>>> Range asked: >>>>>>>>>>> (order_time, '2015-04-13-03', '2015-04-13-04') >>>>>>>>>>> >>>>>>>>>>> Timedim Relation : >>>>>>>>>>> >>>>>>>>>>> <property name="cube.timedim.relation.order_time" >>>>>>>>>>> value="delivery_time+[-20 days,-1 hour]" /> >>>>>>>>>>> >>>>>>>>>>> <property name="cube.timedim.relation.order_time" >>>>>>>>>>> value="delivery_time+[-2 hours,-1hour]" /> >>>>>>>>>>> >>>>>>>>>>> The cube defines the first relation, the fact defines the second >>>>>>>>>>> relation. A fact can optionally override the timedim relation >>>>>>>>>>> defined by cube. So the second one is picked : >>>>>>>>>>> >>>>>>>>>>> Derivation: >>>>>>>>>>> >>>>>>>>>>> 2015-04-13-03 <=ot < 2015-04-13-04 >>>>>>>>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 >>>>>>>>>>> hour >>>>>>>>>>> => 2015-04-13-04 <= dt < 2015-04-13-06. >>>>>>>>>>> >>>>>>>>>>> Hope it's clearer now. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal < >>>>>>>>>>> pro...@apache.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi >>>>>>>>>>>> >>>>>>>>>>>> So it seems right now the query is designed to return 0 rows. >>>>>>>>>>>> The query is there just to demonstrate the fallback feature. >>>>>>>>>>>> >>>>>>>>>>>> Cube query is >>>>>>>>>>>> >>>>>>>>>>>> cube select customer_city_name, store_cost from sales where >>>>>>>>>>>> time_range_in(order_time, >>>>>>>>>>>> '2015-04-13-03', '2015-04-13-04') >>>>>>>>>>>> Driver query comes out to be >>>>>>>>>>>> >>>>>>>>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM >>>>>>>>>>>> a.local_sales_aggr_fact2 sales join a.local_city_table >>>>>>>>>>>> customer_city on sales.customer_city_id = customer_city.id and >>>>>>>>>>>> (customer_city.dt = 'latest') WHERE (((((sales.dt) = >>>>>>>>>>>> '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')) and >>>>>>>>>>>> ((sales.order_time) >= '2015-04-13 03:00:00') and >>>>>>>>>>>> ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY >>>>>>>>>>>> (customer_city.name) >>>>>>>>>>>> >>>>>>>>>>>> Then I modified the driver query and ran: >>>>>>>>>>>> >>>>>>>>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost >>>>>>>>>>>> FROM a.local_sales_aggr_fact2 sales join a.local_city_table >>>>>>>>>>>> customer_city on sales.customer_city_id = customer_city.id and >>>>>>>>>>>> (customer_city.dt = 'latest') WHERE (((((sales.dt) = >>>>>>>>>>>> '2015-04-13-04') or ((sales.dt) = '2015-04-13-05')))) >>>>>>>>>>>> >>>>>>>>>>>> Found the following results: >>>>>>>>>>>> >>>>>>>>>>>> Bangalore^A2015-04-13 00:00:00^A0.0 >>>>>>>>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0 >>>>>>>>>>>> >>>>>>>>>>>> It's apparent that both the rows will be filtered out by the >>>>>>>>>>>> translated query. >>>>>>>>>>>> >>>>>>>>>>>> I'll look deeper and replace the example query with something >>>>>>>>>>>> that demonstrates the fallback feature and also returns non-empty >>>>>>>>>>>> results. >>>>>>>>>>>> >>>>>>>>>>>> Let us know if you have any confusion understanding the feature >>>>>>>>>>>> itself. I think the video should have covered it. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> >>>>>>>>> *Tao Yan* >>>>>>>>> Software Engineer >>>>>>>>> Data Analytics Infrastructure Tools and Services >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 206.250.5345 >>>>>>>>> t...@linkedin.com >>>>>>>>> https://www.linkedin.com/in/taousc >>>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> *Tao Yan* >>>>>> Software Engineer >>>>>> Data Analytics Infrastructure Tools and Services >>>>>> >>>>>> >>>>>> >>>>>> 206.250.5345 >>>>>> t...@linkedin.com >>>>>> https://www.linkedin.com/in/taousc >>>>>> >>>>> >>>> >>>> >>>> -- >>>> >>>> *Tao Yan* >>>> Software Engineer >>>> Data Analytics Infrastructure Tools and Services >>>> >>>> >>>> >>>> 206.250.5345 >>>> t...@linkedin.com >>>> https://www.linkedin.com/in/taousc >>>> >>> >> > > > -- > > *Tao Yan* > Software Engineer > Data Analytics Infrastructure Tools and Services > > > > 206.250.5345 > t...@linkedin.com > https://www.linkedin.com/in/taousc >