Thanks for the explanation. I got the idea of tentative partitions. Is the property *<property name="cube.timedim.partition.order_time" value="ot" />* referring to the dim_attribute *<dim_attribute name="order_time" _type="TIMESTAMP" /> *?
Is the value 'ot' an alias of the partition column name? On Tue, Aug 9, 2016 at 12:47 AM, Rajat Khandelwal <pro...@apache.org> wrote: > The idea of partitions is that there is a stream of data that keeps > flowing in, each line of which has delivery time and order time both. The > data is partitioned and stored in partitions later and facts are created on > top of that partitioned data. So if data is partitioned in order time and > the time range queried is also order time, then we can pick the partitions > directly. But if the data is partitioned on delivery time, as is the case > for this fact, and the query is on order time, then you pick some tentative > partitions of delivery time, but you still have to filter over order time. > Here the purpose of picking partitions is to reduce the amount of data > read. When you pick delivery time partitions, it doesn't guarantee what > order times data has, it only provides a tentative range. Hence, the > partitions are picked based on the relation of time dimensions, but the > extra filter still has to be supplied. > > Hope it's clear. > > On Tue, Aug 9, 2016 at 10:30 AM Rajat Khandelwal <pro...@apache.org> > wrote: > >> That's the original time range asked in the cube query. >> >> On Tue, Aug 9, 2016, 01:46 Tao Yan <t...@linkedin.com> wrote: >> >>> Hi Rajat, >>> >>> Thanks for the explain. I have a clear understanding about dim relation >>> now, however, how does the following condition generated: >>> >>> and ((sales.order_time) >= '2015-04-13 03:00:00') and ((sales.order_time) < >>> '2015-04-13 04:00:00'))) >>> >>> order_time is a dim attributes defined in sales cube, and it does not have >>> any relationships with properties or other dim attributes (The property >>> cube.timedim.partition.order_time does, but is it the same thing?), I >>> wonder how the query could pick up this column. >>> >>> >>> Thanks, >>> >>> >>> On Mon, Aug 8, 2016 at 4:12 AM, Rajat Khandelwal <pro...@apache.org> >>> wrote: >>> >>>> I've created a jira for this: https://issues.apache. >>>> org/jira/browse/LENS-1269 >>>> >>>> You should be able to make change in your data locally and see the >>>> non-empty results: >>>> >>>> cat /Users/rajat.khandelwal/Git/lens/lens-dist/target/apache- >>>> lens-2.7.0-SNAPSHOT-bin/apache-lens-2.7.0-SNAPSHOT- >>>> bin/client/examples/data/sales_aggr_fact2_local3/sales- >>>> aggr-fact2-local3.data >>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,1,1,1,1,1,1,5,0 >>>> 2015-04-13 03:00:00,2015-04-13 04:00:00,2,1,2,2,2,1,8,2 >>>> >>>> >>>> On Mon, Aug 8, 2016 at 2:46 PM Rajat Khandelwal <pro...@apache.org> >>>> wrote: >>>> >>>>> More details on how it converts to the fallback range: >>>>> >>>>> Range asked: >>>>> (order_time, '2015-04-13-03', '2015-04-13-04') >>>>> >>>>> Timedim Relation : >>>>> >>>>> <property name="cube.timedim.relation.order_time" >>>>> value="delivery_time+[-20 days,-1 hour]" /> >>>>> >>>>> <property name="cube.timedim.relation.order_time" >>>>> value="delivery_time+[-2 hours,-1hour]" /> >>>>> >>>>> The cube defines the first relation, the fact defines the second >>>>> relation. A fact can optionally override the timedim relation defined by >>>>> cube. So the second one is picked : >>>>> >>>>> Derivation: >>>>> >>>>> 2015-04-13-03 <=ot < 2015-04-13-04 >>>>> dt - 2 hours <= ot <= dt - 1 hour => ot + 1 hour <= dt <= ot + 2 hour >>>>> => 2015-04-13-04 <= dt < 2015-04-13-06. >>>>> >>>>> Hope it's clearer now. >>>>> >>>>> >>>>> On Mon, Aug 8, 2016 at 1:29 PM Rajat Khandelwal <pro...@apache.org> >>>>> wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So it seems right now the query is designed to return 0 rows. The >>>>>> query is there just to demonstrate the fallback feature. >>>>>> >>>>>> Cube query is >>>>>> >>>>>> cube select customer_city_name, store_cost from sales where >>>>>> time_range_in(order_time, >>>>>> '2015-04-13-03', '2015-04-13-04') >>>>>> Driver query comes out to be >>>>>> >>>>>> SELECT (customer_city.name), sum((sales.store_cost)) FROM >>>>>> a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on >>>>>> sales.customer_city_id = customer_city.id and (customer_city.dt = >>>>>> 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = >>>>>> '2015-04-13-05')) and ((sales.order_time) >= '2015-04-13 03:00:00') and >>>>>> ((sales.order_time) < '2015-04-13 04:00:00'))) GROUP BY >>>>>> (customer_city.name) >>>>>> >>>>>> Then I modified the driver query and ran: >>>>>> >>>>>> SELECT (customer_city.name), sales.order_time, sales.store_cost FROM >>>>>> a.local_sales_aggr_fact2 sales join a.local_city_table customer_city on >>>>>> sales.customer_city_id = customer_city.id and (customer_city.dt = >>>>>> 'latest') WHERE (((((sales.dt) = '2015-04-13-04') or ((sales.dt) = >>>>>> '2015-04-13-05')))) >>>>>> >>>>>> Found the following results: >>>>>> >>>>>> Bangalore^A2015-04-13 00:00:00^A0.0 >>>>>> Hyderabad^A2015-04-13 00:00:00^A2.0 >>>>>> >>>>>> It's apparent that both the rows will be filtered out by the >>>>>> translated query. >>>>>> >>>>>> I'll look deeper and replace the example query with something that >>>>>> demonstrates the fallback feature and also returns non-empty results. >>>>>> >>>>>> Let us know if you have any confusion understanding the feature >>>>>> itself. I think the video should have covered it. >>>>>> >>>>>> >>>>>> Regards >>>>>> >>>>> >>> >>> >>> -- >>> >>> *Tao Yan* >>> Software Engineer >>> Data Analytics Infrastructure Tools and Services >>> >>> >>> >>> 206.250.5345 >>> t...@linkedin.com >>> https://www.linkedin.com/in/taousc >>> >> -- *Tao Yan* Software Engineer Data Analytics Infrastructure Tools and Services 206.250.5345 t...@linkedin.com https://www.linkedin.com/in/taousc