Can you try setting the weight of the narrow table less than the weight of the other table? I see they are both set to same value: 10.
On Fri, Aug 26, 2016 at 4:04 AM Tao Yan <t...@linkedin.com> wrote: > Hi Lens Developers, > > I am testing 'narrow table' with Lens. If table1 and table2 has the same > granularity and table2's columns is a subset of table1, then table2 is a > narrow table of table1. > > I defined a dimension dimension1: > *dimension1.xml:* > *<x_dimension name="dimension1" xmlns="uri:lens:cube:0.1" > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance > <http://www.w3.org/2001/XMLSchema-instance>"* > * xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">* > * <attributes>* > * <dim_attribute name="primary_key" _type="BIGINT"/>* > * <dim_attribute name="attr1" _type="BIGINT"/>* > * <dim_attribute name="attr2" _type="BIGINT"/>* > * <dim_attribute name="attr3" _type="BIGINT"/>* > * <dim_attribute name="attr4" _type="BIGINT"/>* > * <dim_attribute name="attr5" _type="BIGINT"/>* > * <dim_attribute name="attr6" _type="BIGINT"/>* > * <dim_attribute name="attr7" _type="BIGINT"/>* > * <dim_attribute name="attr8" _type="BIGINT"/>* > * </attributes>* > * <properties>* > * <property name="dimension.dimension1.timed.dimension" value="dt"/>* > * </properties>* > *</x_dimension>* > > And defined two dimtables under this dimension: > *dimension1_division2.xml* > <x_dimension_table dimension_name="dimension1" > table_name="dimension1_division2" weight="10.0" xmlns="uri:lens:cube:0.1" > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd "> > <columns> > <column name="primary_key" _type="BIGINT"/> > * <column name="attr5" _type="BIGINT"/>* > * <column name="attr6" _type="BIGINT"/>* > * <column name="attr7" _type="BIGINT"/>* > * <column name="attr8" _type="BIGINT"/>* > </columns> > <properties> > <property name="dimension1.prop" value="d2"/> > </properties> > <storage_tables> > ... > </storage_tables> > </x_dimension_table> > > *dimension1_division2_subset.xml* > <x_dimension_table dimension_name="dimension1" > table_name="dimension1_division2_subset" weight="10.0" > xmlns="uri:lens:cube:0.1" > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd "> > <columns> > <column name="primary_key" _type="BIGINT"/> > * <column name="attr5" _type="BIGINT"/>* > * <column name="attr6" _type="BIGINT"/>* > </columns> > <properties> > <property name="dimension1.prop" value="d2_subset"/> > </properties> > <storage_tables> > ... > </storage_tables> > </x_dimension_table> > > Then, I run the query again attr5: > *lens-shell>select primary_key, attr5 from dimension1* > > I expect it will use the table *dimension1_division2_subset *since it is > a narrow table and hence will be more efficient. However, it picked up the > table *dimension1_division2:* > > *25 Aug 2016 22:18:14 [Spring Shell] INFO cliLogger - Driver query: > 'INSERT OVERWRITE DIRECTORY > "hdfs://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a > <http://ltx1-holdemnn01.grid.linkedin.com:9000/user/dev_svc/lensreports/hdfsout/f20e5f20-3f09-4d32-a1b0-ac79d97e0f1a>" > SELECT ( dimension1 . primary_key ), ( dimension1 . attr5 ) FROM > holdem_dimension1_division2 dimension1 WHERE ((((dimension1.dt = > 'latest')))) ' and Driver handle: OperationHandle > [opType=EXECUTE_STATEMENT, > getHandleIdentifier()=6ed9819c-d05f-427f-9ec3-740644072ce9]* > > Based on the server log, it seems the first candidate dimtable is picked: > > *[estimate-11] INFO org.apache.lens.cube.parse.CubeQueryContext - > Available candidate dims are:[dimension1_division2, > dimension1_division2_subset], picking up dimension1_division2 for querying.* > > Is it the expected behavior? If that is the case, then, do we have plan > to support this type of optimization? > > Thanks, > -- > > *Tao Yan* > Software Engineer > Data Analytics Infrastructure Tools and Services > > > > 206.250.5345 > t...@linkedin.com > https://www.linkedin.com/in/taousc >