How to efficiently query a large table with multiple dimensional table?

ashokkumar rajendran Fri, 11 Mar 2016 01:52:46 -0800

Hi All,

I have a large table with few billions of rows and have a very small table
with 4 dimensional values. I would like to get rows that match any of these
dimensions. For example,


Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR
A.dimension2 = B.dimension2 OR A.dimension3 = B.dimension3 OR A.dimension4
= B.dimension4.

The query plan takes this as nestedLoop and executes for very long time.

If I execute this as Union queries, it takes around 1.5mins for each
dimension.

Select field1, field2 from A, B where A.dimension1 = B.dimension1
UNION ALL
Select field1, field2 from A, B where A.dimension2 = B.dimension2
UNION ALL
Select field1, field2 from A, B where  A.dimension3 = B.dimension3
UNION ALL
Select field1, field2 from A, B where  A.dimension4 = B.dimension4.

This is obviously not an optimal solution as it makes multiple scanning at
same table. Is there any other optimal solution for it?

I tried to do this with plain Spark (without SQL) using broadcast map join
but the performance was bad than this.


Regards
Ashok

How to efficiently query a large table with multiple dimensional table?

Reply via email to