Hi All, I have a large table with few billions of rows and have a very small table with 4 dimensional values. I would like to get rows that match any of these dimensions. For example,
Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR A.dimension2 = B.dimension2 OR A.dimension3 = B.dimension3 OR A.dimension4 = B.dimension4. The query plan takes this as nestedLoop and executes for very long time. If I execute this as Union queries, it takes around 1.5mins for each dimension. Select field1, field2 from A, B where A.dimension1 = B.dimension1 UNION ALL Select field1, field2 from A, B where A.dimension2 = B.dimension2 UNION ALL Select field1, field2 from A, B where A.dimension3 = B.dimension3 UNION ALL Select field1, field2 from A, B where A.dimension4 = B.dimension4. This is obviously not an optimal solution as it makes multiple scanning at same table. Is there any other optimal solution for it? I tried to do this with plain Spark (without SQL) using broadcast map join but the performance was bad than this. Regards Ashok