Anton Kedin created BEAM-5049:
---------------------------------

             Summary: [SQL] Batch Join results in two shuffles
                 Key: BEAM-5049
                 URL: https://issues.apache.org/jira/browse/BEAM-5049
             Project: Beam
          Issue Type: Bug
          Components: dsl-sql
            Reporter: Anton Kedin


A query like this:

{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id = c.user_id;
{code}

results in two shuffles, presumably one per join. Since both joins use the same key (a.user_id), this can probably be optimized into a single shuffle.

Relevant code:

 - BeamJoinRel implements Join in SQL: 
https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194

 - CoGBK Join implementation: 
https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36
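
To illustrate the kind of optimization meant here, below is a minimal sketch in plain Java (no Beam API) of a three-way inner join on a shared key done with a single grouping step instead of two chained pairwise joins. The names SingleShuffleJoinSketch, Row, Grouped, coGroup and innerJoin are hypothetical and exist only for this example. In a distributed runner the group-by-key step is the one that needs a shuffle, so grouping a, b and c together by user_id corresponds to one shuffle. This is not how BeamJoinRel works today, only an assumption about what an optimized plan could look like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SingleShuffleJoinSketch {

    /** One input row: the join key (user_id) plus an opaque payload. */
    static final class Row {
        final String userId;
        final String payload;

        Row(String userId, String payload) {
            this.userId = userId;
            this.payload = payload;
        }
    }

    /** All payloads from a, b and c that share one user_id. */
    static final class Grouped {
        final List<String> a = new ArrayList<>();
        final List<String> b = new ArrayList<>();
        final List<String> c = new ArrayList<>();
    }

    /** Groups all three inputs by user_id in one step (the single "shuffle"). */
    static Map<String, Grouped> coGroup(List<Row> a, List<Row> b, List<Row> c) {
        Map<String, Grouped> byKey = new TreeMap<>();
        for (Row r : a) {
            byKey.computeIfAbsent(r.userId, k -> new Grouped()).a.add(r.payload);
        }
        for (Row r : b) {
            byKey.computeIfAbsent(r.userId, k -> new Grouped()).b.add(r.payload);
        }
        for (Row r : c) {
            byKey.computeIfAbsent(r.userId, k -> new Grouped()).c.add(r.payload);
        }
        return byKey;
    }

    /** Inner join of a, b and c on user_id, built from the single grouping. */
    static List<String> innerJoin(List<Row> a, List<Row> b, List<Row> c) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Grouped> e : coGroup(a, b, c).entrySet()) {
            Grouped g = e.getValue();
            // Cross product per key; empty if any side has no rows for the key.
            for (String x : g.a) {
                for (String y : g.b) {
                    for (String z : g.c) {
                        out.add(e.getKey() + ":" + x + "," + y + "," + z);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> a = List.of(new Row("u1", "a1"), new Row("u2", "a2"));
        List<Row> b = List.of(new Row("u1", "b1"), new Row("u1", "b2"));
        List<Row> c = List.of(new Row("u1", "c1"), new Row("u3", "c3"));
        System.out.println(innerJoin(a, b, c));
    }
}
```

In Beam terms, a comparable shape would be a single CoGroupByKey applied to a KeyedPCollectionTuple holding all three inputs, rather than two chained two-way joins. Whether the SQL planner can safely rewrite the query that way is left open here.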





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)