[ https://issues.apache.org/jira/browse/BEAM-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anton Kedin updated BEAM-5049:
------------------------------
    Description:
The query like this:
{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON a.some_id = c.some_id;
{code}
results in two shuffles. Can probably be optimized.

Relevant code:
 - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
 - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36

  was:
The query like this:
{code}
SELECT a.*, b.*, c.* FROM a JOIN b ON a.user_id = b.user_id JOIN c ON a.user_id = c.user_id;
{code}
results in two shuffles. Can probably be optimized.

Relevant code:
 - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
 - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36

> [SQL] Batch Join results in two shuffles
> ----------------------------------------
>
>                 Key: BEAM-5049
>                 URL: https://issues.apache.org/jira/browse/BEAM-5049
>             Project: Beam
>          Issue Type: Bug
>          Components: dsl-sql
>            Reporter: Anton Kedin
>            Priority: Major
>
> The query like this:
> {code}
> SELECT a.*, b.*, c.* FROM a JOIN b ON a.some_id = b.some_id JOIN c ON a.some_id = c.some_id;
> {code}
> results in two shuffles. Can probably be optimized.
> Relevant code:
>  - BeamJoinRel implements Join in SQL: https://github.com/apache/beam/blob/1675b0f843ed34de8ba6f3676f794db80b40139d/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java#L194
>  - CoGBK Join implementation: https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L36

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
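
For illustration only, here is a rough sketch of why the query above could need just one shuffle. It is plain Python, not Beam code, and it does not reflect how BeamJoinRel or the join-library is actually written. The helper names (key_by, co_group, join_chained, join_single) and the stats counter are hypothetical. The sketch assumes both joins use the same key (a.some_id) and are inner joins; under that assumption all three inputs can be grouped by key once instead of joining pairwise.

```python
from collections import defaultdict
from itertools import product


def key_by(rows, field):
    """Pair every row with its join key."""
    return [(row[field], row) for row in rows]


def co_group(stats, *keyed_inputs):
    """Simulate one shuffle: group all inputs by key in a single pass."""
    stats["shuffles"] += 1
    groups = defaultdict(lambda: tuple([] for _ in keyed_inputs))
    for index, keyed_rows in enumerate(keyed_inputs):
        for key, row in keyed_rows:
            groups[key][index].append(row)
    return groups


def join_chained(a, b, c, field):
    """Two pairwise inner joins, (a JOIN b) JOIN c: two grouping passes."""
    stats = {"shuffles": 0}
    first = co_group(stats, key_by(a, field), key_by(b, field))
    ab = [(x, y) for xs, ys in first.values() for x, y in product(xs, ys)]
    # Re-key the intermediate result by the same field of the 'a' row.
    ab_keyed = [(x[field], (x, y)) for x, y in ab]
    second = co_group(stats, ab_keyed, key_by(c, field))
    rows = [
        (x, y, z)
        for pairs, zs in second.values()
        for (x, y), z in product(pairs, zs)
    ]
    return rows, stats["shuffles"]


def join_single(a, b, c, field):
    """One three-way co-group on the shared key: one grouping pass."""
    stats = {"shuffles": 0}
    grouped = co_group(
        stats, key_by(a, field), key_by(b, field), key_by(c, field)
    )
    # An empty group on any side yields no rows, which gives inner-join
    # semantics for that key.
    rows = [
        row
        for xs, ys, zs in grouped.values()
        for row in product(xs, ys, zs)
    ]
    return rows, stats["shuffles"]
```

Both functions should return the same joined rows; only the number of simulated grouping passes differs. The single-pass version depends on every join condition using the same key, so it would not apply if the second join were on a column of b or on a different column of a.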