Join Bottleneck

Rex Fenley Fri, 06 Nov 2020 10:29:15 -0800

Hello,

I have a job that's a series of joins, group-bys, and aggregations, and it's
bottlenecked in one of the joins. That join has ~300 million rows on the
left and ~200 million rows on the right, all with unique keys. This is what
I'm seeing in the plan for the bottlenecked join:


Join(joinType=[InnerJoin], where=[(user_id = id0)], select=[id, group_id,
user_id, uuid, owner, id0, deleted_at], leftInputSpec=[HasUniqueKey],
rightInputSpec=[JoinKeyContainsUniqueKey])

The join condition is basically (left.user_id === right.id). So `id0` must
be right.id here.
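
To make the shape of the join concrete, here is a toy sketch in plain Python. It is not the actual Flink job: the sample rows are invented, and the split of columns between the two sides (everything before `id0` on the left, `id0` and `deleted_at` on the right) is my assumption from the select list. I'm also assuming for illustration that several left rows can share a `user_id`.

```python
# Toy model of the join condition left.user_id == right.id.
# Rows are made up; only the column names come from the plan.

left_rows = [
    {"id": 1, "group_id": 10, "user_id": 100, "uuid": "a", "owner": True},
    {"id": 2, "group_id": 10, "user_id": 101, "uuid": "b", "owner": False},
    {"id": 3, "group_id": 11, "user_id": 100, "uuid": "c", "owner": False},
]

right_rows = [
    {"id": 100, "deleted_at": None},
    {"id": 101, "deleted_at": "2020-11-01"},
]


def inner_join(left, right):
    """Inner join on left.user_id == right.id; right.id is exposed as id0."""
    # The right side is keyed by its own id, which is also the join key.
    right_by_id = {r["id"]: r for r in right}
    joined = []
    for l in left:
        r = right_by_id.get(l["user_id"])
        if r is not None:
            # Output columns follow the select list in the plan.
            joined.append({**l, "id0": r["id"], "deleted_at": r["deleted_at"]})
    return joined


joined = inner_join(left_rows, right_rows)
```

In this sketch the right side's key `id` is exactly the join key, while the left side's key `id` is a different column from its join key `user_id`.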

My first question is: what is the difference between

leftInputSpec=[HasUniqueKey]

and

rightInputSpec=[JoinKeyContainsUniqueKey]?

Does this mean the left side is hashed on its primary key `id` rather than
on the join key `user_id`? If so, I'd expect that to perform poorly.

Is there anything else about this that stands out?

Thanks!

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>  |  BLOG <http://blog.remind.com/>  |
FOLLOW US <https://twitter.com/remindhq>  |
LIKE US <https://www.facebook.com/remindhq>
