I have the following join that takes 4.5 hours (with 12 nodes) mostly because of a single reduce task that gets the bulk of the work:
SELECT ... FROM T LEFT OUTER JOIN S ON T.timestamp = S.timestamp and T.id = S.id This is a 1:0/1 join so the size of the output is exactly the same as the size of T (500M records). S is actually very small (5K). I've tried: - switching the order of the join conditions - using a different hash function setting (jenkins instead of murmur) - using SET set hive.auto.convert.join = true; - using SET hive.optimize.skewjoin = true; but nothing helped :( Anything else I can try? Thanks!