Hey everyone, before I write a lot of text, I'll just post something that is already written: http://www.sqlservercentral.com/Forums/Topic1328496-360-1.aspx
The first post there addresses a problem very similar to mine. Currently my implementation looks like this:

    SELECT id1,
           MAX(CASE
                 WHEN m.keyword IS NULL THEN 0
                 WHEN instr(m.keyword, prep_kw.keyword) > 0 THEN 1
                 ELSE 0
               END) AS flag
    FROM (SELECT id1, keyword FROM import1) m
    CROSS JOIN (SELECT keyword FROM et_keywords) prep_kw
    GROUP BY id1;

Since there is a cross join involved, the execution gets pinned down to 1 reducer only and it takes ages to complete.

The thread I linked solves this with some SQL Server-specific tricks. But I was wondering if anybody has already encountered this problem in Hive and found a better way to solve it.

I'm using Hive 0.11 on a MapR distribution, in case that matters.

Cheers
Wolli
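
P.S. In case the intent of the query is unclear, here is a minimal Python sketch of what it is meant to compute: for each id1, the flag is 1 if any keyword from et_keywords occurs as a substring of any of that id's keyword values, and 0 otherwise. The function name and the sample rows are made up for illustration, and the sketch ignores edge cases such as empty-string keywords.

```python
def keyword_flags(import1_rows, et_keywords):
    """Return {id1: flag}, mirroring the Hive query's logic in plain Python.

    import1_rows: iterable of (id1, keyword_text) pairs; text may be None (NULL).
    et_keywords:  iterable of keyword strings to search for.
    """
    flags = {}
    for id1, text in import1_rows:
        # CROSS JOIN: every row of import1 is paired with every keyword.
        for kw in et_keywords:
            # CASE: a NULL text counts as 0, a substring match counts as 1.
            hit = 1 if (text is not None and kw is not None and kw in text) else 0
            # MAX(...) GROUP BY id1: keep the largest value seen per id.
            flags[id1] = max(flags.get(id1, 0), hit)
    return flags


# Hypothetical sample data, not taken from my real tables.
sample_rows = [
    (1, "cheap red shoes"),
    (1, None),
    (2, "blue hat"),
    (3, None),
]
sample_keywords = ["shoes", "scarf"]

print(keyword_flags(sample_rows, sample_keywords))  # {1: 1, 2: 0, 3: 0}
```

The nested loop is exactly the part that hurts: the work grows with (rows in import1) x (rows in et_keywords), which is what the cross join does in Hive.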