This is probably a simple question, but I'm noticing that for queries that
run on 1+TB of data, it can take Hive up to 30 minutes to actually start
the first map-reduce stage. What is it doing? I imagine it's gathering
information about the data somehow, this 'startup' time is clearly a
function of the amount of data I'm trying to process.

Cheers,

Reply via email to