Hi All-

I have a few questions on Hive. I have been going through the documentation and 
checked with a couple of colleagues, but couldn’t get a satisfactory answer. I 
would appreciate it if someone could shed some light on these.


MAP JOIN - During the map join phase, before the actual MR task is 
initiated, a prerequisite map task is launched on the local node where the 
block exists to build the hash table and write it to a hash file. That file is 
then distributed to the nodes where the map tasks are going to read the blocks 
of the large table.
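
To make my understanding concrete, here is a minimal Python sketch of the idea 
as I picture it. It is not Hive's actual code: the function names 
(build_hash_table, map_side_join) and the sample rows are made up for 
illustration, and the "distribution" step is assumed to be nothing more than 
handing the same in-memory table to each mapper.

```python
from collections import defaultdict

def build_hash_table(small_rows, key):
    """Local pre-step: load the whole small table into a hash table."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table

def map_side_join(large_split, hash_table, key):
    """One mapper: stream its split of the large table and probe the table."""
    for row in large_split:
        for match in hash_table.get(row[key], []):
            yield {**match, **row}

# Hypothetical data: one small table, one large table read as two splits.
small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
large_splits = [
    [{"id": 1, "amt": 10}, {"id": 3, "amt": 5}],
    [{"id": 2, "amt": 7}, {"id": 1, "amt": 2}],
]

hash_table = build_hash_table(small, "id")  # built once, before the mappers
joined = [
    row
    for split in large_splits               # each split stands in for a mapper
    for row in map_side_join(split, hash_table, "id")
]
```

No reduce phase is needed in this picture, because every mapper holds a full 
copy of the small table.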
 
q1. What happens when the data for the smaller table is spread across multiple 
blocks on multiple nodes?
q2. How do these nodes know where the map tasks will run, i.e. where the large 
table is going to be scanned?
 
LLAP - Daemon 
 
During query execution, does the Application Master (coordinating the 
query) launch map and reduce tasks on a hybrid set of nodes (some running LLAP 
daemons and some not)?
 
1) How is interactivity for analytic queries achieved when a map/reduce task 
runs in a plain YARN container that is not backed by llapd?
 
2) What is the logic for sending a request to an llapd node vs a non-llapd 
node, given that we have configured two JDBC URIs separating the YARN cluster 
into two distinguishable parts?



Thanks
Jagjeet Singh
