Hi all, I have a few questions on Hive. I have been going through the documentation and checked with a couple of people I know, but couldn't get a satisfactory answer. I would appreciate it if someone could shed some light on this.

MAP JOIN
During the map join phase, before the actual MR task is initiated, a prerequisite map task is launched on the local node where the block exists to build the hash table and then create a hash file. The hash file is then distributed to the nodes where the map tasks are going to read the blocks of the large table.
q1. What happens when the data for the smaller table is spread across multiple blocks on multiple nodes?
q2. How do these nodes know where the map tasks are, and where the large table is going to be scanned?

LLAP - Daemon
During query execution, does the Application Master (coordinating the query) launch map and reduce tasks on hybrid nodes (a mix of nodes with LLAP daemons and nodes without)?
1) How is the interactivity of analytics achieved when a map/reduce task runs in a YARN container that is not backed by llapd?
2) What is the logic for sending a request to an llapd node vs. a non-llapd node, given that we have configured two JDBC URIs separating the YARN cluster into two distinguishable parts?

Thanks,
Jagjeet Singh
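
P.S. To pin down the map join flow I am describing, here is a minimal Python sketch of my understanding. It is not Hive's code: the function names (build_hash_table, map_join) and the sample rows are made up for illustration, and everything runs in one process, so the "distribute the hash file" step is only simulated by passing the same table to each split of the large table.

```python
from collections import defaultdict

def build_hash_table(small_rows, key):
    """Prerequisite step: load the whole small table into an in-memory hash table."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table

def map_join(large_split, hash_table, key):
    """Map step: stream one split of the large table and probe the hash table."""
    for row in large_split:
        # .get() avoids inserting empty entries for keys with no match
        for match in hash_table.get(row[key], []):
            yield {**match, **row}

# Hypothetical sample data: one small table, one large table in two splits.
small = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
large_splits = [
    [{"id": 1, "amount": 10}, {"id": 3, "amount": 30}],
    [{"id": 2, "amount": 20}, {"id": 1, "amount": 5}],
]

hash_table = build_hash_table(small, "id")  # built once, before any map work
joined = [row for split in large_splits
          for row in map_join(split, hash_table, "id")]
```

In this sketch every split probes a full copy of the hash table, which is why q1 matters to me: the builder has to see all rows of the small table, wherever its blocks live.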