>>(1)- For every TEZ AM it is possible to launch just a single query/DAG at a
>>time. So within a given AM several DAGs can be executed only in sequential
>>order (a.k.a. a session), not in parallel. To execute DAGs in parallel we
>>always need several AMs.

Correct. Today a single AM accepts and runs a new DAG only when it is idle. An AM is idle when no DAG is running on it.

>>(2)- The AM is user-specific, and each user is expected to run queries
>>through its own AM (or on multiple AMs if there is a need for parallelism).

Correct in a secure cluster. In a non-secure cluster an AM runs as the yarn user, which is common to all AMs.

In a secure cluster, any entity that has been given a client token (for that app attempt) by the RM can communicate with the AM. In a non-secure cluster, any entity that has obtained the AM's connection information from the RM can communicate with the AM. The AM has an additional set of ACLs that determine who can submit, view, and modify DAGs.

>>(3)- Several users can submit their DAGs as the same user (e.g.: through
>>hiveserver2), but in this case we will still have several AMs.

Correct. However, the number of AMs is determined by the policy of the mediating server. It may choose to launch a new AM for every new DAG, or it may queue DAGs up and round robin through a limited set of AMs, etc.

Bikas

From: Fabio C. [mailto:anyte...@gmail.com]
Sent: Monday, March 09, 2015 4:31 AM
To: u...@tez.apache.org; user@hive.apache.org
Subject: Parallel queries/dags running in same AM?

Hi all,
I've been using Tez on Hive, and I overheard a conversation that does not match my current understanding. Can anyone confirm the following statements?

(1)- For every TEZ AM it is possible to launch just a single query/DAG at a time. So within a given AM several DAGs can be executed only in sequential order (a.k.a. a session), not in parallel. To execute DAGs in parallel we always need several AMs.
(2)- The AM is user-specific, and each user is expected to run queries through its own AM (or on multiple AMs if there is a need for parallelism).
(3)- Several users can submit their DAGs as the same user (e.g.: through hiveserver2), but in this case we will still have several AMs.

Thanks in advance
Fabio
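
For readers who want to see answers (1) and (3) side by side, here is a rough sketch of one possible mediating-server policy: a fixed pool of AMs, each running at most one DAG at a time, with new DAGs assigned round robin and queued while their AM is busy. This is a plain Python simulation, not Tez or HiveServer2 code. The names `SimulatedAM`, `RoundRobinMediator`, `on_dag_finished` and the DAG labels are all hypothetical, and raising an error on a busy AM is an assumption made for the model, not a statement about what a real AM does.

```python
from collections import deque
from itertools import cycle


class SimulatedAM:
    """Hypothetical stand-in for one AM: it runs at most one DAG at a time."""

    def __init__(self, name):
        self.name = name
        self.running = None  # label of the DAG currently running, if any

    def is_idle(self):
        # Per the thread: an AM is idle when no DAG is running.
        return self.running is None

    def run(self, dag):
        # Assumption of this model: a busy AM refuses a second DAG.
        if not self.is_idle():
            raise RuntimeError(f"{self.name} is already running {self.running}")
        self.running = dag

    def finish(self):
        # Mark the current DAG as done and return its label.
        done, self.running = self.running, None
        return done


class RoundRobinMediator:
    """Hypothetical mediating server with a limited set of AMs."""

    def __init__(self, am_count):
        self.ams = [SimulatedAM(f"am-{i}") for i in range(am_count)]
        self._by_name = {am.name: am for am in self.ams}
        self._queues = {am.name: deque() for am in self.ams}
        self._next_am = cycle(self.ams)

    def submit(self, dag):
        # Pick the next AM in round-robin order; queue the DAG if it is busy.
        am = next(self._next_am)
        if am.is_idle():
            am.run(dag)
        else:
            self._queues[am.name].append(dag)
        return am.name

    def on_dag_finished(self, am_name):
        # When an AM becomes idle, hand it the next DAG queued for it.
        am = self._by_name[am_name]
        done = am.finish()
        queue = self._queues[am_name]
        if queue:
            am.run(queue.popleft())
        return done

    def running(self):
        return {am.name: am.running for am in self.ams}


mediator = RoundRobinMediator(am_count=2)
placements = [mediator.submit(dag) for dag in ["dag-a", "dag-b", "dag-c"]]
print(placements)          # ['am-0', 'am-1', 'am-0']
print(mediator.running())  # {'am-0': 'dag-a', 'am-1': 'dag-b'}

mediator.on_dag_finished("am-0")
print(mediator.running())  # {'am-0': 'dag-c', 'am-1': 'dag-b'}
```

With two AMs, two DAGs run in parallel and the third waits until its AM is idle again. The other policy mentioned in the reply, launching a new AM for every new DAG, would correspond to growing the pool on each submit instead of queueing.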