Alejandro Fernandez created AMBARI-12113:
--------------------------------------------

Summary: Cluster deployment is missing tez.tar.gz in HDFS since service responsible for uploading tarball is not co-hosted with Tez Client
Key: AMBARI-12113
URL: https://issues.apache.org/jira/browse/AMBARI-12113
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.1.0
Reporter: Alejandro Fernandez
Assignee: Alejandro Fernandez
Priority: Critical
Fix For: 2.1.0

STR:
* Deploy a cluster with HDFS, YARN, MR, and Tez on 5 hosts as follows:
** Host 1: NameNode, ResourceManager, ZK Server, DataNode, NodeManager
** Host 2: Secondary NameNode, App Timeline Server, ZK Server, DataNode, NodeManager
** Host 3: ZK Server, DataNode, NodeManager
** Host 4: Clients
** Host 5: Clients

In this case, Host 1 has the RM but no Tez Client, so it cannot upload the Tez tarball to HDFS.

Also, consider the following two use cases:
1. Install Tez first, which requires YARN.
2. Install YARN first, which does not require Tez, but tez.tar.gz still needs to be uploaded when the Tez Service Check runs.
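A minimal Python sketch (not Ambari code) of why this layout fails: it models the STR hosts as sets of Ambari-style component names and lists hosts that run the uploading component without a Tez Client. The exact client set on Hosts 4 and 5 is an assumption, since the report only says "Clients"; `LAYOUT` and `hosts_unable_to_upload` are hypothetical names.

```python
# Hypothetical model of the STR layout. The report only says "Clients"
# for Hosts 4 and 5, so this client set is an assumption.
ASSUMED_CLIENTS = {"HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT",
                   "TEZ_CLIENT", "ZOOKEEPER_CLIENT"}

LAYOUT = {
    "host1": {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host2": {"SECONDARY_NAMENODE", "APP_TIMELINE_SERVER",
              "ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host3": {"ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host4": set(ASSUMED_CLIENTS),
    "host5": set(ASSUMED_CLIENTS),
}


def hosts_unable_to_upload(layout, uploader="RESOURCEMANAGER"):
    """Return the sorted hosts that run `uploader` but lack TEZ_CLIENT.

    Without TEZ_CLIENT there is no local tez.tar.gz, so the uploader
    component on that host has nothing to copy to HDFS.
    """
    return sorted(
        host for host, components in layout.items()
        if uploader in components and "TEZ_CLIENT" not in components
    )


print(hosts_unable_to_upload(LAYOUT))  # ['host1']
```

With the RM as the uploader (the AMBARI-9997 behavior), Host 1 is flagged, which matches the failure described above.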
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 98, in <module>
    TezServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py", line 75, in service_check
    bin_dir = params.hadoop_bin_dir
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py", line 55, in action_run
    environment = self.resource.environment,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 290, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hadoop --config /usr/hdp/2.2.6.0-2800/hadoop/conf jar /usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount /tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/' returned 255. Running OrderedWordCount
15/06/17 04:21:50 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.5.2.2.2.6.0-2800, revision=790e651b4a64f7589008208580c9790548c2baf8, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTIme=20150518-1651 ]
15/06/17 04:21:51 INFO impl.TimelineClientImpl: Timeline service address: http://c6405.ambari.apache.org:8188/ws/v1/timeline/
15/06/17 04:21:51 INFO client.RMProxy: Connecting to ResourceManager at c6405.ambari.apache.org/192.168.64.105:8050
15/06/17 04:21:53 INFO client.TezClient: Submitting DAG application with id: application_1434514777618_0005
15/06/17 04:21:53 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
java.io.FileNotFoundException: File does not exist: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1140)
    at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1132)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1132)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
    at org.apache.tez.client.TezClientUtils.getLRFileStatus(TezClientUtils.java:127)
    at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:178)
    at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:721)
    at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:689)
    at org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:667)
    at org.apache.tez.client.TezClient.submitDAG(TezClient.java:353)
    at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:208)
    at org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:232)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:61)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}

Analysis:
tez.tar.gz needs to be copied to HDFS. The problem is that we currently have no way to copy it after all services have been installed and started during cluster deployment, so instead we rely on a service copying the tarball when it starts. For this to work, the host with the Tez Client must also have the HDFS Client, YARN Client, and MR Client. Further, copying to HDFS requires the NameNode to be up and the DataNodes to be functional.

AMBARI-9997 had the ResourceManager copy the Tez tarball; the problem was that if the RM host did not have the Tez Client, it would not find the tarball.
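The preconditions in the analysis can be written as a single check. This is a hedged Python sketch, not Ambari's implementation: `copy_preconditions` is a hypothetical helper, and "DataNodes are functional" is modeled as at least `min_datanodes` live DataNodes, an assumed threshold.

```python
# Clients the analysis says must be co-hosted for the copy to work.
REQUIRED_CLIENTS = ("TEZ_CLIENT", "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT")


def copy_preconditions(host_components, namenode_up, live_datanodes, min_datanodes=1):
    """Return the unmet preconditions for copying tez.tar.gz to HDFS.

    An empty list means this host could perform the copy. The
    `min_datanodes` threshold is an assumption standing in for
    "DataNodes are functional".
    """
    problems = [
        "missing " + client
        for client in REQUIRED_CLIENTS
        if client not in host_components
    ]
    if not namenode_up:
        problems.append("NameNode is not up")
    if live_datanodes < min_datanodes:
        problems.append("fewer than %d live DataNode(s)" % min_datanodes)
    return problems


host1 = {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"}
print(copy_preconditions(host1, namenode_up=True, live_datanodes=3))
# ['missing TEZ_CLIENT', 'missing HDFS_CLIENT', 'missing YARN_CLIENT', 'missing MAPREDUCE2_CLIENT']
```

Returning a list of reasons rather than a boolean makes it easier to log why a given host skipped the upload.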
The change I'm proposing is to:
* Switch this to the HistoryServer instead of the RM, because this is more efficient during RU (Rolling Upgrade): there is only one MR HistoryServer, versus multiple RMs.
* Installing Tez also requires the YARN service, including the HistoryServer. The HistoryServer is now co-hosted with the Tez Client, which guarantees that it can copy the tarball.
* Installing the HistoryServer by itself will not copy the tarball. However, if Tez is installed later, its Service Check is responsible for copying the tarball to HDFS, and that host is also guaranteed to have the HDFS Client.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
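The proposed flow relies on whichever runs first, HistoryServer start or the Tez Service Check, copying the tarball only if it is missing. A minimal Python sketch of that idempotent "copy if absent" step, under stated assumptions: `FakeHdfs` is a hypothetical in-memory stand-in for HDFS (a real host would use something like `hdfs dfs -test -e` and `hdfs dfs -put`), the trigger labels are illustrative, and the HDFS path is taken from the log above.

```python
TEZ_TARBALL_HDFS_PATH = "/hdp/apps/2.2.6.0-2800/tez/tez.tar.gz"  # from the log above


class FakeHdfs:
    """Hypothetical in-memory stand-in for HDFS, for illustration only."""

    def __init__(self):
        self.files = set()

    def exists(self, path):
        return path in self.files

    def put(self, path):
        self.files.add(path)


def maybe_copy_tez_tarball(hdfs, host_components, trigger):
    """Copy tez.tar.gz once, from a host that actually has it.

    `trigger` is a hypothetical label such as "HISTORYSERVER_START" or
    "TEZ_SERVICE_CHECK". Returns a short description of the outcome.
    """
    if hdfs.exists(TEZ_TARBALL_HDFS_PATH):
        return "already present"
    # The local tarball comes from TEZ_CLIENT; HDFS_CLIENT is needed to upload it.
    missing = sorted({"TEZ_CLIENT", "HDFS_CLIENT"} - set(host_components))
    if missing:
        return "skipped: missing " + ", ".join(missing)
    hdfs.put(TEZ_TARBALL_HDFS_PATH)
    return "copied by " + trigger
```

For use case 2 (YARN first, Tez later): the HistoryServer start is skipped because its host has no Tez Client, the later Tez Service Check performs the copy, and any subsequent HistoryServer restart finds the tarball already present.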