Alejandro Fernandez created AMBARI-12113:
--------------------------------------------

             Summary: Cluster deployment is missing tez.tar.gz in HDFS since 
service responsible for uploading tarball is not co-hosted with Tez Client
                 Key: AMBARI-12113
                 URL: https://issues.apache.org/jira/browse/AMBARI-12113
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.1.0
            Reporter: Alejandro Fernandez
            Assignee: Alejandro Fernandez
            Priority: Critical
             Fix For: 2.1.0


STR:

* Deploy cluster with HDFS, YARN, MR, and Tez on 5 hosts as follows:
** Host 1: NameNode, ResourceManager, ZK Server, DataNode, NodeManager
** Host 2: Secondary NameNode, App Timeline Server, ZK Server, DataNode, NodeManager
** Host 3: ZK Server, DataNode, NodeManager
** Host 4: Clients
** Host 5: Clients

In this case, Host 1 has RM but no Tez client, so it cannot possibly upload the 
tez tarball to HDFS.
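
The mismatch can be illustrated with a small sketch. This is not Ambari code: the component labels, the LAYOUT dictionary, and the hosts_able_to_upload helper are hypothetical, and it assumes that "Clients" on Hosts 4 and 5 means the Tez, HDFS, YARN, and MR clients.

```python
# Hypothetical model of the layout from the STR above. The component labels
# are illustrative only, and "Clients" is assumed to include the Tez client.
CLIENTS = {"TEZ_CLIENT", "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT"}

LAYOUT = {
    "host1": {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host2": {"SECONDARY_NAMENODE", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER",
              "DATANODE", "NODEMANAGER"},
    "host3": {"ZOOKEEPER_SERVER", "DATANODE", "NODEMANAGER"},
    "host4": set(CLIENTS),
    "host5": set(CLIENTS),
}


def hosts_able_to_upload(layout, uploader, required="TEZ_CLIENT"):
    """Return hosts that run `uploader` and also have `required` installed."""
    return sorted(
        host for host, components in layout.items()
        if uploader in components and required in components
    )


# No host runs ResourceManager together with the Tez client, so a copy
# triggered by RM start has no local tarball to upload in this layout.
print(hosts_able_to_upload(LAYOUT, "RESOURCEMANAGER"))
```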

Also, consider the following 2 use cases:
1. Install Tez first, which will require YARN.
2. Install YARN first, which does not require Tez, but tez.tar.gz still needs 
to be uploaded when the Tez Service Check runs.

{code}
Traceback (most recent call last):
  File 
"/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py",
 line 98, in <module>
    TezServiceCheck().execute()
  File 
"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
 line 216, in execute
    method(env)
  File 
"/var/lib/ambari-agent/cache/common-services/TEZ/0.4.0.2.1/package/scripts/service_check.py",
 line 75, in service_check
    bin_dir = params.hadoop_bin_dir
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
line 157, in __init__
    self.env.run()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 152, in run
    self.run_action(resource, action)
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 118, in run_action
    provider_action()
  File 
"/usr/lib/python2.6/site-packages/resource_management/libraries/providers/execute_hadoop.py",
 line 55, in action_run
    environment = self.resource.environment,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", 
line 157, in __init__
    self.env.run()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 152, in run
    self.run_action(resource, action)
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/environment.py", 
line 118, in run_action
    provider_action()
  File 
"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py",
 line 254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", 
line 290, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hadoop --config 
/usr/hdp/2.2.6.0-2800/hadoop/conf jar 
/usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount 
/tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/' returned 255. Running 
OrderedWordCount
15/06/17 04:21:50 INFO client.TezClient: Tez Client Version: [ 
component=tez-api, version=0.5.2.2.2.6.0-2800, 
revision=790e651b4a64f7589008208580c9790548c2baf8, 
SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, 
buildTIme=20150518-1651 ]
15/06/17 04:21:51 INFO impl.TimelineClientImpl: Timeline service address: 
http://c6405.ambari.apache.org:8188/ws/v1/timeline/
15/06/17 04:21:51 INFO client.RMProxy: Connecting to ResourceManager at 
c6405.ambari.apache.org/192.168.64.105:8050
15/06/17 04:21:53 INFO client.TezClient: Submitting DAG application with id: 
application_1434514777618_0005
15/06/17 04:21:53 INFO client.TezClientUtils: Using tez.lib.uris value from 
configuration: /hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
java.io.FileNotFoundException: File does not exist: 
/hdp/apps/2.2.6.0-2800/tez/tez.tar.gz
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1140)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1132)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1132)
        at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
        at 
org.apache.tez.client.TezClientUtils.getLRFileStatus(TezClientUtils.java:127)
        at 
org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:178)
        at 
org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:721)
        at 
org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:689)
        at 
org.apache.tez.client.TezClient.submitDAGApplication(TezClient.java:667)
        at org.apache.tez.client.TezClient.submitDAG(TezClient.java:353)
        at 
org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:208)
        at 
org.apache.tez.examples.OrderedWordCount.run(OrderedWordCount.java:232)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at 
org.apache.tez.examples.OrderedWordCount.main(OrderedWordCount.java:240)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.tez.examples.ExampleDriver.main(ExampleDriver.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
{code}

Analysis:
tez.tar.gz needs to be copied to HDFS. The problem is that we don't have a way 
right now to copy it after all services have been installed and started during 
cluster deployment, so instead, we rely on services copying the tarball when 
they start.
In order for this to work, the host with Tez Client also needs to have HDFS 
Client, YARN Client, and MR Client. Further, copying to HDFS requires NameNode 
to be up, and DataNodes to be functional.
AMBARI-9997 had ResourceManager copy the tez tarball; the problem was that if 
the host with RM didn't have Tez client, it wouldn't find the tarball.
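
A minimal sketch of the preconditions listed above, written as a plain check. The function name, the component labels, and the way NameNode/DataNode health is passed in are all assumptions for illustration and do not correspond to any Ambari API.

```python
# Hypothetical labels for the clients the uploading host is said to need.
REQUIRED_CLIENTS = frozenset(
    {"TEZ_CLIENT", "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT"}
)


def missing_upload_prerequisites(host_components, namenode_up, live_datanodes):
    """Return a list of reasons this host cannot copy tez.tar.gz to HDFS.

    An empty list means every precondition from the analysis is satisfied.
    """
    problems = []
    missing = sorted(REQUIRED_CLIENTS - set(host_components))
    if missing:
        problems.append("missing clients: " + ", ".join(missing))
    if not namenode_up:
        problems.append("NameNode is not up")
    if live_datanodes < 1:
        problems.append("no functional DataNodes")
    return problems


# Host 1 from the STR: it runs RM but has none of the required clients.
host1 = {"NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER",
         "DATANODE", "NODEMANAGER"}
print(missing_upload_prerequisites(host1, namenode_up=True, live_datanodes=3))
```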

The change I'm proposing:
* Switch the copy from RM to HistoryServer. This is more efficient during 
rolling upgrade (RU), since there's only one MR HistoryServer vs many RMs.
* Installing Tez also requires the YARN service, including HistoryServer. 
HistoryServer is now co-hosted with Tez Client, so this guarantees it can copy 
the tarball.
* Installing HistoryServer by itself will not copy the tarball. However, if Tez 
is installed later, then its Service Check is responsible for copying the 
tarball to HDFS, and that host is also guaranteed to have HDFS Client.
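
The proposed behaviour can be summarised as a small decision function. This is only a sketch of the rules in the list above, not the actual patch: the event names, the "copy"/"skip" return values, and the tarball_copy_action helper are hypothetical.

```python
def tarball_copy_action(event, host_components, tez_installed):
    """Decide whether tez.tar.gz is copied to HDFS for a given event.

    event           -- hypothetical label for what is running on the host
    host_components -- set of component labels installed on that host
    tez_installed   -- whether the Tez service exists in the cluster
    """
    if event == "HISTORYSERVER_START":
        # HistoryServer copies the tarball only when Tez is installed and
        # the Tez client is co-hosted; on its own it copies nothing.
        if tez_installed and "TEZ_CLIENT" in host_components:
            return "copy"
        return "skip"
    if event == "TEZ_SERVICE_CHECK":
        # Covers the case where Tez is added after YARN is already running.
        return "copy"
    # Any other event, including ResourceManager start, no longer copies.
    return "skip"


print(tarball_copy_action("HISTORYSERVER_START",
                          {"HISTORYSERVER", "TEZ_CLIENT"}, True))
```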



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)