Many thanks for your quick reply, Hitesh! I will give it a try tomorrow. And,
as you said, I should learn more about YARN local resources.
Best wishes & thank you.
-----------LLBian
At 2016-01-16 02:26:22, "Hitesh Shah" <[email protected]> wrote:
>To understand your issue, you should try and read up a bit on YARN local
>resources.
>
>That said, this is what Tez will do when you provide an HDFS dir via
>tez.aux.uris:
> - say hdfs://conf/hbasetable/ contains files foo and bar and a directory
> abc.
>
>In the AM container, if you run a local system command to do an ls:
> - it will return 2 files: foo and bar (the directory abc is not localized,
> as Tez does not recurse into sub-directories of a tez.aux.uris entry)
> - these are symlinks, but they point to the actual location where foo and
> bar have been downloaded
> - to clarify, you would access them as ./foo and ./bar, not as
> conf/hbasetable/foo
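>
>For illustration, here is a minimal sketch of reading one of these localized
>files from code running inside the AM container (the file names are the ones
>from the example above):
>
>    // Localized files appear as symlinks in the container's working
>    // directory, so plain relative paths work:
>    java.io.File foo = new java.io.File("foo");  // not "conf/hbasetable/foo"
>    if (foo.exists()) {
>        java.io.BufferedReader reader = new java.io.BufferedReader(
>            new java.io.FileReader(foo));
>        // ... read and close as usual
>    }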
>
>Another thing: conf.jar will get downloaded as conf.jar. It will not be
>uncompressed by default. You can use Java's resource lookup to get to the
>resources in question within conf.jar, or find the jar and uncompress it
>programmatically. Please note that looking a resource up by name will find
>the first resource matching that name on the classpath, so you may want to
>be careful in how you approach this.
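>
>A minimal sketch of that resource lookup, assuming conf.jar is on the
>container's classpath and contains an entry such as
>"conf/hbasetable/table.properties" (the entry name here is only
>illustrative):
>
>    // Looks the entry up through the classloader; returns null if the
>    // jar is not on the classpath or the entry name does not match:
>    java.io.InputStream in = Thread.currentThread().getContextClassLoader()
>        .getResourceAsStream("conf/hbasetable/table.properties");
>    // Remember: this finds the FIRST matching entry on the classpath,
>    // so choose entry names that cannot collide with other jars.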
>
>I would suggest providing contents of launch_container.sh for each scenario
>you tried and we can help explain how things are being laid out so that you
>can then tweak your code as needed.
>
>The main difference between running on the client vs. in the AM is that the
>client uses the local filesystem and can leverage absolute and/or relative
>paths. The AM, on the other hand, is somewhat like a logical VM which is
>launched and set up with the necessary files, etc., but the layout does not
>match your original client filesystem or HDFS structure. It is flatter in
>nature.
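>
>For reference, this is roughly what happens under the hood: a file becomes
>available in a container only if it was registered as a YARN LocalResource
>in the container launch context. Tez does this registration for you for
>tez.aux.uris entries; merely copying files into the session dir on HDFS does
>not. A hedged sketch of the YARN API involved (the HDFS path is a
>placeholder, and localResources stands for the Map<String, LocalResource> in
>the launch context):
>
>    // imports assumed: org.apache.hadoop.fs.*,
>    // org.apache.hadoop.yarn.api.records.*,
>    // org.apache.hadoop.yarn.util.ConverterUtils
>    Path jar = new Path("hdfs://nnhost:nnport/apps/tmpfiles/conf.jar");
>    FileStatus stat = fs.getFileStatus(jar);
>    LocalResource lr = LocalResource.newInstance(
>        ConverterUtils.getYarnUrlFromPath(jar),
>        LocalResourceType.FILE,        // FILE is symlinked as-is;
>                                       // ARCHIVE would be unpacked on download
>        LocalResourceVisibility.APPLICATION,
>        stat.getLen(), stat.getModificationTime());
>    // The map key becomes the symlink name in the container's working dir;
>    // the NodeManager downloads only what is registered in this map:
>    localResources.put("conf.jar", lr);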
>
>thanks
>— Hitesh
>
>
>
>On Jan 15, 2016, at 1:11 AM, LLBian <[email protected]> wrote:
>
>>
>> Thank you so much for your quick response. I tried to work out the details
>> over these two days, but it didn't work even after I gave it my best shot.
>> Your suggestion is right; maybe I should describe the problem more
>> clearly.
>> Hadoop version 2.6.0; Hive version 1.2.1; Tez version 0.7.0.
>> I tried the following different approaches:
>> (1) For Hive, we can add third-party jars via hive.aux.jars.path, so I
>> added the Tez jars with this parameter. When I launch Hive via the CLI,
>> all these jars are uploaded to the HDFS Tez session path
>> "/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/". Result:
>> failed.
>> (2) I attempted to use this parameter to load my own dependent
>> configuration directory and files (or, more accurately, another team's
>> product in our company): [conf/hbasetable]. With the MR engine, we put the
>> dependent directory "conf/hbasetable" under $HIVE_HOME/nbconf and it
>> worked well. With the Spark engine, we encountered a similar problem and
>> solved it by packaging everything under $HIVE_HOME/nbconf/ into a jar file
>> named conf.jar and putting it in $HIVE_HOME/nblib/; when Hive launches, it
>> is uploaded to the HDFS destination directory (using hive.aux.jars.path).
>> Result: failed.
>> (3) I configured tez.aux.uris=hdfs://nnhost:nnport/apps/tmpfiles/ and
>> uploaded "conf/hbasetable" and the packaged "conf.jar" to this path.
>> Result: failed.
>> (4) Following your suggestion, in my Hadoop cluster the related
>> configurations are
>> yarn.nodemanager.delete.debug-delay-sec=1200;
>> yarn.nodemanager.local-dirs=/hdfsdata/1/yarndata/nm-local-dir;
>> and I added some code in TezSessionState.refreshLocalResourcesFromConf():
>> ----------------------------------------------------------------------
>> // dir is the HDFS Tez session directory for this session
>> FileSystem fs = FileSystem.get(conf);
>> Path src = new Path("/opt/software/hhive/nbconf/conf/hbasetable");
>> Path dest = new Path(dir + "/conf/hbasetable");
>> if (!fs.exists(dest)) {
>>     fs.copyFromLocalFile(src, dest);
>> }
>> -----------------------------------------------------------------
>> This creates the directory
>> /tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic
>> number]/conf/hbasetable in HDFS and uploads the files there, but after the
>> Hive query "select count(*) from h_im" failed, I looked at its runtime
>> path (the AM container path):
>> /hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/
>> and all resources were there except "[conf/hbasetable]".
>> I am sure it should be there, because I printed some logs about the
>> current running path; it was
>> "/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/conf/hbasetable",
>> and it printed these error messages:
>> "java.lang.ExceptionInInitializerError
>> ……[long but low-value error message omitted]……
>> Caused by: java.lang.RuntimeException: [conf/hbasetable/] path not exsit
>> or is not a directory"
>> Now, my main questions are:
>> (1) Under the AM container path on the local disk there are symbolic links
>> to all the real jars. I cannot understand why the YARN container did not
>> download "conf/hbasetable". Why?
>> (2) As mentioned earlier, I also packaged this "conf/hbasetable" into
>> conf.jar, and it was downloaded to the AM container path. Why can it not
>> be parsed or decompressed?
>>
>> Is there any configuration option that can do this?
>>
>> Best wishes & thank you.
>> ------LLBian
>>
>>
>> At 2016-01-14 11:18:55, "Hitesh Shah" <[email protected]> wrote:
>>> Hello
>>>
>>> You are right that when hive.compute.splits.in.am is true, the splits are
>>> computed in the cluster in the Tez AM container.
>>>
>>> Now, there are a bunch of options to consider, but the general gist is
>>> that if you are familiar with the MapReduce Distributed Cache or YARN
>>> local resources, you need to add the files that your custom input format
>>> needs to Tez's version of the distributed cache. The simplest approach
>>> for you may be to just use "add jar" from Hive, which will automatically
>>> add these files to the distributed cache (this will copy them from the
>>> local filesystem into HDFS and also make them available in the Tez AM
>>> container). The other option is to upload all the necessary files to
>>> HDFS, say "/tmp/additionalfiles/", and then specify
>>> "hdfs://nnhost:nnport/tmp/additionalfiles/" for the property
>>> "tez.aux.uris" in tez-site.xml, as shown below. This will add all the
>>> contents of this HDFS dir to the distributed cache. Please note that Tez
>>> does not do recursive searches in the dir, but it does support a
>>> comma-separated list of files/dirs for tez.aux.uris.
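>>>
>>> For example, in tez-site.xml (nnhost:nnport and the path are the
>>> placeholders from above):
>>>
>>>    <property>
>>>      <name>tez.aux.uris</name>
>>>      <value>hdfs://nnhost:nnport/tmp/additionalfiles/</value>
>>>    </property>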
>>>
>>> Next, to debug this, you can do the following:
>>> - Set "yarn.nodemanager.delete.debug-delay-sec" in yarn-site.xml to a
>>> value like 1200 to help with debugging. This requires NodeManager
>>> restarts.
>>> - Next, run your query.
>>> - Find the application on the YARN ResourceManager UI. The app page will
>>> also tell you which node the AM is running on or ran on.
>>> - Go to that node and search for the launch_container.sh of the container
>>> in question (these files will be found in one of the dirs configured for
>>> yarn.nodemanager.local-dirs in your yarn-site.xml).
>>> - Looking inside launch_container.sh, find $CWD and inspect the contents
>>> of the dir it points to. This will give you an idea of the localized
>>> files (from the distributed cache).
>>>
>>> If you have more questions, can you first clarify what information/files
>>> are needed for your plugin to run?
>>>
>>> thanks
>>> — Hitesh
>>>
>>>
>>>
>>> On Jan 13, 2016, at 7:01 PM, LLBian <[email protected]> wrote:
>>>
>>>> Also, the log is in the YARN container.
>>>> I tried to solve this problem by packaging nbconf/ into a jar file under
>>>> $HIVE_HOME and putting it under $HIVE_HOME/nblib; it was uploaded to
>>>> /tmp/hadoop_yarn/root/_tez_session_dir/, but it did not work.
>>>>
>>>> Best regards.
>>>> LLBian
>>>>
>>>> 01-14-2016
>>>>
>>>> At 2016-01-14 10:47:18, "LLBian" <[email protected]> wrote:
>>>>> Hi all,
>>>>> I am new to using Apache Tez. Recently I ran into some difficulty:
>>>>> our team has developed a plug-in for Hive. It is similar in function to
>>>>> HBaseHandler, but with customized code. Now my task is to make sure it
>>>>> is compatible with Tez. That is the background. My questions are:
>>>>> (1) I have a directory named nbconf created under $HIVE_HOME; under it
>>>>> there is a sub-directory named conf/hbasetable.
>>>>> (2) I also have a directory named nblib created under $HIVE_HOME, used
>>>>> for the Tez JARs.
>>>>> (3) When I set hive.compute.splits.in.am=true, it throws an exception
>>>>> in the Hive log:
>>>>> ……
>>>>> [map1]java.lang.ExceptionInInitializerError:
>>>>> ……
>>>>> ……
>>>>> Caused by: java.lang.RuntimeException: [conf/hbasetable/] path not
>>>>> exsit or is not a directory
>>>>> ……
>>>>>
>>>>> But it actually exists! It is under the local $HIVE_HOME/nbconf. When I
>>>>> set hive.compute.splits.in.am=false, it works well. So I guess it may
>>>>> be because the splits are computed in the cluster AM, not on the local
>>>>> disk. Maybe I should upload some files or directories (e.g.
>>>>> conf/hbasetable) to HDFS. If Tez expects that, where should I put them?
>>>>> The Tez session directory?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
>>>>> I tried all of these, but none of them worked.
>>>>>
>>>>> Because it works when debugging locally, I don't know how to tackle the
>>>>> matter. I don't know where to put this custom directory
>>>>> "[conf/hbasetable]" on HDFS.
>>>>>
>>>>> I am eager to get your guidance. Any help is greatly appreciated.
>>>>> (Please forgive my poor English.)
>>>>>
>>>>> LLBian
>>>>
>>>>
>>>>
>>>>
>>>
>