Many thanks for your quick reply, Hitesh! I will give it a try tomorrow. And,
as you said, I should learn more about YARN local resources.
Best wishes & thank you.
-----------LLBian
At 2016-01-16 02:26:22, "Hitesh Shah" <[email protected]> wrote:
>To understand your issue, you should try and read up a bit on YARN local
>resources.
>
>That said, this is what Tez will do when you provide an HDFS dir via
>tez.aux.uris:
> - say hdfs://conf/hbasetable/ contains files foo and bar and a directory
> abc.
>
>In the AM container, if you run a local system command to do an ls:
> - it will return 2 files: foo and bar (the directory abc is not localized,
> as Tez does not recurse into sub-directories of a tez.aux.uris entry)
> - these are symlinks, but they point to the actual location where foo and
> bar have been downloaded
> - to clarify, you would access them as ./foo and ./bar, not as
> conf/hbasetable/foo
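>
>For illustration, here is a minimal sketch of reading one of these localized
>files from code running inside the AM container (the file names are the ones
>from the example above):
>
>    // Localized files appear as symlinks in the container's working
>    // directory, so plain relative paths work:
>    java.io.File foo = new java.io.File("foo");  // not "conf/hbasetable/foo"
>    if (foo.exists()) {
>        java.io.BufferedReader reader = new java.io.BufferedReader(
>            new java.io.FileReader(foo));
>        // ... read and close as usual
>    }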
>
>Another thing: conf.jar will get downloaded as conf.jar. It will not be
>uncompressed by default. You can use Java's resource lookup to get to the
>resources in question within conf.jar, or find the jar and uncompress it
>programmatically. Please note that looking a resource up by name will find
>the first resource matching that name on the classpath, so you may want to
>be careful in how you approach this.
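>
>A minimal sketch of that resource lookup, assuming conf.jar is on the
>container's classpath and contains an entry such as
>"conf/hbasetable/table.properties" (the entry name here is only
>illustrative):
>
>    // Looks the entry up through the classloader; returns null if the
>    // jar is not on the classpath or the entry name does not match:
>    java.io.InputStream in = Thread.currentThread().getContextClassLoader()
>        .getResourceAsStream("conf/hbasetable/table.properties");
>    // Remember: this finds the FIRST matching entry on the classpath,
>    // so choose entry names that cannot collide with other jars.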
>
>I would suggest providing contents of launch_container.sh for each scenario
>you tried and we can help explain how things are being laid out so that you
>can then tweak your code as needed.
>
>The main difference between running on the client vs. in the AM is that the
>client uses the local filesystem and can leverage absolute and/or relative
>paths. The AM, on the other hand, is somewhat like a logical VM which is
>launched and set up with the necessary files, etc., but the layout does not
>match your original client filesystem or HDFS structure. It is flatter in
>nature.
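>
>For reference, this is roughly what happens under the hood: a file becomes
>available in a container only if it was registered as a YARN LocalResource
>in the container launch context. Tez does this registration for you for
>tez.aux.uris entries; merely copying files into the session dir on HDFS does
>not. A hedged sketch of the YARN API involved (the HDFS path is a
>placeholder, and localResources stands for the Map<String, LocalResource> in
>the launch context):
>
>    // imports assumed: org.apache.hadoop.fs.*,
>    // org.apache.hadoop.yarn.api.records.*,
>    // org.apache.hadoop.yarn.util.ConverterUtils
>    Path jar = new Path("hdfs://nnhost:nnport/apps/tmpfiles/conf.jar");
>    FileStatus stat = fs.getFileStatus(jar);
>    LocalResource lr = LocalResource.newInstance(
>        ConverterUtils.getYarnUrlFromPath(jar),
>        LocalResourceType.FILE,        // FILE is symlinked as-is;
>                                       // ARCHIVE would be unpacked on download
>        LocalResourceVisibility.APPLICATION,
>        stat.getLen(), stat.getModificationTime());
>    // The map key becomes the symlink name in the container's working dir;
>    // the NodeManager downloads only what is registered in this map:
>    localResources.put("conf.jar", lr);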
>
>thanks
>— Hitesh
>
>
>
>On Jan 15, 2016, at 1:11 AM, LLBian <[email protected]> wrote:
>
>>
>> Thank you so much for your quick response. I tried to work out the details
>> over these two days, but it didn't work even after I gave it my best shot.
>> Your suggestion is right; maybe I should describe the problem more
>> clearly.
>> Hadoop version 2.6.0; Hive version 1.2.1; Tez version 0.7.0.
>> I tried the following different approaches:
>> (1) For Hive, we can add third-party jars via hive.aux.jars.path, so I
>> added the Tez jars with this parameter. When I launch Hive via the CLI,
>> all these jars are uploaded to the HDFS Tez session path
>> "/tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic number]/". Result:
>> failed.
>> (2) I attempted to use this parameter to load my own dependent
>> configuration directory and files (or, more accurately, another team's
>> product in our company): [conf/hbasetable]. With the MR engine, we put the
>> dependent directory "conf/hbasetable" under $HIVE_HOME/nbconf and it
>> worked well. With the Spark engine, we encountered a similar problem and
>> solved it by packaging everything under $HIVE_HOME/nbconf/ into a jar file
>> named conf.jar and putting it in $HIVE_HOME/nblib/; when Hive launches, it
>> is uploaded to the HDFS destination directory (using hive.aux.jars.path).
>> Result: failed.
>> (3) I configured tez.aux.uris=hdfs://nnhost:nnport/apps/tmpfiles/ and
>> uploaded "conf/hbasetable" and the packaged "conf.jar" to this path.
>> Result: failed.
>> (4) Following your suggestion, in my Hadoop cluster the related
>> configurations are
>> yarn.nodemanager.delete.debug-delay-sec=1200;
>> yarn.nodemanager.local-dirs=/hdfsdata/1/yarndata/nm-local-dir;
>> and I added some code in TezSessionState.refreshLocalResourcesFromConf():
>> ----------------------------------------------------------------------
>> // dir is the HDFS Tez session directory for this session
>> FileSystem fs = FileSystem.get(conf);
>> Path src = new Path("/opt/software/hhive/nbconf/conf/hbasetable");
>> Path dest = new Path(dir + "/conf/hbasetable");
>> if (!fs.exists(dest)) {
>>     fs.copyFromLocalFile(src, dest);
>> }
>> -----------------------------------------------------------------
>> This creates the directory
>> /tmp/hadoop_yarn/root/_tez_session_dir/[long dynamic
>> number]/conf/hbasetable in HDFS and uploads the files there, but after the
>> Hive query "select count(*) from h_im" failed, I looked at its runtime
>> path (the AM container path):
>> /hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/
>> and all resources were there except "[conf/hbasetable]".
>> I am sure it should be there, because I printed some logs about the
>> current running path; it was
>> "/hdfsdata/1/yarndata/nm-local-dir/usercache/root/appcache/application_1452823977303_004/container_1452823977303_0004_01_000001/conf/hbasetable",
>> and it printed these error messages:
>> "java.lang.ExceptionInInitializerError
>> ……[long but low-value error message omitted]……
>> Caused by: java.lang.RuntimeException: [conf/hbasetable/] path not exsit
>> or is not a directory"
>> Now, my main questions are:
>> (1) Under the AM container path on the local disk there are symbolic links
>> to all the real jars. I cannot understand why the YARN container did not
>> download "conf/hbasetable". Why?
>> (2) As mentioned earlier, I also packaged this "conf/hbasetable" into
>> conf.jar, and it was downloaded to the AM container path. Why can it not
>> be parsed or decompressed?
>>
>> Is there any configuration option that can do this?
>>
>> Best wishes & thank you.
>> ------LLBian
>>
>>
>> At 2016-01-14 11:18:55, "Hitesh Shah" <[email protected]> wrote:
>>> Hello
>>>
>>> You are right that when hive.compute.splits.in.am is true, the splits are
>>> computed in the cluster in the Tez AM container.
>>>
>>> Now, there are a bunch of options to consider, but the general gist is
>>> that if you are familiar with the MapReduce Distributed Cache or YARN
>>> local resources, you need to add the files that your custom input format
>>> needs to Tez's version of the distributed cache. The simplest approach
>>> for you may be to just use "add jar" from Hive, which will automatically
>>> add these files to the distributed cache (this will copy them from the
>>> local filesystem into HDFS and also make them available in the Tez AM
>>> container). The other option is to upload all the necessary files to
>>> HDFS, say "/tmp/additionalfiles/", and then specify
>>> "hdfs://nnhost:nnport/tmp/additionalfiles/" for the property
>>> "tez.aux.uris" in tez-site.xml, as shown below. This will add all the
>>> contents of this HDFS dir to the distributed cache. Please note that Tez
>>> does not do recursive searches in the dir, but it does support a
>>> comma-separated list of files/dirs for tez.aux.uris.
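>>>
>>> For example, in tez-site.xml (nnhost:nnport and the path are the
>>> placeholders from above):
>>>
>>>    <property>
>>>      <name>tez.aux.uris</name>
>>>      <value>hdfs://nnhost:nnport/tmp/additionalfiles/</value>
>>>    </property>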
>>>
>>> Next, to debug this, you can do the following:
>>> - Set "yarn.nodemanager.delete.debug-delay-sec" in yarn-site.xml to a
>>> value like 1200 to help with debugging. This requires NodeManager
>>> restarts.
>>> - Next, run your query.
>>> - Find the application on the YARN ResourceManager UI. The app page will
>>> also tell you which node the AM is running on or ran on.
>>> - Go to that node and search for the launch_container.sh of the container
>>> in question (these files will be found in one of the dirs configured for
>>> yarn.nodemanager.local-dirs in your yarn-site.xml).
>>> - Looking inside launch_container.sh, find $CWD and inspect the contents
>>> of the dir it points to. This will give you an idea of the localized
>>> files (from the distributed cache).
>>>
>>> If you have more questions, can you first clarify what information/files
>>> are needed for your plugin to run?
>>>
>>> thanks
>>> — Hitesh
>>>
>>>
>>>
>>> On Jan 13, 2016, at 7:01 PM, LLBian <[email protected]> wrote:
>>>
>>>> Also, the log is in the YARN container.
>>>> I tried to solve this problem by packaging nbconf/ into a jar file under
>>>> $HIVE_HOME and putting it under $HIVE_HOME/nblib; it was uploaded to
>>>> /tmp/hadoop_yarn/root/_tez_session_dir/, but it did not work.
>>>>
>>>> Best regards.
>>>> LLBian
>>>>
>>>> 01-14-2016
>>>>
>>>> At 2016-01-14 10:47:18, "LLBian" <[email protected]> wrote:
>>>>> Hi all,
>>>>> I am new to using Apache Tez. Recently I ran into some difficulty:
>>>>> our team has developed a plug-in for Hive. It is similar in function to
>>>>> HBaseHandler, but with customized code. Now my task is to make sure it
>>>>> is compatible with Tez. That is the background. My questions are:
>>>>> (1) I have a directory named nbconf created under $HIVE_HOME; under it
>>>>> there is a sub-directory named conf/hbasetable.
>>>>> (2) I also have a directory named nblib created under $HIVE_HOME, used
>>>>> for the Tez JARs.
>>>>> (3) When I set hive.compute.splits.in.am=true, it throws an exception
>>>>> in the Hive log:
>>>>> ……
>>>>> [map1]java.lang.ExceptionInInitializerError:
>>>>> ……
>>>>> ……
>>>>> Caused by: java.lang.RuntimeException: [conf/hbasetable/] path not
>>>>> exsit or is not a directory
>>>>> ……
>>>>>
>>>>> But it actually exists! It is under the local $HIVE_HOME/nbconf. When I
>>>>> set hive.compute.splits.in.am=false, it works well. So I guess it may
>>>>> be because the splits are computed in the cluster AM, not on the local
>>>>> disk. Maybe I should upload some files or directories (e.g.
>>>>> conf/hbasetable) to HDFS. If Tez expects that, where should I put them?
>>>>> The Tez session directory?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/?
>>>>> /tmp/hadoop_yarn/root/_tez_session_dir/.tez/AppId/?
>>>>> I tried all of these, but none of them worked.
>>>>>
>>>>> Because it works when debugging locally, I don't know how to tackle the
>>>>> matter. I don't know where to put this custom directory
>>>>> "[conf/hbasetable]" on HDFS.
>>>>>
>>>>> I am eager to get your guidance. Any help is greatly appreciated.
>>>>> (Please forgive my poor English.)
>>>>>
>>>>> LLBian
>>>>
>>>>
>>>>
>>>>
>>>
>