Good eyes, Ramki! Thanks — this "directory" in place of the filename appears to be working. The script is now getting loaded using the "Attempt two" form with the correct directory name, i.e. hivetry/classifier_wf.py as the script path.
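For the archives, the pattern that worked: `add file` the full directory path, then reference the script as hivetry/classifier_wf.py in the TRANSFORM clause. A minimal skeleton of a transform script in that style (the real classification logic in classifier_wf.py is omitted; this is just an identity pass-through of the two columns):

```python
#!/usr/bin/env python
# Minimal TRANSFORM-script skeleton: Hive pipes rows to stdin as
# tab-separated fields and reads tab-separated rows back from stdout.
import sys

def transform_line(line):
    """Split one tab-separated input row and return the output row.
    The actual classifier logic is not shown; this is an identity
    pass-through of the two expected columns."""
    aappname, qappname = line.rstrip("\n").split("\t")
    return "%s\t%s" % (aappname, qappname)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform_line(line))
```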
thanks again. stephenb

2013/6/20 Ramki Palle <ramki.pa...@gmail.com>

> In the *Attempt two*, are you not supposed to use "hivetry" as the
> directory?
>
> Maybe you should try giving the full path
> "/opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works.
>
> Regards,
> Ramki.
>
>
> On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Stephen: would you be willing to share an example of specifying a
>> "directory" as the add "file" target? I have not seen this working.
>>
>> I have attempted to use it as follows:
>>
>> *We will access a script within the "hivetry" directory located here:*
>>
>> hive> ! ls -l /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
>> -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py
>>
>> *Add the directory to hive:*
>>
>> hive> add file /opt/am/ver/1.0/hive/hivetry;
>> Added resource: /opt/am/ver/1.0/hive/hivetry
>>
>> *Attempt to run a transform query using that script:*
>>
>> *Attempt one: use the script name unqualified:*
>>
>> hive> from (select transform (aappname,qappname) using 'classifier_wf.py'
>> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table
>> c select o.aappname2, o.qappname2;
>>
>> (Failed: Caused by: java.io.IOException: Cannot run program
>> "classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>>
>> *Attempt two: use the script name with the directory name prefix:*
>>
>> hive> from (select transform (aappname,qappname) using 'hive/classifier_wf.py'
>> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table
>> c select o.aappname2, o.qappname2;
>>
>> (Failed: Caused by: java.io.IOException: Cannot run program
>> "hive/classifier_wf.py": java.io.IOException: error=2, No such file or
>> directory)
>>
>>
>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>
>>> yeah. the archive isn't unpacked on the remote side.
>>> I think add archive is mostly used for finding java packages, since CLASSPATH
>>> will reference the archive (and as such there is no need to expand it.)
>>>
>>>
>>> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>
>>>> thx for the tip on "add <file>" where <file> is a directory. I will try
>>>> that.
>>>>
>>>>
>>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>>
>>>>> i personally only know of adding a .jar file via add archive, but my
>>>>> experience there is very limited. i believe if you 'add file' and the file
>>>>> is a directory it'll recursively take everything underneath, but i know of
>>>>> nothing that inflates or untars things on the remote end automatically.
>>>>>
>>>>> i would 'add file' your python script and then within that untar your
>>>>> tarball to get at your model data. it's just a matter of figuring out the
>>>>> path to that tarball, which is kinda up in the air when it's added via 'add
>>>>> file'. Yeah, "local downloads directory" — what the literal path is
>>>>> is what i'd like to know. :)
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>
>>>>>> @Stephen: given that the 'relative' path for hive is from a local
>>>>>> downloads directory on each local tasktracker in the cluster, it was my
>>>>>> thought that if the archive were actually being expanded then
>>>>>> somedir/somefileinthearchive should work. I will go ahead and test this
>>>>>> assumption.
>>>>>>
>>>>>> In the meantime, is there any facility available in hive for making
>>>>>> archived files available to hive jobs? archive or hadoop archive ("har"),
>>>>>> etc?
>>>>>>
>>>>>>
>>>>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>>>>
>>>>>>> what would be interesting would be to run a little experiment and
>>>>>>> find out what the default PATH is on your data nodes.
>>>>>>> How much of a pain
>>>>>>> would it be to run a little python script to print to stderr the value of
>>>>>>> the environment variables $PATH and $PWD (or the shell command 'pwd')?
>>>>>>>
>>>>>>> that's of course going through normal channels of "add file".
>>>>>>>
>>>>>>> the thing is, given you're using a relative path "hive/parse_qx.py",
>>>>>>> you need to know what the "current directory" is when the process runs on
>>>>>>> the data nodes.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>>>
>>>>>>>> We have a few dozen files that need to be made available to all
>>>>>>>> mappers/reducers in the cluster while running hive transformation steps.
>>>>>>>>
>>>>>>>> It seems that "add archive" does not make the entries unarchived
>>>>>>>> and thus available directly on the default file path - and that is what
>>>>>>>> we are looking for.
>>>>>>>>
>>>>>>>> To illustrate:
>>>>>>>>
>>>>>>>> add file modelfile.1;
>>>>>>>> add file modelfile.2;
>>>>>>>> ..
>>>>>>>> add file modelfile.N;
>>>>>>>>
>>>>>>>> Then, our model that is invoked during the transformation step *does*
>>>>>>>> have correct access to its model files in the default path.
>>>>>>>>
>>>>>>>> But .. those model files take low *minutes* to all load.
>>>>>>>>
>>>>>>>> Instead, when we try:
>>>>>>>>
>>>>>>>> add archive modelArchive.tgz;
>>>>>>>>
>>>>>>>> the problem is that the archive apparently does not get exploded.
>>>>>>>>
>>>>>>>> I have an archive, for example, that contains shell scripts under the
>>>>>>>> "hive" directory stored inside. I am *not* able to access
>>>>>>>> hive/my-shell-script.sh after adding the archive.
>>>>>>>> Specifically, the following fails:
>>>>>>>>
>>>>>>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>>>>>>> -rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh
>>>>>>>>
>>>>>>>> from (select transform (aappname,qappname)
>>>>>>>> *using* '*hive/parse_qx.py*' as (aappname2 string, qappname2
>>>>>>>> string) from eqx ) o insert overwrite table c select o.aappname2,
>>>>>>>> o.qappname2;
>>>>>>>>
>>>>>>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
>>>>>>>> No such file or directory
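A quick sketch of the PATH/PWD experiment suggested upthread, for anyone who wants to try it: ship a script like this with a plain `add file` and call it from a TRANSFORM clause; everything it writes to stderr ends up in the task logs, while the pass-through loop keeps the query producing output.

```python
#!/usr/bin/env python
# Debugging sketch: report the environment a TRANSFORM script actually
# sees on the data nodes. stderr goes to the task logs; stdout is
# reserved for the transform's output rows.
import os
import sys

def report_environment():
    """Return the PATH and current working directory of this process."""
    return {
        "PATH": os.environ.get("PATH", ""),
        "PWD": os.getcwd(),
    }

if __name__ == "__main__":
    env = report_environment()
    sys.stderr.write("PATH=%s\nPWD=%s\n" % (env["PATH"], env["PWD"]))
    # Pass rows through untouched so the query still returns data.
    for line in sys.stdin:
        sys.stdout.write(line)
```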
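And a sketch of the untar-it-yourself workaround also suggested upthread (since `add archive` does not explode the tgz on the remote side), assuming the tarball is shipped with `add file` alongside the script. The archive name and member layout here are illustrative, not taken from the thread.

```python
#!/usr/bin/env python
# Sketch: ship the model tarball with "add file modelArchive.tgz" and
# unpack it once at script startup, before processing any rows.
import tarfile

def unpack_models(archive_path, dest_dir="."):
    """Extract a gzipped model archive into dest_dir and return the
    list of member names, so the caller can locate the model files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getnames()
        tar.extractall(dest_dir)
    return members
```

The open question from the thread — what the literal path of the shipped tarball is on the data node — still applies; the archive_path argument is left to the caller for that reason.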