In *Attempt two*, aren't you supposed to use "hivetry" as the directory prefix? You added /opt/am/ver/1.0/hive/hivetry, so the relative path would be "hivetry/classifier_wf.py", not "hive/classifier_wf.py".
Maybe you should also try giving the full path "/opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works.

Regards,
Ramki.

On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch <java...@gmail.com> wrote:

> Stephen: would you be willing to share an example of specifying a
> "directory" as the "add file" target? I have not seen this working.
>
> I have attempted to use it as follows:
>
> *We will access a script within the "hivetry" directory located here:*
>
> hive> ! ls -l /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
> -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py
>
> *Add the directory to hive:*
>
> hive> add file /opt/am/ver/1.0/hive/hivetry;
> Added resource: /opt/am/ver/1.0/hive/hivetry
>
> *Attempt to run a transform query using that script.*
>
> *Attempt one: use the script name unqualified:*
>
> hive> from (select transform (aappname,qappname) using 'classifier_wf.py'
>       as (aappname2 string, qappname2 string) from eqx ) o
>       insert overwrite table c select o.aappname2, o.qappname2;
>
> (Failed: Caused by: java.io.IOException: Cannot run program
> "classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>
> *Attempt two: use the script name with the directory name prefix:*
>
> hive> from (select transform (aappname,qappname) using 'hive/classifier_wf.py'
>       as (aappname2 string, qappname2 string) from eqx ) o
>       insert overwrite table c select o.aappname2, o.qappname2;
>
> (Failed: Caused by: java.io.IOException: Cannot run program
> "hive/classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>
>
> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>
>> Yeah, the archive isn't unpacked on the remote side. I think "add archive"
>> is mostly used for finding java packages, since the CLASSPATH will
>> reference the archive itself (and as such there is no need to expand it).
>>
>> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch <java...@gmail.com> wrote:
>>
>>> Thanks for the tip on "add file <file>" where <file> is a directory. I
>>> will try that.
>>>
>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>
>>>> I personally only know of adding a .jar file via "add archive", and my
>>>> experience there is very limited. I believe that if you "add file" and
>>>> the file is a directory, it will recursively take everything underneath,
>>>> but I know of nothing that inflates or untars things on the remote end
>>>> automatically.
>>>>
>>>> I would "add file" your python script and then, within that script,
>>>> untar your tarball to get at your model data (see the sketch below).
>>>> It's just a matter of figuring out the path to that tarball, which is
>>>> kind of up in the air when it's added via "add file". Yeah, the "local
>>>> downloads directory": what's the literal path? That's what I'd like to
>>>> know. :)
>>>>
>>>> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>
>>>>> @Stephen: given that the 'relative' path for hive is from a local
>>>>> downloads directory on each tasktracker in the cluster, it was my
>>>>> thought that if the archive were actually being expanded, then
>>>>> somedir/somefileinthearchive should work. I will go ahead and test
>>>>> this assumption.
>>>>>
>>>>> In the meantime, is there any facility available in hive for making
>>>>> archived files available to hive jobs? "add archive", hadoop archives
>>>>> ("har"), etc.?
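A minimal sketch of the unpack-at-startup approach Sprague suggests above, assuming the tarball is shipped alongside the script via "add file" and lands in the task's working directory (the unpack_models helper and the echo logic are illustrative, not from the thread):

#!/usr/bin/env python
# classifier_wf.py (sketch): unpack the model tarball shipped via
# "add file" before reading any rows from the transform's stdin.
import os
import sys
import tarfile


def unpack_models(tgz_name="modelArchive.tgz"):
    # Assumption: Hive copies "add file" resources into the task's
    # working directory, so try a bare relative name first, then the
    # directory this script itself was materialized into.
    here = os.path.dirname(os.path.abspath(__file__))
    for path in (tgz_name, os.path.join(here, tgz_name)):
        if os.path.exists(path):
            with tarfile.open(path, "r:gz") as tar:
                tar.extractall(".")  # model files now sit under ./
            return
    sys.stderr.write("model archive not found: %s\n" % tgz_name)
    sys.exit(2)


if __name__ == "__main__":
    unpack_models()
    for line in sys.stdin:
        aappname, qappname = line.rstrip("\n").split("\t")
        # ... run the classifier against the unpacked model files ...
        sys.stdout.write("%s\t%s\n" % (aappname, qappname))

Unpacking once per task keeps the transfer down to a single gzipped file, which is the saving being chased here, at the cost of one extract when each task starts.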
>>>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>>>
>>>>>> What would be interesting would be to run a little experiment and
>>>>>> find out what the default PATH is on your data nodes. How much of a
>>>>>> pain would it be to run a little python script that prints to stderr
>>>>>> the values of the environment variables $PATH and $PWD (or of the
>>>>>> shell command 'pwd')?
>>>>>>
>>>>>> That's of course going through the normal channels of "add file".
>>>>>>
>>>>>> The thing is, given you're using the relative path "hive/parse_qx.py",
>>>>>> you need to know what the "current directory" is when the process
>>>>>> runs on the data nodes.
>>>>>>
>>>>>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>>
>>>>>>> We have a few dozen files that need to be made available to all
>>>>>>> mappers/reducers in the cluster while running hive transformation
>>>>>>> steps.
>>>>>>>
>>>>>>> It seems that "add archive" does not unarchive the entries and thus
>>>>>>> make them available directly on the default file path, and that is
>>>>>>> what we are looking for.
>>>>>>>
>>>>>>> To illustrate:
>>>>>>>
>>>>>>> add file modelfile.1;
>>>>>>> add file modelfile.2;
>>>>>>> ..
>>>>>>> add file modelfile.N;
>>>>>>>
>>>>>>> Then our model, which is invoked during the transformation step,
>>>>>>> *does* have correct access to its model files in the default path.
>>>>>>>
>>>>>>> But those model files take low *minutes* to all load. So instead we
>>>>>>> try:
>>>>>>>
>>>>>>> add archive modelArchive.tgz;
>>>>>>>
>>>>>>> The problem is that the archive apparently does not get exploded.
>>>>>>>
>>>>>>> I have an archive, for example, that contains shell scripts stored
>>>>>>> under the "hive" directory inside it. I am *not* able to access
>>>>>>> hive/my-shell-script.sh after adding the archive. Specifically, the
>>>>>>> following fails:
>>>>>>>
>>>>>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>>>>>> -rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh
>>>>>>>
>>>>>>> from (select transform (aappname,qappname)
>>>>>>> using 'hive/parse_qx.py' as (aappname2 string, qappname2 string) from eqx ) o
>>>>>>> insert overwrite table c select o.aappname2, o.qappname2;
>>>>>>>
>>>>>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
>>>>>>> No such file or directory
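To close the loop on the $PATH/$PWD experiment Sprague proposes above, such a probe could be as small as the following sketch (probe_env.py is a hypothetical name, not from the thread; echoing stdin just keeps the transform well-formed):

#!/usr/bin/env python
# probe_env.py (sketch): report the task-side environment to stderr,
# which lands in the task logs, then echo stdin so the transform
# still produces output.
import os
import sys

sys.stderr.write("PATH=%s\n" % os.environ.get("PATH", ""))
sys.stderr.write("PWD=%s\n" % os.getcwd())
sys.stderr.write("cwd contents: %s\n" % ", ".join(sorted(os.listdir("."))))

for line in sys.stdin:
    sys.stdout.write(line)

Ship it with "add file" and invoke it with using 'probe_env.py' in a transform over a few rows, then read the task's stderr logs: whatever PWD it reports is the current directory that a relative path like 'hive/parse_qx.py' is resolved against on the data nodes.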