Good eyes, Ramki! Thanks — this "directory" in place of the filename appears to be working. The script is now getting loaded using the "Attempt two" form with the correct directory name, i.e. hivetry/classifier_wf.py as the script path.
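For the archives, the pattern that worked: `add file` the full directory path, then reference the script as hivetry/classifier_wf.py in the TRANSFORM clause. A minimal skeleton of a transform script in that style (the real classification logic in classifier_wf.py is omitted; this is just an identity pass-through of the two columns):

```python
#!/usr/bin/env python
# Minimal TRANSFORM-script skeleton: Hive pipes rows to stdin as
# tab-separated fields and reads tab-separated rows back from stdout.
import sys

def transform_line(line):
    """Split one tab-separated input row and return the output row.
    The actual classifier logic is not shown; this is an identity
    pass-through of the two expected columns."""
    aappname, qappname = line.rstrip("\n").split("\t")
    return "%s\t%s" % (aappname, qappname)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform_line(line))
```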
thanks again. stephenb

2013/6/20 Ramki Palle <ramki.pa...@gmail.com>

> In the *Attempt two*, are you not supposed to use "hivetry" as the
> directory?
>
> Maybe you should try giving the full path
> "/opt/am/ver/1.0/hive/hivetry/classifier_wf.py" and see if it works.
>
> Regards,
> Ramki.
>
>
> On Thu, Jun 20, 2013 at 9:28 AM, Stephen Boesch <java...@gmail.com> wrote:
>
>> Stephen: would you be willing to share an example of specifying a
>> "directory" as the add "file" target? I have not seen this working.
>>
>> I have attempted to use it as follows:
>>
>> *We will access a script within the "hivetry" directory located here:*
>>
>> hive> ! ls -l /opt/am/ver/1.0/hive/hivetry/classifier_wf.py;
>> -rwxrwxr-x 1 hadoop hadoop 11241 Jun 18 19:37 /opt/am/ver/1.0/hive/hivetry/classifier_wf.py
>>
>> *Add the directory to hive:*
>>
>> hive> add file /opt/am/ver/1.0/hive/hivetry;
>> Added resource: /opt/am/ver/1.0/hive/hivetry
>>
>> *Attempt to run a transform query using that script:*
>>
>> *Attempt one: use the script name unqualified:*
>>
>> hive> from (select transform (aappname,qappname) using 'classifier_wf.py'
>> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table
>> c select o.aappname2, o.qappname2;
>>
>> (Failed: Caused by: java.io.IOException: Cannot run program
>> "classifier_wf.py": java.io.IOException: error=2, No such file or directory)
>>
>> *Attempt two: use the script name with the directory name prefix:*
>>
>> hive> from (select transform (aappname,qappname) using 'hive/classifier_wf.py'
>> as (aappname2 string, qappname2 string) from eqx ) o insert overwrite table
>> c select o.aappname2, o.qappname2;
>>
>> (Failed: Caused by: java.io.IOException: Cannot run program
>> "hive/classifier_wf.py": java.io.IOException: error=2, No such file or
>> directory)
>>
>>
>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>
>>> yeah. the archive isn't unpacked on the remote side.
>>> I think add archive is mostly used for finding java packages, since CLASSPATH
>>> will reference the archive (and as such there is no need to expand it.)
>>>
>>>
>>> On Thu, Jun 20, 2013 at 9:00 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>
>>>> thx for the tip on "add <file>" where <file> is a directory. I will try
>>>> that.
>>>>
>>>>
>>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>>
>>>>> i personally only know of adding a .jar file via add archive, but my
>>>>> experience there is very limited. i believe if you 'add file' and the file
>>>>> is a directory it'll recursively take everything underneath, but i know of
>>>>> nothing that inflates or untars things on the remote end automatically.
>>>>>
>>>>> i would 'add file' your python script and then within that untar your
>>>>> tarball to get at your model data. it's just a matter of figuring out the
>>>>> path to that tarball, which is kinda up in the air when it's added via 'add
>>>>> file'. Yeah, "local downloads directory" — what the literal path is
>>>>> is what i'd like to know. :)
>>>>>
>>>>>
>>>>> On Thu, Jun 20, 2013 at 8:37 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>
>>>>>> @Stephen: given that the 'relative' path for hive is from a local
>>>>>> downloads directory on each local tasktracker in the cluster, it was my
>>>>>> thought that if the archive were actually being expanded then
>>>>>> somedir/somefileinthearchive should work. I will go ahead and test this
>>>>>> assumption.
>>>>>>
>>>>>> In the meantime, is there any facility available in hive for making
>>>>>> archived files available to hive jobs? archive or hadoop archive ("har"),
>>>>>> etc?
>>>>>>
>>>>>>
>>>>>> 2013/6/20 Stephen Sprague <sprag...@gmail.com>
>>>>>>
>>>>>>> what would be interesting would be to run a little experiment and
>>>>>>> find out what the default PATH is on your data nodes.
>>>>>>> How much of a pain
>>>>>>> would it be to run a little python script to print to stderr the value of
>>>>>>> the environment variables $PATH and $PWD (or the shell command 'pwd')?
>>>>>>>
>>>>>>> that's of course going through normal channels of "add file".
>>>>>>>
>>>>>>> the thing is, given you're using a relative path "hive/parse_qx.py",
>>>>>>> you need to know what the "current directory" is when the process runs on
>>>>>>> the data nodes.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 20, 2013 at 5:32 AM, Stephen Boesch <java...@gmail.com> wrote:
>>>>>>>
>>>>>>>> We have a few dozen files that need to be made available to all
>>>>>>>> mappers/reducers in the cluster while running hive transformation steps.
>>>>>>>>
>>>>>>>> It seems that "add archive" does not make the entries unarchived
>>>>>>>> and thus available directly on the default file path - and that is what
>>>>>>>> we are looking for.
>>>>>>>>
>>>>>>>> To illustrate:
>>>>>>>>
>>>>>>>> add file modelfile.1;
>>>>>>>> add file modelfile.2;
>>>>>>>> ..
>>>>>>>> add file modelfile.N;
>>>>>>>>
>>>>>>>> Then, our model that is invoked during the transformation step *does*
>>>>>>>> have correct access to its model files in the default path.
>>>>>>>>
>>>>>>>> But .. those model files take low *minutes* to all load.
>>>>>>>>
>>>>>>>> Instead, when we try:
>>>>>>>>
>>>>>>>> add archive modelArchive.tgz;
>>>>>>>>
>>>>>>>> the problem is that the archive apparently does not get exploded.
>>>>>>>>
>>>>>>>> I have an archive, for example, that contains shell scripts under the
>>>>>>>> "hive" directory stored inside. I am *not* able to access
>>>>>>>> hive/my-shell-script.sh after adding the archive.
>>>>>>>> Specifically, the following fails:
>>>>>>>>
>>>>>>>> $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
>>>>>>>> -rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh
>>>>>>>>
>>>>>>>> from (select transform (aappname,qappname)
>>>>>>>> *using* '*hive/parse_qx.py*' as (aappname2 string, qappname2
>>>>>>>> string) from eqx ) o insert overwrite table c select o.aappname2,
>>>>>>>> o.qappname2;
>>>>>>>>
>>>>>>>> Cannot run program "hive/parse_qx.py": java.io.IOException: error=2,
>>>>>>>> No such file or directory
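A quick sketch of the PATH/PWD experiment suggested upthread, for anyone who wants to try it: ship a script like this with a plain `add file` and call it from a TRANSFORM clause; everything it writes to stderr ends up in the task logs, while the pass-through loop keeps the query producing output.

```python
#!/usr/bin/env python
# Debugging sketch: report the environment a TRANSFORM script actually
# sees on the data nodes. stderr goes to the task logs; stdout is
# reserved for the transform's output rows.
import os
import sys

def report_environment():
    """Return the PATH and current working directory of this process."""
    return {
        "PATH": os.environ.get("PATH", ""),
        "PWD": os.getcwd(),
    }

if __name__ == "__main__":
    env = report_environment()
    sys.stderr.write("PATH=%s\nPWD=%s\n" % (env["PATH"], env["PWD"]))
    # Pass rows through untouched so the query still returns data.
    for line in sys.stdin:
        sys.stdout.write(line)
```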
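And a sketch of the untar-it-yourself workaround also suggested upthread (since `add archive` does not explode the tgz on the remote side), assuming the tarball is shipped with `add file` alongside the script. The archive name and member layout here are illustrative, not taken from the thread.

```python
#!/usr/bin/env python
# Sketch: ship the model tarball with "add file modelArchive.tgz" and
# unpack it once at script startup, before processing any rows.
import tarfile

def unpack_models(archive_path, dest_dir="."):
    """Extract a gzipped model archive into dest_dir and return the
    list of member names, so the caller can locate the model files."""
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getnames()
        tar.extractall(dest_dir)
    return members
```

The open question from the thread — what the literal path of the shipped tarball is on the data node — still applies; the archive_path argument is left to the caller for that reason.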