We have a few dozen files that need to be made available to all mappers/reducers in the cluster while running Hive transformation steps.
It seems that "add archive" does not unpack the archive's entries so that they are available directly on the default file path, which is what we are looking for.

To illustrate:

    add file modelfile.1;
    add file modelfile.2;
    ..
    add file modelfile.N;

With this, our model, which is invoked during the transformation step, does have correct access to its model files in the default path. But those model files take a few minutes to load this way. So instead we tried:

    add archive modelArchive.tgz;

The problem is that the archive apparently does not get exploded. For example, I have an archive that contains shell scripts under a "hive" directory stored inside it. I am not able to access hive/my-shell-script.sh after adding the archive.

Specifically, the following fails:

    $ tar -tvf appm*.tar.gz | grep launch-quixey_to_xml
    -rwxrwxr-x stephenb/stephenb 664 2013-06-18 17:46 appminer/bin/launch-quixey_to_xml.sh

    from (select transform (aappname,qappname) using 'hive/parse_qx.py' as (aappname2 string, qappname2 string) from eqx ) o
    insert overwrite table c select o.aappname2, o.qappname2;

    Cannot run program "hive/parse_qx.py": java.io.IOException: error=2, No such file or directory
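To narrow down where (or whether) the archive contents end up on the task side, one option is a small diagnostic transform script that just lists what is visible from the task's working directory. The following is only a minimal sketch: the file name list_tree.py, the function list_tree, and the depth limit are hypothetical and not part of our job, and it assumes the transform script is started in the task's working directory.

```python
#!/usr/bin/env python
"""Hypothetical debugging helper: list files visible from the working directory."""
import os
import sys


def list_tree(root=".", max_depth=3):
    """Return sorted relative paths of files at most max_depth directories below root."""
    root = os.path.abspath(root)
    found = []
    # followlinks=True because cached resources may be exposed through symlinks;
    # the depth cap below keeps the walk bounded.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        rel_dir = os.path.relpath(dirpath, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # do not descend any further
        for name in filenames:
            found.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(found)


def main():
    # Consume the input rows so the writer side of the pipe is not cut off early.
    for _ in sys.stdin:
        pass
    # Emit one path per output row.
    for path in list_tree("."):
        sys.stdout.write(path + "\n")


if __name__ == "__main__":
    main()
```

A possible way to run it, assuming the script is shipped with "add file list_tree.py;" after the "add archive" step, is to use it in place of the failing script with a single string output column, e.g. using 'python list_tree.py' as (path string). The listing should show whether hive/parse_qx.py exists at the top level or only under some other directory.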