I have an HDFS directory that contains audio files, and I want to run fpcalc on each file using Hadoop streaming. This works locally with no problem, but under Hadoop fpcalc cannot see the files. My code is:
```python
import shlex
from subprocess import Popen, PIPE

cli = './fpcalc -raw -length ' + str(sample_length) + ' ' + file_a
cli_parts = shlex.split(cli)
fpcalc_cli = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
fpcalc_out, fpcalc_err = fpcalc_cli.communicate()
```

`cli_parts` is:

```
['./fpcalc', '-raw', '-length', '30', '/user/hduser/audio/input/flacOriginal1.flac']
```

and runs fine locally. `fpcalc_err` is:

```
ERROR: couldn't open the file
ERROR: unable to calculate fingerprint for file /user/hduser/audio/input/flacOriginal1.flac, skipping
```

The file DOES exist:

```
hadoop fs -ls /user/hduser/audio/input/flacOriginal1.flac
Found 1 items
-rw-r--r--   1 hduser supergroup    2710019 2014-08-08 11:49 /user/hduser/audio/input/flacOriginal1.flac
```

Can I point to a file like this in Hadoop streaming? TIA!
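For context, here is a sketch of the workaround I'm considering: since fpcalc presumably only reads the local filesystem, copy the file out of HDFS into the task's working directory first with `hadoop fs -get`, then point fpcalc at the local copy. The paths and the `sample_length` value are just the example values from above, and the existence checks are only there so the sketch degrades gracefully off-cluster; I don't know yet whether this is the right approach for a streaming task.

```python
import os
import shlex
import shutil
import subprocess
from subprocess import Popen, PIPE

# Example values taken from the question above (assumptions, not verified):
hdfs_path = '/user/hduser/audio/input/flacOriginal1.flac'
local_path = './flacOriginal1.flac'
sample_length = 30

# Pull the file out of HDFS onto the local disk first, since fpcalc
# cannot open hdfs:// paths itself. Guarded so the sketch doesn't
# crash where the hadoop CLI is unavailable.
if shutil.which('hadoop'):
    subprocess.call(['hadoop', 'fs', '-get', hdfs_path, local_path])

# Build the fpcalc command against the LOCAL copy, not the HDFS path.
cli = './fpcalc -raw -length ' + str(sample_length) + ' ' + local_path
cli_parts = shlex.split(cli)

# Only invoke fpcalc if the binary was actually shipped with the job.
if os.path.exists('./fpcalc'):
    fpcalc_cli = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
    fpcalc_out, fpcalc_err = fpcalc_cli.communicate()
```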