Thanks in advance for any help. I have been banging my head against the
wall on this one all day.
When I run the command

hadoop fs -put /path/to/input /path/in/hdfs

from the command line, the hadoop shell dutifully copies my entire file
correctly, no matter the size.


I wrote a web service client for an external service in Python, and I am
simply trying to replicate the same command after retrieving some CSV
results from the web service:

import subprocess

cmd = ['hadoop', 'fs', '-put', '/path/to/input/', '/path/in/hdfs/']
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     bufsize=256*1024*1024)
# communicate() waits for the process to exit and collects both streams
output, errors = p.communicate()
if p.returncode:
    raise OSError(errors)
else:
    LOG.info(output)
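
A possible sanity check at this step, purely as a sketch: write the CSV
results inside a with-block so the file is closed (and Python's write buffer
flushed) before hadoop reads it, then compare the on-disk size against what
lands in HDFS. This assumes Python 3; write_rows and put_to_hdfs are
made-up helper names, not part of my actual script, and whether buffering is
really the cause here is only a guess.

```python
import csv
import os
import subprocess


def write_rows(path, rows):
    """Write rows as CSV and return the on-disk size in bytes."""
    # Leaving the with-block closes the file, which flushes Python's
    # write buffer before any other process reads the file.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return os.path.getsize(path)


def put_to_hdfs(local_path, hdfs_path):
    """Run `hadoop fs -put`; return its stdout or raise OSError on failure."""
    cmd = ["hadoop", "fs", "-put", local_path, hdfs_path]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    output, errors = proc.communicate()
    if proc.returncode:
        raise OSError(errors.decode(errors="replace"))
    return output.decode(errors="replace")
```

If the size returned by write_rows is already stuck at 4096 bytes, the
truncation happens before hadoop is ever invoked.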

Without fail, the hadoop shell writes only the first 4096 bytes of the input
file (which, according to the documentation, is the default value for
io.file.buffer.size).

I have tried almost everything, including adding
-Dio.file.buffer.size=XXXXXX to the command, where XXXXXX is a really big
number, and NOTHING seems to work.
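
For reference, a sketch of how the property could be placed in the argument
list. It assumes the usual "hadoop fs [generic options] <command>" layout,
where -D sits between fs and -put; build_put_cmd and the buffer size used
below are hypothetical, chosen only for illustration.

```python
def build_put_cmd(local_path, hdfs_path, buffer_size=None):
    """Build the argument list for `hadoop fs -put`.

    If buffer_size is given, io.file.buffer.size is passed as a generic
    -D option ahead of the -put command (assumed placement).
    """
    cmd = ["hadoop", "fs"]
    if buffer_size is not None:
        cmd += ["-D", "io.file.buffer.size=%d" % buffer_size]
    cmd += ["-put", local_path, hdfs_path]
    return cmd
```

The resulting list can be handed to subprocess.Popen unchanged, since each
element is passed to hadoop as a separate argument with no shell involved.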

Please help!
