Most likely you are closing the connection to HDFS somewhere in your code.
Can you paste the piece of code that you are executing?
We had a similar problem when we closed the FileSystem object in our
code.
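In case it helps, here is a toy sketch of why closing the handle can bite. As far as I understand it, Hadoop's FileSystem.get() normally hands back a cached instance that is shared across callers, so closing it in your own code also closes it for whatever else writes afterwards. Everything below is made up for illustration (FakeFileSystem, get_filesystem, user_code, framework_save and the "hdfs://namenode" URI are hypothetical names, not Hadoop or Spark APIs); it only models the sharing behaviour in plain Python.

```python
# Toy model only: these classes and functions are hypothetical stand-ins,
# not Hadoop or Spark APIs.

class FakeFileSystem:
    """Stands in for a filesystem client handle."""

    def __init__(self, uri):
        self.uri = uri
        self.closed = False

    def write(self, path):
        # Any use after close() fails, whoever called close().
        if self.closed:
            raise IOError("Filesystem closed")
        return "wrote " + self.uri + path

    def close(self):
        self.closed = True


_cache = {}


def get_filesystem(uri):
    """Return one shared handle per URI, the way a client-side cache would."""
    if uri not in _cache:
        _cache[uri] = FakeFileSystem(uri)
    return _cache[uri]


def user_code():
    fs = get_filesystem("hdfs://namenode")  # hypothetical URI
    fs.close()  # looks tidy, but this is the handle everyone shares


def framework_save(path):
    # The framework asks for "its" handle and gets the same cached object.
    return get_filesystem("hdfs://namenode").write(path)
```

Calling framework_save() works until user_code() has run; after that the same call raises IOError because the shared handle is already closed. If your job does something similar, the usual way out is simply not to close the shared handle yourself.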
Thanks
Best Regards
On Thu, Jul 24, 2014 at 11:00 PM, Eric Friedman eric.d.fried...@gmail.com
wrote:
I ported the same code to Scala and had no problems. But in PySpark, this
fails consistently:
ctx = SQLContext(sc)
pf = ctx.parquetFile(...)
rdd = pf.map(lambda x: x)
crdd = ctx.inferSchema(rdd)
crdd.saveAsParquetFile(...)
If I do
rdd = sc.parallelize(["hello", "world"])
rdd.saveAsTextFile(...)
It works.