Re: rdd.saveAsTextFile blows up

2014-07-25 Thread Akhil Das
Most likely you are closing the connection with HDFS. Can you paste the piece of code that you are executing? We had a similar problem when we closed the FileSystem object in our code.

Thanks
Best Regards

On Thu, Jul 24, 2014 at 11:00 PM, Eric Friedman eric.d.fried...@gmail.com wrote:
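
To illustrate the suspected cause, here is a toy model in plain Python. It is not the Hadoop API: ToyFileSystem, its methods, and the "hdfs://example" URI are all hypothetical. It assumes only what is described above, namely that the application and the framework end up sharing one cached filesystem handle, so a close() in application code breaks a later write by the framework.

```python
# Toy model only: ToyFileSystem is a hypothetical stand-in, not a Hadoop class.
# It shows how a shared, cached handle turns one caller's close() into
# a failure for every other holder of the same handle.


class ToyFileSystem:
    _cache = {}  # one shared instance per URI

    def __init__(self, uri):
        self.uri = uri
        self.closed = False

    @classmethod
    def get(cls, uri):
        """Return the cached handle for this URI, creating it on first use."""
        fs = cls._cache.get(uri)
        if fs is None:
            fs = cls(uri)
            cls._cache[uri] = fs
        return fs

    def write(self, path, data):
        """Pretend to write; fail if any holder has closed the handle."""
        if self.closed:
            raise OSError("Filesystem closed")
        return len(data)

    def close(self):
        self.closed = True


def demo(uri="hdfs://example"):
    """Return True if the framework's write fails after user code closes."""
    framework_fs = ToyFileSystem.get(uri)  # handle held by the framework
    user_fs = ToyFileSystem.get(uri)       # same object, via the cache
    user_fs.write("/tmp/scratch", "x")
    user_fs.close()                        # user code "cleans up"
    try:
        framework_fs.write("/out/part-00000", "hello\nworld\n")
    except OSError:
        return True
    return False
```

If this is what is happening, the usual remedy is simply not to close a filesystem handle that the application did not create exclusively for itself.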

Re: rdd.saveAsTextFile blows up

2014-07-25 Thread Eric Friedman
I ported the same code to Scala. No problems. But in PySpark, this fails consistently:

    ctx = SQLContext(sc)
    pf = ctx.parquetFile(...)
    rdd = pf.map(lambda x: x)
    crdd = ctx.inferSchema(rdd)
    crdd.saveAsParquetFile(...)

If I do

    rdd = sc.parallelize(["hello", "world"])
    rdd.saveAsTextFile(...)

it works.