Deleting HDFS files from Pyspark

2015-06-11 Thread Siegfried Bilstein
I've seen plenty of examples of creating HDFS files from PySpark, but I haven't been able to figure out how to delete files from PySpark. Is there an API I am missing for filesystem management, or should I be including the HDFS Python modules?

Thanks,
Siegfried

Re: Deleting HDFS files from Pyspark

2015-06-11 Thread ayan guha
The simplest way would be to issue an HDFS rm command (hdfs dfs -rm) through os.system from the driver, assuming the driver has HDFS connectivity, for example because it runs on a gateway node. The executors play no part in this.
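A minimal sketch of that approach, assuming the hdfs command-line client is installed and on the PATH of the machine running the driver. It swaps os.system for subprocess.run so the path is passed as a separate argument instead of being spliced into a shell string. The helper names build_hdfs_rm_command and delete_hdfs_path are hypothetical, not part of PySpark.

```python
import shutil
import subprocess
from typing import List


def build_hdfs_rm_command(path: str, recursive: bool = False,
                          skip_trash: bool = False) -> List[str]:
    """Build the argument list for `hdfs dfs -rm` (hypothetical helper)."""
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")          # delete directories and their contents
    if skip_trash:
        cmd.append("-skipTrash")  # bypass the HDFS trash, if enabled
    cmd.append(path)
    return cmd


def delete_hdfs_path(path: str, recursive: bool = False,
                     skip_trash: bool = False) -> int:
    """Run the delete on the driver and return the CLI exit code.

    Assumes this is called in driver code, not inside an RDD operation;
    the executors are not involved.
    """
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs CLI not found on PATH; "
                           "run this on a node with HDFS client access")
    cmd = build_hdfs_rm_command(path, recursive, skip_trash)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the CLI's error message (e.g. path does not exist).
        print(result.stderr)
    return result.returncode
```

For example, calling delete_hdfs_path("/tmp/my_output", recursive=True) in the driver before saveAsTextFile would clear a previous run's output directory (the path here is purely illustrative).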