The simplest way is to call os.system with the HDFS rm command from the driver, assuming the driver has HDFS connectivity (for example, it runs on a gateway node). The executors have nothing to do with it.

On 12 Jun 2015 08:57, "Siegfried Bilstein" <sbilst...@gmail.com> wrote:
> I've seen plenty of examples for creating HDFS files from pyspark but
> haven't been able to figure out how to delete files from pyspark. Is there
> an API I am missing for filesystem management? Or should I be including the
> HDFS python modules?
>
> Thanks,
> Siegfried
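
A rough sketch of the driver-side approach suggested in the reply, in case it helps. It uses subprocess instead of os.system so the path is passed as a separate argument rather than through a shell. It assumes the `hdfs` command is on the driver's PATH. The helper names (`build_hdfs_rm_command`, `hdfs_rm`) and the example path are made up for illustration and are not part of any Spark or Hadoop API.

```python
import subprocess


def build_hdfs_rm_command(path, recursive=False, skip_trash=False):
    """Build the argument list for `hdfs dfs -rm` (hypothetical helper)."""
    cmd = ["hdfs", "dfs", "-rm"]
    if recursive:
        cmd.append("-r")          # needed to delete directories
    if skip_trash:
        cmd.append("-skipTrash")  # delete immediately instead of moving to trash
    cmd.append(path)
    return cmd


def hdfs_rm(path, recursive=False, skip_trash=False):
    """Run the delete on the driver; return True if the command exited with 0.

    Raises OSError if the `hdfs` executable is not found on this machine.
    """
    cmd = build_hdfs_rm_command(path, recursive, skip_trash)
    return subprocess.call(cmd) == 0


# Example call from the driver program (path is a placeholder):
# hdfs_rm("/user/someone/old_output", recursive=True)
```

This runs only in the driver process, so it should be called outside any RDD transformation; nothing here is shipped to the executors.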