Once you have loaded your data into a DataFrame (the spark-csv package can read the CSV file for you), try df.write.partitionBy("yyyy", "mm").save("...").
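
Here is a minimal sketch of that suggestion in PySpark. It assumes a Spark 1.5-style sqlContext and that the spark-csv package is available to the job. The names write_partitioned and partition_dir, and the explicit Parquet output format, are illustrative assumptions rather than anything from this thread. Note that partitionBy writes Hive-style directories, so you should expect /path/to/hdfs/yyyy=2015/mm=09 rather than /path/to/hdfs/2015/09.

```python
def partition_dir(base, yyyy, mm):
    """Hypothetical helper: the Hive-style directory that
    partitionBy("yyyy", "mm") is expected to use for one (yyyy, mm) pair."""
    return "{}/yyyy={}/mm={}".format(base.rstrip("/"), yyyy, mm)


def write_partitioned(sql_context, src, dest):
    """Read a CSV file with a header row and write it under dest,
    partitioned by the yyyy and mm columns.

    sql_context is assumed to be an existing SQLContext (Spark 1.5 era).
    """
    # Load the file through spark-csv; header=true takes the column
    # names from the first line. Without schema inference every column
    # is read as a string, so "09" keeps its leading zero.
    df = (sql_context.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .load(src))

    # The sample header has spaces around some names ("a , b , c ").
    # Strip them, since Parquet does not accept spaces in column names.
    df = df.toDF(*[c.strip() for c in df.columns])

    # One subdirectory per distinct (yyyy, mm) pair is created under dest.
    # Parquet is chosen explicitly here as an assumption; it is also
    # Spark's default output format unless configured otherwise.
    df.write.partitionBy("yyyy", "mm").format("parquet").save(dest)
```

You would call it as write_partitioned(sqlContext, "/path/to/input.csv", "/path/to/hdfs/"), where the input path is a placeholder. The partition columns are encoded in the directory names and are not repeated inside the data files.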

On Thu, Oct 1, 2015 at 4:11 PM, haridass saisriram <haridass.saisri...@gmail.com> wrote:

> Hi,
>
> I am trying to find a simple example to read a data file on HDFS. The
> file has the following format
> a , b , c ,yyyy,mm
> a1,b1,c1,2015,09
> a2,b2,c2,2014,08
>
> I would like to read this file and store it in HDFS partitioned by year
> and month. Something like this
> /path/to/hdfs/yyyy/mm
>
> I want to specify the "/path/to/hdfs/" and yyyy/mm should be populated
> automatically based on those columns. Could some one point me in the right
> direction
>
> Thank you,
> Sri Ram