I would suggest using Flume, if possible, as it has built-in HDFS log-rolling capabilities.
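For reference, the rolling behaviour lives in the HDFS sink configuration. A minimal sketch (the agent, channel, and path names below are placeholders, not from this thread):

# Flume HDFS sink: roll output files on a timer rather than by size/count.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
agent1.sinks.hdfsSink.hdfs.path = hdfs:///logs/%Y-%m-%d
# Roll a new file every 1800 seconds (30 minutes)...
agent1.sinks.hdfsSink.hdfs.rollInterval = 1800
# ...and disable the size- and event-count-based roll triggers.
agent1.sinks.hdfsSink.hdfs.rollSize = 0
agent1.sinks.hdfsSink.hdfs.rollCount = 0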
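If Flume is not an option, one way to get the same effect inside Spark Streaming itself is to widen the output interval with window() instead of changing the batch duration. A rough sketch against the DStream API, where kafkaStream and transform are placeholders for your existing Kafka stream and transformation logic, and the output path is made up:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val conf = new SparkConf().setAppName("KafkaToHdfs")
// Keep the 1-minute batch duration from the original job.
val ssc = new StreamingContext(conf, Minutes(1))

// kafkaStream stands in for the DStream built with KafkaUtils.createDirectStream,
// and transform for the existing per-record transformations.
val transformed = kafkaStream.map(transform)

// Group 30 one-minute batches and write them out together: the 30-minute
// slide means saveAsTextFiles fires every 30 minutes, creating one HDFS
// directory per half hour instead of one per batch.
transformed
  .window(Minutes(30), Minutes(30))
  .saveAsTextFiles("hdfs:///user/output/run")

ssc.start()
ssc.awaitTermination()

Bear in mind that a 30-minute window keeps half an hour of data buffered on the cluster, so this only works if that volume fits in memory/disk for your workload.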
On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire <vmadh...@umail.iu.edu> wrote:

> Hi,
>
> I am using Spark Streaming with a 1-minute batch duration to read data
> from a Kafka topic, apply transformations, and persist into HDFS.
>
> The application is creating a new directory every minute, each with many
> partition files (= number of partitions). What parameter do I need to
> change/configure so that it persists and creates an HDFS directory
> *every 30 minutes* instead of every batch duration?
>
> Any help would be appreciated.
>
> Thanks,
> Naveen

--
Best Regards,
Ayan Guha