Good question. Write-ahead logs (WALs) are used for both: data and, when needed, metadata. When the data WAL is enabled with the conf spark.streaming.receiver.writeAheadLog.enable, data received by receivers is written to the data WAL by the executors. In some cases, like Direct Kafka and Kinesis, the record identifiers are written to a separate metadata WAL by the driver. The metadata WAL is always enabled.
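For reference, here is a minimal sketch of turning on the data WAL for a receiver-based stream. The app name, checkpoint path, host, and port are placeholders I made up for illustration; the only setting taken from above is spark.streaming.receiver.writeAheadLog.enable. The logs are written under the checkpoint directory, so one has to be set.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverWalSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("receiver-wal-sketch") // placeholder name
      // Enable the data WAL for receiver-based input streams.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // The WAL files live under the checkpoint directory.
    // Placeholder path: point this at your own fault-tolerant storage.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")

    // Placeholder receiver-based source. With the WAL on, in-memory
    // replication is usually unnecessary, hence the non-replicated level.
    val lines = ssc.socketTextStream(
      "localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)

    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The master URL is left out on purpose, assuming the job is launched with spark-submit.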
All WALs take care of their own cleanup. Spark Streaming knows when data can be cleaned up (based on the window operations, etc., used in your program), so you don't have to write a cleanup job yourself.

On Fri, Nov 20, 2015 at 12:26 PM, kali.tumm...@gmail.com <kali.tumm...@gmail.com> wrote:

> Hi All,
>
> If write ahead logs are enabled in spark streaming does all the received
> data gets written to HDFS path ? or it only writes the metadata.
> How does clean up works , does HDFS path gets bigger and bigger up
> everyday
> do I need to write an clean up job to delete data from write ahead logs
> folder ?
> what actually does write ahead log folder has ?
>
> Thanks
> Sri