Good question.

Write-ahead logs (WALs) are used for both: writing data and, when needed,
writing metadata. When the data WAL is enabled with the conf
spark.streaming.receiver.writeAheadLog.enable, data received by receivers
is written to the data WAL by the executors. In some cases, like Direct
Kafka and Kinesis, the record identifiers are written to a separate
metadata WAL by the driver. The metadata WAL is always enabled.
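
For reference, here is a minimal sketch of how the data WAL could be turned
on for a receiver-based job. The app name, batch interval, and checkpoint
path are placeholders I made up for illustration. My understanding is that
the WAL files are kept under the checkpoint directory, which is why one is
set here.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalConfigSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WalConfigSketch") // hypothetical app name
      // Turns on the data WAL for receiver-based input streams.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    // The 10-second batch interval is an assumption, not a recommendation.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder path; use a fault-tolerant file system such as HDFS.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")

    // Define receiver-based input streams and output operations here,
    // then call ssc.start() and ssc.awaitTermination().
  }
}
```

The same conf can also be passed at submit time with --conf instead of
being set in code.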

All WALs take care of their own cleanup. Spark Streaming knows when things
can be cleaned up (based on which window ops, etc. are used in your
program), so you don't have to write a separate cleanup job.

On Fri, Nov 20, 2015 at 12:26 PM, kali.tumm...@gmail.com <
kali.tumm...@gmail.com> wrote:

> Hi All,
>
> If write ahead logs are enabled in spark streaming does all the received
> data gets written to HDFS path   ? or it only writes the metadata.
> How does clean up works , does HDFS path gets bigger and bigger  up
> everyday
> do I need to write an clean up job to delete data from  write ahead logs
> folder ?
> what actually does write ahead log folder has ?
>
> Thanks
> Sri
