Cristian created SPARK-11879:
--------------------------------

             Summary: Checkpoint support for DataFrame
                 Key: SPARK-11879
                 URL: https://issues.apache.org/jira/browse/SPARK-11879
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.2
            Reporter: Cristian

Explicit support for checkpointing DataFrames is needed in order to truncate lineages, prune the query plan (particularly the logical plan), and provide transparent failure recovery. Saving to a Parquet file may be sufficient for recovery, but actually using that file as a checkpoint (and truncating the lineage) requires reading the files back.

Checkpoint support is required to use DataFrames in iterative scenarios such as Streaming and ML, and to avoid expensive recomputation after an executor failure when executing a complex chain of queries on very large datasets.
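To make the lineage argument concrete, here is a small toy model in plain Python. It is only a sketch: `LazyTable`, `iterate`, and `checkpoint_every` are hypothetical names invented for illustration and are not Spark APIs. The `checkpoint()` method plays the role of the Parquet round trip described above: it materializes the rows and hands back a table with no parent, so later steps no longer drag the earlier plan along.

```python
# Toy model only: LazyTable and iterate are hypothetical, not Spark APIs.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LazyTable:
    """A lazily evaluated table that remembers how it was derived."""
    source: Optional[List[int]] = None          # materialized rows (root only)
    parent: Optional["LazyTable"] = None        # upstream table in the lineage
    fn: Optional[Callable[[int], int]] = None   # per-row transformation

    def map(self, fn: Callable[[int], int]) -> "LazyTable":
        # Each transformation adds one step to the lineage.
        return LazyTable(parent=self, fn=fn)

    def lineage_depth(self) -> int:
        # Number of transformations that must be replayed to rebuild the rows.
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()

    def collect(self) -> List[int]:
        # Evaluate by replaying the whole lineage from the root.
        if self.parent is None:
            return list(self.source)
        return [self.fn(x) for x in self.parent.collect()]

    def checkpoint(self) -> "LazyTable":
        # Materialize the rows and return a new root with an empty lineage.
        return LazyTable(source=self.collect())


def iterate(table: LazyTable, steps: int,
            checkpoint_every: Optional[int] = None) -> LazyTable:
    """Apply `steps` transformations, optionally checkpointing periodically."""
    for i in range(1, steps + 1):
        table = table.map(lambda x: x + 1)
        if checkpoint_every and i % checkpoint_every == 0:
            table = table.checkpoint()
    return table


base = LazyTable(source=[0, 1, 2])
plain = iterate(base, 10)
checkpointed = iterate(base, 10, checkpoint_every=4)

print(plain.lineage_depth())         # 10
print(checkpointed.lineage_depth())  # 2
print(checkpointed.collect())        # [10, 11, 12]
```

Both variants produce the same rows, but without checkpointing the lineage grows with every iteration, whereas with a checkpoint every 4 steps it only holds the steps taken since the last checkpoint. In this toy, a failure after step 10 would mean replaying 2 steps instead of 10.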