Cristian created SPARK-11879:
--------------------------------

             Summary: Checkpoint support for DataFrame
                 Key: SPARK-11879
                 URL: https://issues.apache.org/jira/browse/SPARK-11879
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.2
            Reporter: Cristian


Explicit support for checkpointing DataFrames is needed to truncate lineages, 
prune the query plan (particularly the logical plan), and provide transparent 
failure recovery.

While saving to a Parquet file may be sufficient for recovery, actually using 
that as a checkpoint (and truncating the lineage) requires reading the files 
back.
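
For illustration, the Parquet workaround described above might look roughly 
like the sketch below. The helper names (checkpoint_via_parquet, iterate), the 
path layout, and the "every" interval are hypothetical and not part of Spark; 
the sketch only assumes the DataFrameWriter/DataFrameReader parquet calls 
available in PySpark 1.5.

```python
def checkpoint_via_parquet(sql_context, df, path):
    """Hypothetical helper: materialize df to Parquet and read it back.

    The returned DataFrame's plan is just a scan of the written files,
    so the lineage of the original df is no longer carried along.
    """
    # Writing forces the whole plan to execute once and persists the result.
    df.write.parquet(path, mode="overwrite")
    # Reading back is the extra step this issue would like to make unnecessary.
    return sql_context.read.parquet(path)


def iterate(sql_context, df, step, n_iters, base_path, every=10):
    """Hypothetical loop: apply step() n_iters times, checkpointing periodically."""
    for i in range(1, n_iters + 1):
        df = step(df)
        if i % every == 0:
            # Use a distinct path per checkpoint so the files currently being
            # read are never overwritten by the write that depends on them.
            df = checkpoint_via_parquet(
                sql_context, df, "%s/iter-%d" % (base_path, i))
    return df
```

Here sql_context is assumed to be an existing SQLContext and step any function 
from DataFrame to DataFrame. The cost is a full write plus a re-read at every 
checkpoint, which is what explicit checkpoint support would avoid.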

Explicit checkpointing is required to use DataFrames in iterative scenarios 
such as Streaming and ML, and to avoid expensive recomputation after an 
executor failure when executing a complex chain of queries on very large 
datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
