Github user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/13419
I ended up creating a small design doc describing the problem and
presenting 2 possible solutions at
https://docs.google.com/document/d/1h5SzfC5UsvIrRpeLNDKSMKrKJvohkkccFlXo-GBAwQQ/edit?ts=574
Github user sameeragarwal commented on the issue:
https://github.com/apache/spark/pull/13419
@tejasapatil if the nodes where the data was cached go down, the
CacheManager should still consider that data as cached. In that case, the next
time the data is accessed, the underlying RDD will be recomputed.
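To illustrate the behavior described above, here is a minimal sketch (not part of this PR; it assumes Spark 2.x with a running cluster and the standard Dataset API) showing that a cached Dataset whose blocks are lost is simply recomputed from its lineage on the next access:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("cache-recompute-sketch").getOrCreate()

// Cache a Dataset and materialize the cache across the cluster.
val ds = spark.range(0L, 1000000L).persist(StorageLevel.MEMORY_ONLY)
ds.count()

// If executors holding cached blocks die at this point, the plan is still
// registered with the CacheManager. The next action recomputes the lost
// partitions from the underlying logical plan instead of failing.
ds.count()
```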
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/13419
I guess that the caching is done over multiple nodes. If the data for a
dataset is physically updated and some of the nodes where the data was cached
go down, would the existing `cached` dataset still be valid?