
[GitHub] [spark] cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API

2019-07-16, GitBox

cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API
URL: https://github.com/apache/spark/pull/24991#issuecomment-511712741

I've spent more time understanding the use case, and I think table cache would be a better choice here:

1. disk vs memory: you can set the sto
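A minimal sketch of table caching with an explicitly chosen storage level, so the cached data lives on disk instead of the default memory-and-disk level. It is written as a spark-shell-style Scala script and assumes Spark 3.0 or later for the `OPTIONS ('storageLevel' ...)` clause of `CACHE TABLE`; the view name `events` and its columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the existing session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("table-cache-sketch").getOrCreate()

// Hypothetical intermediate result that several later queries reuse.
val df = spark.range(0, 1000).selectExpr("id", "id % 7 AS bucket")
df.createOrReplaceTempView("events")

// CACHE TABLE is eager unless LAZY is given, so this statement computes the
// view once and stores the result; DISK_ONLY keeps it off the executor heap.
spark.sql("CACHE TABLE events OPTIONS ('storageLevel' 'DISK_ONLY')")

// Later queries read the cached data instead of recomputing the plan.
spark.sql("SELECT bucket, count(*) AS n FROM events GROUP BY bucket").show()
```

Unlike a noop write, the computed rows are kept and reused; `spark.catalog.uncacheTable("events")` releases them. The Dataset-level equivalent, `df.persist(StorageLevel.DISK_ONLY)`, is lazy and only materializes on the next action.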

[GitHub] [spark] cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API

2019-07-14, GitBox

cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API
URL: https://github.com/apache/spark/pull/24991#issuecomment-511269953

It's simple to write to the noop sink: `df.write.format("noop").save`. Why do we need this extra public API?
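A minimal sketch of the noop-sink approach, written as a spark-shell-style Scala script and assuming Spark 3.0 or later, where the `noop` format ships. To make the execution visible, it wraps a hypothetical expensive step in a UDF, `countingDouble`, that counts its calls in an accumulator; those names, the 1,000-row input, and the explicit `mode("overwrite")` (set rather than relying on the default save mode) are illustrative assumptions, not part of the original comment.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.util.LongAccumulator

// Reuses the existing session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("noop-sink-sketch").getOrCreate()

// Hypothetical "expensive" step: doubles its input and counts each call.
// Taking the accumulator as a parameter makes the closure capture it, and
// asNondeterministic() stops the optimizer from reordering or merging it.
def countingDouble(calls: LongAccumulator): UserDefinedFunction =
  udf((x: Long) => { calls.add(1); x * 2 }).asNondeterministic()

val calls = spark.sparkContext.longAccumulator("calls")
val df = spark.range(0, 1000).select(countingDouble(calls)(col("id")).as("doubled"))

// Defining df runs nothing. The noop sink executes the whole plan and
// discards every output row, so only the side effects remain.
df.write.format("noop").mode("overwrite").save()

println(s"UDF calls: ${calls.value}")
```

Because the sink stores nothing, a later action on `df` recomputes the plan (and increments the counter again), which is the main contrast with caching.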

[GitHub] [spark] cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API

2019-07-04, GitBox

cloud-fan commented on issue #24991: [SPARK-28188] Materialize Dataframe API
URL: https://github.com/apache/spark/pull/24991#issuecomment-508389503

+1 for the no-op sink; I think it should be the same as the no-op `runJob` here.
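For comparison, a hypothetical sketch of a no-op job built directly on `SparkContext.runJob`: each task drains its partition of the executed plan and returns nothing. This is not the code from the PR; it relies on `df.queryExecution.toRdd`, an unstable, developer-facing API, and the DataFrame is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow

// Reuses the existing session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("noop-runjob-sketch").getOrCreate()

// Hypothetical DataFrame to materialize.
val df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")

// The executed physical plan exposed as an RDD[InternalRow].
val planRdd = df.queryExecution.toRdd

// One task per partition: each drains its iterator and returns Unit, so every
// row is computed but nothing is collected to the driver.
val perPartition: Array[Unit] =
  spark.sparkContext.runJob(planRdd, (rows: Iterator[InternalRow]) => rows.foreach(_ => ()))
```

Like the noop sink, this computes every row and keeps none of them; unlike it, the job bypasses the `DataFrameWriter` path entirely.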