Re: smarter way to "forget" DataFrame definition and stick to its values

2018-05-02 Thread Lalwani, Jayesh

There is a trade-off involved here. If you have a Spark application with a complicated logical graph, you can either cache data at certain points in the DAG or leave it uncached. Caching costs memory. Not caching costs CPU, because the data is recomputed every time a later step needs it.
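The trade-off can be sketched roughly as follows. This is only an illustration, not code from the thread: the object name `CacheTradeOff`, the `counts` helper, the column names, and the choice of `StorageLevel.MEMORY_AND_DISK` are all assumptions made for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; nothing here comes from the original thread.
object CacheTradeOff {

  // Runs two actions over the same intermediate DataFrame,
  // with or without caching it first.
  def counts(spark: SparkSession, useCache: Boolean): (Long, Long) = {
    // Stand-in for an expensive intermediate result somewhere in the DAG.
    val intermediate: DataFrame =
      spark.range(0, 1000000).withColumn("squared", col("id") * col("id"))

    // With caching: the first action materializes the data and later actions
    // reuse it (more memory). Without: every action recomputes the lineage
    // from scratch (more CPU).
    if (useCache) intermediate.persist(StorageLevel.MEMORY_AND_DISK)

    val evens = intermediate.filter(col("id") % 2 === 0).count()
    val odds  = intermediate.filter(col("id") % 2 === 1).count()

    // Release the cached blocks once they are no longer needed.
    if (useCache) intermediate.unpersist()

    (evens, odds)
  }
}
```

Both variants return the same counts; only where the cost lands differs. Note that caching keeps the logical plan attached to the DataFrame, so it avoids recomputation but does not shorten the lineage.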

smarter way to "forget" DataFrame definition and stick to its values

2018-05-01 Thread Valery Khamenya

hi all

a short example before the long story:

    var accumulatedDataFrame = ... // initialize
    for (i <- 1 to 100) {
      val myTinyNewData = ... // my slowly calculated new data portion in tiny amounts
      accumulatedDataFrame = accumulatedDataFrame.union(myTinyNewData)
      // how to stick
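One possible way to make an accumulated DataFrame "stick to its values" in a loop like this is to truncate its lineage periodically with `Dataset.localCheckpoint()`. The sketch below is an assumption about what might fit the question, not the approach the thread settled on. It assumes a Spark version that provides `localCheckpoint` on Datasets (2.3 or later, to the best of my knowledge), and the names `TruncateLineage`, `accumulate`, and `checkpointEvery` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example object; the tiny per-iteration data is a stand-in
// for the slowly calculated portions described in the question.
object TruncateLineage {

  def accumulate(spark: SparkSession, rounds: Int, checkpointEvery: Int): DataFrame = {
    import spark.implicits._

    var accumulated: DataFrame = Seq.empty[Long].toDF("value")

    for (i <- 1 to rounds) {
      // Stand-in for a small, expensive-to-compute batch of new rows.
      val tinyNewData = Seq(i.toLong).toDF("value")
      accumulated = accumulated.union(tinyNewData)

      // Every few iterations, materialize the data and drop the plan that
      // produced it, so the logical plan does not grow with each union.
      if (i % checkpointEvery == 0) {
        accumulated = accumulated.localCheckpoint()
      }
    }
    accumulated
  }
}
```

`localCheckpoint()` stores the data on the executors and is not fault tolerant. If reliability matters, `checkpoint()` after calling `spark.sparkContext.setCheckpointDir(...)` writes to reliable storage instead, at the cost of extra I/O.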