Hi,

Right now, any test that touches a DataFrame/Dataset needs a setup that brings up a local master, as in this article: <http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>.
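For context, the pattern that article describes amounts to building the expensive context once per suite rather than once per test. Here's a minimal sketch of that shape using Python's unittest with a hypothetical FakeSparkSession class standing in for the real session (the class and its methods are illustrative only, not spark-testing-base's actual API):

```python
import unittest


class FakeSparkSession:
    """Stand-in for an expensive resource like a local SparkSession.

    The real thing takes seconds to start, which is why creating it
    per test (rather than per suite) makes the suite crawl.
    """
    started = 0  # counts how many sessions were ever brought up

    def __init__(self):
        FakeSparkSession.started += 1

    def createDataFrame(self, rows):
        # Toy substitute: just materialize the rows.
        return list(rows)


class SharedSessionTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # The session is created once for the whole suite and shared
        # by every test method, amortizing the startup cost.
        cls.spark = FakeSparkSession()

    def test_single_row(self):
        self.assertEqual(len(self.spark.createDataFrame([(1, "a")])), 1)

    def test_two_rows(self):
        self.assertEqual(len(self.spark.createDataFrame([(1,), (2,)])), 2)


if __name__ == "__main__":
    unittest.main(argv=["shared-session-test"], exit=False, verbosity=0)
```

Even with the sharing, the startup cost is paid once per suite and the shared mutable context is what blocks parallel runs, which is the overhead I'm asking about below.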
That's a lot of overhead for unit testing, and the tests can't run in parallel, so the suite is slow -- this is closer to what I'd call an integration test. Does anyone have tricks to get around this? Maybe spy mocks over fake DataFrames/Datasets?

Also, does anyone know whether there are plans to make more traditional unit testing possible with Spark SQL, perhaps via a stripped-down in-memory implementation? (I admit this seems quite hard, since there's so much functionality in these classes!)

Thanks!
- Everett