Hi,

Right now, if any code uses DataFrame/Dataset, I need a test setup that
brings up a local master, as described in this article:
<http://blog.cloudera.com/blog/2015/09/making-apache-spark-testing-easy-with-spark-testing-base/>
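
To be concrete, the setup I mean looks roughly like this (a minimal
sketch, assuming Spark 2.x's SparkSession and ScalaTest 3.1+; the trait
and class names are mine, and spark-testing-base provides similar
helpers):

  import org.apache.spark.sql.SparkSession
  import org.scalatest.BeforeAndAfterAll
  import org.scalatest.funsuite.AnyFunSuite

  // Shared local-master session, started once per suite.
  trait SharedSparkSession extends BeforeAndAfterAll { this: AnyFunSuite =>
    @transient var spark: SparkSession = _

    override def beforeAll(): Unit = {
      super.beforeAll()
      spark = SparkSession.builder()
        .master("local[2]")   // the local master this whole thing exists for
        .appName("unit-tests")
        .getOrCreate()
    }

    override def afterAll(): Unit = {
      try spark.stop()
      finally super.afterAll()
    }
  }

  class ExampleSpec extends AnyFunSuite with SharedSparkSession {
    test("counting a tiny Dataset") {
      assert(spark.range(3).count() == 3L)
    }
  }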

That's a lot of overhead for unit testing, and the tests can't run in
parallel, so testing is slow -- this is more like what I'd call an
integration test.

Do people have any tricks to get around this? Maybe using spy mocks on fake
DataFrames/Datasets?
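
By that I mean something like the sketch below (assuming Mockito; the
fakeUsers/MockedDatasetSketch names are mine). It only helps for code
paths that call the stubbed methods -- anything that actually reaches
the query planner still needs a real session:

  import org.apache.spark.sql.{Dataset, Row}
  import org.mockito.Mockito.{mock, when}

  object MockedDatasetSketch {
    def main(args: Array[String]): Unit = {
      // Hand the code under test a stubbed Dataset instead of a real one.
      val fakeUsers: Dataset[Row] = mock(classOf[Dataset[Row]])
      when(fakeUsers.count()).thenReturn(42L)

      // The stub answers without any SparkContext at all.
      assert(fakeUsers.count() == 42L)
    }
  }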

Anyone know if there are plans to make more traditional unit testing
possible with Spark SQL, perhaps with a stripped-down in-memory
implementation? (I admit this does seem quite hard since there's so much
functionality in these classes!)

Thanks!

- Everett
