On Sun, Aug 21, 2016 at 3:08 AM, Bedrytski Aliaksandr <sp...@bedryt.ski> wrote:
> Hi,
>
> we share the same spark/hive context between tests (executed in
> parallel), so the main problem is that the temporary tables are
> overwritten each time they are created. This may create race conditions,
> as these tempTables may be seen as global mutable shared state.
>
> So each time we create a temporary table, we add a unique, incremented,
> thread-safe id (AtomicInteger) to its name so that there are only
> specific, non-shared temporary tables used for a test.

Makes sense.

But when you say you're sharing the same spark/hive context between tests, I'm assuming that's between tests within one test class, but you're not sharing across test classes (which a build tool like Maven or Gradle might have executed in separate JVMs).

Is that right?

> --
> Bedrytski Aliaksandr
> sp...@bedryt.ski
>
> On Sat, Aug 20, 2016, at 01:25, Everett Anderson wrote:
> Hi!
>
> Just following up on this --
>
> When people talk about a shared session/context for testing like this,
> I assume it's still within one test class. So it's still the case that
> if you have a lot of test classes that test Spark-related things, you
> must configure your build system to not run them in parallel.
> You'll get the benefit of not creating and tearing down a Spark
> session/context between test cases within a test class, though.
>
> Is that right?
>
> Or have people figured out a way to have sbt (or Maven/Gradle/etc.)
> share Spark sessions/contexts across integration tests in a safe way?
>
> On Mon, Aug 1, 2016 at 3:23 PM, Holden Karau
> <hol...@pigscanfly.ca> wrote:
> That's a good point - there is an open issue for spark-testing-base to
> support this shared SparkSession approach - but I haven't had the
> time ( https://github.com/holdenk/spark-testing-base/issues/123 ).
> I'll try and include this in the next release :)
>
> On Mon, Aug 1, 2016 at 9:22 AM, Koert Kuipers
> <ko...@tresata.com> wrote:
> we share a single sparksession across tests, and they can run
> in parallel. it is pretty fast
>
> On Mon, Aug 1, 2016 at 12:02 PM, Everett Anderson
> <ever...@nuna.com.invalid> wrote:
> Hi,
>
> Right now, if any code uses DataFrame/Dataset, I need a test setup
> that brings up a local master as in this article[1].
>
> That's a lot of overhead for unit testing and the tests can't run
> in parallel, so testing is slow -- this is more like what I'd call
> an integration test.
>
> Do people have any tricks to get around this? Maybe using spy mocks
> on fake DataFrame/Datasets?
>
> Anyone know if there are plans to make more traditional unit
> testing possible with Spark SQL, perhaps with a stripped-down
> in-memory implementation? (I admit this does seem quite hard since
> there's so much functionality in these classes!)
>
> Thanks!
>
> - Everett
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
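
For anyone skimming the thread, the unique-name trick described above (an AtomicInteger suffix on each temporary table name) might look roughly like the sketch below. This is only an illustration under assumptions: the class name `TempTableNames` and the method `unique` are hypothetical and were not posted by anyone in the thread, and the sketch deliberately leaves out Spark itself so that it runs on a plain JVM.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not from the thread): hands out temp-table names that do not collide within one JVM. */
public final class TempTableNames {
    // A single counter shared by every test thread in this JVM.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    private TempTableNames() {}

    /**
     * Appends a fresh id to the given base name.
     * incrementAndGet() is atomic, so two threads can never receive the same id.
     */
    public static String unique(String base) {
        return base + "_" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        // Two tests asking for the "same" table get different names.
        String first = unique("users");
        String second = unique("users");
        System.out.println(first + " " + second);
    }
}
```

A test would then pass the returned name to whatever call registers the temporary table (for example `registerTempTable` in Spark 1.x or `createOrReplaceTempView` in Spark 2.x) and use that same name in its SQL. Note that the counter is only unique within one JVM, which is enough when the shared context also lives in that JVM.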