Re: Clojure interop with Spark

2020-07-10 Thread Dominic Parry
Another option is Apache Beam. We use it quite extensively. There are a few options for Clojure wrappers (we use datasplash), and Beam has libraries for a number of popular languages. Kind Regards, Dom Parry
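
For readers unfamiliar with datasplash, the sketch below shows the general shape of a Beam pipeline written against it. The function names (make-pipeline, read-text-file, map, write-text-file, run-pipeline), the option maps, and the argument order are recalled from datasplash's datasplash.api namespace and should be checked against the version in use; the input and output paths are placeholders.

(ns example.beam-sketch
  (:require [clojure.string :as str]
            [datasplash.api :as ds]))

;; A minimal sketch of a Beam pipeline via datasplash: read lines,
;; upper-case them, and write them back out. Names and option maps
;; are assumptions -- consult the datasplash README for the exact API.
(defn upper-case-pipeline
  [in-path out-path]
  (let [p (ds/make-pipeline [])]              ;; empty command-line args
    (->> (ds/read-text-file in-path {:name :read} p)
         (ds/map str/upper-case {:name :upcase})
         (ds/write-text-file out-path {:name :write}))
    (ds/run-pipeline p)))

The same pipeline can then be pointed at different Beam runners (Direct, Dataflow, Flink, etc.) through the pipeline options, which is one of Beam's main draws.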

Re: Clojure interop with Spark

2020-07-10 Thread Alex Ott
From a Spark perspective, I would really advise using the Dataframe API as much as possible, including Spark Structured Streaming instead of Spark Streaming - the main reason is more optimized execution of the code, thanks to all the optimizations that Catalyst is able to make. But I really don't see
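
To make that suggestion concrete, here is a minimal sketch of using the Dataframe API from Clojure through plain Java interop, with no wrapper library; because the query is expressed as Dataframe operations, Catalyst can optimize the whole plan before execution. The local master, input path, and column names are illustrative assumptions.

(ns example.dataframe
  (:import (org.apache.spark.sql SparkSession)))

(defn run-query
  "Build a SparkSession and run a Dataframe query that Catalyst can optimize."
  [events-path]
  (let [spark (-> (SparkSession/builder)
                  (.appName "clojure-dataframe-demo")
                  (.master "local[*]")               ;; assumption: local run
                  (.getOrCreate))
        events (-> spark .read (.json events-path))]
    (-> events
        (.filter "status = 'ok'")                    ;; SQL expression string
        (.groupBy "user_id" (into-array String []))  ;; Java varargs need an array
        (.count)
        (.show))))

The same Dataset operations largely carry over to Structured Streaming by swapping .read for .readStream and writing the result with a streaming query instead of .show.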

Re: Clojure interop with Spark

2020-07-09 Thread Jeff Stokes
Hey Tim, We at Amperity have used Sparkling for our Clojure/Spark interop in the past. After a few years of fighting with it, we eventually ended up with sparkplug (https://github.com/amperity/sparkplug), which we now use to run all of our production Spark jobs. There is built-in support for proper
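
For contrast, the raw Java-interop baseline that wrappers like Sparkling and sparkplug improve on looks roughly like the sketch below (this is not sparkplug's API). Even this trivial job is verbose, and as soon as Clojure functions appear inside transformations, serializing those functions and getting their classes onto the executors becomes the central problem such libraries address.

(ns example.raw-rdd
  (:import (org.apache.spark SparkConf)
           (org.apache.spark.api.java JavaSparkContext)))

;; Plain Java interop against the RDD API -- no wrapper library.
(defn count-lines
  [path]
  (let [conf (-> (SparkConf.)
                 (.setAppName "clojure-rdd-baseline")
                 (.setMaster "local[*]"))]       ;; assumption: local run
    (with-open [sc (JavaSparkContext. conf)]     ;; JavaSparkContext is Closeable
      (.count (.textFile sc path)))))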

Clojure interop with Spark

2020-07-09 Thread Tim Clemons
I'm putting together a big data system centered on Spark Streaming for data ingest and Spark SQL for querying the stored data. I've been investigating what options there are for implementing Spark applications in Clojure. It's been close to a decade since sparkling or flambo
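
For the Spark SQL side of such a system, the stored data can be queried directly from Clojure via Java interop by registering a temp view and issuing SQL; a minimal sketch follows, in which the Parquet path, view name, and column names are hypothetical.

(ns example.sql-query
  (:import (org.apache.spark.sql SparkSession)))

(defn top-users
  "Register stored data as a temp view and query it with Spark SQL."
  [spark parquet-path]
  (let [df (-> spark .read (.parquet (into-array String [parquet-path])))]
    (.createOrReplaceTempView df "events")
    (-> (.sql spark (str "SELECT user_id, count(*) AS n FROM events "
                         "GROUP BY user_id ORDER BY n DESC LIMIT 10"))
        (.show))))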