Spark structured streaming leftOuter join not working as I expect

2019-06-04 Thread Joe Ammann
Hi all, sorry, tl;dr: I'm writing my first Python Spark Structured Streaming app, which will eventually join messages from ~10 different Kafka topics. I recently upgraded to Spark 2.4.3, which resolved all the time-handling issues (watermarks, join windows) I previously had with Spark 2.3.2. My c
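The excerpt is cut off before the unexpected behavior is described, so the following is only a general illustration. In a Spark stream-stream left outer join, the NULL-padded row for an unmatched left record is produced only after the watermark has moved past the end of that record's join window. With the default policy, the global watermark is the minimum across the input streams, so if one topic stops receiving data, the outer rows can be held back. The toy, pure-Python model below shows that effect. It is not Spark's implementation: the class and method names are hypothetical, and it assumes a single watermark delay for both streams. In Spark the new watermark also takes effect only in a later micro-batch, which adds further delay.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Row = Tuple[str, int]                    # (join key, event time in seconds)
Out = Tuple[str, int, Optional[int]]     # (key, left time, right time or None)


@dataclass
class ToyLeftOuterJoin:
    """Toy model of a watermarked stream-stream left outer join; not Spark's code."""
    delay: int    # hypothetical watermark delay, same for both streams
    window: int   # a right row matches if left_ts <= right_ts <= left_ts + window
    left_max: float = float("-inf")
    right_max: float = float("-inf")
    left_state: List[list] = field(default_factory=list)   # [key, ts, matched?]
    right_state: List[Row] = field(default_factory=list)

    def _matches(self, lts: int, rts: int) -> bool:
        return lts <= rts <= lts + self.window

    def process_batch(self, left: List[Row], right: List[Row]) -> List[Out]:
        out: List[Out] = []
        new_left = [[k, ts, False] for k, ts in left]
        # New left rows against all buffered and new right rows.
        for row in new_left:
            for rk, rts in self.right_state + right:
                if row[0] == rk and self._matches(row[1], rts):
                    out.append((row[0], row[1], rts))
                    row[2] = True
        # Previously buffered left rows against the new right rows only.
        for row in self.left_state:
            for rk, rts in right:
                if row[0] == rk and self._matches(row[1], rts):
                    out.append((row[0], row[1], rts))
                    row[2] = True
        self.left_state += new_left
        self.right_state += right
        # Per-stream max event time; the global watermark is their minimum minus delay.
        self.left_max = max([self.left_max] + [ts for _, ts in left])
        self.right_max = max([self.right_max] + [ts for _, ts in right])
        wm = min(self.left_max, self.right_max) - self.delay
        # Once wm passes ts + window, a left row can no longer match: emit NULL if unmatched.
        kept = []
        for k, ts, matched in self.left_state:
            if ts + self.window < wm:
                if not matched:
                    out.append((k, ts, None))
            else:
                kept.append([k, ts, matched])
        self.left_state = kept
        # Right rows older than wm cannot match any future (non-late) left row.
        self.right_state = [(k, ts) for k, ts in self.right_state if ts >= wm]
        return out
```

For example, with `ToyLeftOuterJoin(delay=10, window=5)`, a batch of left rows `("a", 100)` and `("b", 100)` plus right row `("a", 102)` yields only the match for `a`. A later batch with left row `("c", 120)` and no right rows still emits nothing, because the stalled right stream holds the watermark back. Only when the right stream delivers `("z", 130)` does the NULL row `("b", 100, None)` appear.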

Re: Upsert for hive tables

2019-06-04 Thread tkrol
Hi Magnus, yes, I was also thinking about the partitioning approach, and I think it is the best solution for this type of scenario. My scenario is also relevant to your last paragraph: the incoming dates are very random. I can get updates from 2012 and from 2019. Therefore, this strategy mi
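The partitioning approach for upserts can be sketched as follows. Incoming updates are grouped by partition, and only the partitions they touch are read, merged and rewritten in full. This is a minimal pure-Python model under stated assumptions, not the method proposed in the thread: the record layout, the per-year partitioning in `partition_of`, and the function names are all hypothetical. It also assumes a record's date never changes, so a record never moves between partitions. The actual Spark/Hive overwrite statement depends on the table format and is not shown.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]                  # hypothetical record: {"id": ..., "date": "YYYY-MM-DD", ...}
Table = Dict[str, Dict[Any, Record]]     # partition value -> {record id: record}


def partition_of(rec: Record) -> str:
    # Hypothetical partitioning scheme: one partition per year of the record's date.
    return rec["date"][:4]


def upsert_by_partition(table: Table, updates: List[Record]) -> List[str]:
    """Merge updates into the affected partitions only; return the rewritten partitions."""
    by_partition: Dict[str, List[Record]] = defaultdict(list)
    for rec in updates:
        by_partition[partition_of(rec)].append(rec)
    for part, recs in by_partition.items():
        # Read the existing partition, apply updates by id (last one wins),
        # then replace the whole partition with the merged result.
        merged = dict(table.get(part, {}))
        for rec in recs:
            merged[rec["id"]] = rec
        table[part] = merged
    return sorted(by_partition)
```

In this sketch, a batch containing one update dated 2012 and one dated 2019 rewrites both years in full and leaves every other partition untouched. The more widely the dates are scattered, the more whole partitions each batch has to rewrite.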

Spark Streaming: Task not distributed

2019-06-04 Thread Pipster Neko
Hi, I am curious how records are assigned to tasks, since, as you can see in the photo below, one specific executor has more tasks than the others. The setup is: - Spark version 2.3.1 - The Spark Streaming job runs on Spark Standalone with the following configuration: -

installation of spark

2019-06-04 Thread ya
Dear list, I am very new to Spark, and I am having trouble installing it on my Mac. I have the following questions; please give me some guidance. Thank you very much. 1. How many and which software packages should I install before installing Spark? I have been searching online, people discussing their expe

Re: installation of spark

2019-06-04 Thread Jack Kolokasis
Hello, first you will need to make sure that Java is installed, or install it otherwise. Then install Scala and a build tool (sbt or Maven). In my view, IntelliJ IDEA is a good option for creating your Spark applications. At the end you have to install a distributed file system
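Before installing Spark, a quick way to see which of the tools named in the reply (Java, Scala, and sbt or Maven) are already available is to look for their commands on `PATH`. This is a minimal sketch: the helper name is hypothetical, and it assumes the standard command names `java`, `scala`, `sbt` and `mvn`. Finding a command does not prove it works. On macOS, `/usr/bin/java` can exist as a stub even when no JDK is installed.

```python
import shutil
from typing import Dict, List

# Command-line tools for the prerequisites named in the reply (assumed command names).
PREREQUISITES: Dict[str, List[str]] = {
    "Java": ["java"],
    "Scala": ["scala"],
    "build tool (sbt or Maven)": ["sbt", "mvn"],
}


def check_prerequisites() -> Dict[str, bool]:
    """Report, per prerequisite, whether at least one of its commands is on PATH."""
    return {
        name: any(shutil.which(cmd) is not None for cmd in cmds)
        for name, cmds in PREREQUISITES.items()
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```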