Scaling Kafka Streaming to Thousands of Partitions

2019-05-25 Thread Charles Chao
Hi, We have been using Spark Kafka streaming for real-time processing with success. The scale of this stream has been increasing with data growth, and we have been able to scale up by adding more brokers to the Kafka cluster, adding more partitions to the topic, and adding more executors to the sp

conflict with multiple jobs writing to different partitions but same baseDir

2019-05-25 Thread Koert Kuipers
let's say i have 2 dataframe jobs that write to /somedir/a=1 and /somedir/a=2. these can run at the same time without issues. but now i get excited about dynamic partitioning. so i add "a" as a column to my 2 dataframes, set the option partitionOverwriteMode=dynamic, add partitionBy("a": _*) to the writ

Does Spark SQL has match_recognize?

2019-05-25 Thread kant kodali
Hi All, Does Spark SQL have match_recognize? I am not sure why CEP seems to be neglected; I believe it is one of the most useful concepts in financial applications! Is there a plan to support it? Thanks!

Re: Spark standalone and Pandas UDF from custom archive

2019-05-25 Thread Riccardo Ferrari
Thanks for the answer. Are you saying that your jobs.py is actually a zip file with all your code in it (just like I do)? That is the part that is failing on my side. If I add my archive with a .py extension it is ignored. I have to rename it to .zip in order to have it available on my executors. So is it

Re: Spark standalone and Pandas UDF from custom archive

2019-05-25 Thread chris
Hi, The solution we have is to make a generic spark-submit Python file (driver.py). This is just a main method which takes a single parameter: the module containing the app you want to run. The main method itself just dynamically loads the module and executes some well-known method on it (we us
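The generic driver described above can be sketched roughly as follows. This is only an illustration, not the poster's actual driver.py: the entry-point name `run` and the helper names `load_and_run` and `main` are hypothetical, since the preview is cut off before the real method name is given.

```python
import importlib
import sys


def load_and_run(module_name, entry_point="run", args=()):
    """Import `module_name` and call its entry-point function.

    `entry_point` defaults to "run", a hypothetical name chosen for this
    sketch; the original message does not say which method is used.
    """
    module = importlib.import_module(module_name)
    entry = getattr(module, entry_point, None)
    if not callable(entry):
        raise AttributeError(
            f"module {module_name!r} has no callable {entry_point!r}"
        )
    return entry(*args)


def main(argv):
    """Treat argv[1] as the app module and pass the remaining args through."""
    if len(argv) < 2:
        print("usage: driver.py <module> [args...]", file=sys.stderr)
        return 2
    load_and_run(argv[1], args=tuple(argv[2:]))
    return 0


# A real driver.py would end with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv))
```

The module named on the command line must be importable wherever the driver runs, so the application code still has to be shipped alongside the driver in some way.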

Re: Spark standalone and Pandas UDF from custom archive

2019-05-25 Thread Riccardo Ferrari
Following up on my previous message, it turns out the issue is a mix of packaging and Spark launcher mechanics. Everything can be summarized as follows: - My app is packaged like a 'fat jar'. I'm using zipapp to prepare the self-executable archive (with all the needed descriptors) -
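For readers unfamiliar with zipapp, a minimal sketch of building such an archive with the standard library is shown below. The directory layout, the package name `mypkg`, and the helpers `build_archive` and `demo` are assumptions made for illustration; the poster's real layout and descriptors are not shown in the preview, and how the archive is then handed to Spark is outside this sketch.

```python
import pathlib
import tempfile
import zipapp
import zipfile


def build_archive(source_dir, target):
    """Pack `source_dir` into a zip archive at `target`.

    `source_dir` must already contain a __main__.py, which zipapp uses
    as the entry point of the archive.
    """
    zipapp.create_archive(source_dir, target=target)
    return target


def demo():
    """Build a tiny archive in a temp directory and list its entries."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        # Hypothetical package standing in for the real application code.
        (src / "mypkg").mkdir(parents=True)
        (src / "mypkg" / "__init__.py").write_text("VALUE = 42\n")
        (src / "__main__.py").write_text("import mypkg\nprint(mypkg.VALUE)\n")

        target = pathlib.Path(tmp) / "app.zip"
        build_archive(src, target)
        with zipfile.ZipFile(target) as zf:
            return sorted(zf.namelist())
```

Because no interpreter is passed to `create_archive`, no shebang line is written; the result is an ordinary zip file that still contains `__main__.py` at its root.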