Hi,
We have been using Spark Kafka streaming for real-time processing with
success. The scale of this stream has been increasing with data growth, and
we have been able to scale up by adding more brokers to the Kafka cluster,
adding more partitions to the topic, and adding more executors to the Spark
application.
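For reference, a minimal Structured Streaming read from Kafka looks roughly
like the sketch below. The broker addresses, topic name, and checkpoint path
are placeholders rather than details from this thread, and the Kafka source
needs the spark-sql-kafka package on the classpath.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Read parallelism follows the number of Kafka partitions, so adding
# partitions (plus executors to consume them) is the usual way to scale.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholders
    .option("subscribe", "events")  # placeholder topic
    .load())

query = (events.selectExpr("CAST(value AS STRING)")
    .writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder path
    .start())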
Let's say I have two dataframe jobs that write to /somedir/a=1 and
/somedir/a=2. These can run at the same time without issues.
But now I get excited about dynamic partitioning, so I add "a" as a column
to my two dataframes, set the option partitionOverwriteMode=dynamic, and add
partitionBy("a": _*) to the writer.
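For context, a minimal PySpark version of that setup might look like the
sketch below; the sample rows are made up, and only the path comes from the
example above.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# In dynamic mode, only the partitions present in the incoming data are
# overwritten, instead of the whole table root.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df = spark.createDataFrame([(1, "x"), (1, "y")], ["a", "value"])  # made-up rows

(df.write
    .mode("overwrite")
    .partitionBy("a")
    .parquet("/somedir"))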
Hi All,
Does Spark SQL have match_recognize? I am not sure why CEP seems to be
neglected; I believe it is one of the most useful concepts in financial
applications!
Is there a plan to support it?
Thanks!
Thanks for the answer.
Are you saying that your jobs.py is actually a zip file with all your code
in it (just like mine)? That is the part that is failing on my side. If I add
my archive with a .py extension it is ignored; I have to rename it to .zip in
order to have it available on my executors.
So is it
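For what it's worth, the pattern under discussion usually looks like this on
the command line (driver.py and jobs.zip are the placeholder names from this
thread, and the module argument is likewise hypothetical). spark-submit
documents --py-files as taking .zip, .egg, or .py entries, which would
explain why an archive carrying a .py extension does not behave like a zip.

spark-submit \
  --py-files jobs.zip \
  driver.py my_package.my_module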
Hi,
The solution we have is to make a generic spark-submit Python file (driver.py).
This is just a main method which takes a single parameter: the module
containing the app you want to run. The main method itself just dynamically
loads the module and executes some well-known method on it (we use
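As a rough sketch of such a driver (my reconstruction, not the original
code; the entry-point name "run" is an assumption):

# driver.py - generic entry point handed to spark-submit.
import importlib
import sys

def main():
    # Single argument: the dotted name of the module containing the app.
    module = importlib.import_module(sys.argv[1])
    # Call a well-known method on the loaded module; the name "run"
    # is assumed here for illustration.
    module.run(sys.argv[2:])

if __name__ == "__main__":
    main()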
Following up on my previous message, it turns out the issue is a mix
between packaging and Spark launcher mechanics. Everything can be
summarized as follows:
- My app is packaged fat-jar style: I'm using zipapp to prepare the
self-executable archive (with all the needed descriptors)
-
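For anyone unfamiliar with zipapp, building such a self-executable archive
looks roughly like this; the directory name and entry point below are
placeholders, not taken from the message.

import zipapp

# Bundle the application directory (code plus descriptors) into a single
# self-executing archive; "myapp" and "driver:main" are hypothetical names.
zipapp.create_archive(
    "myapp",             # source directory
    target="myapp.pyz",  # resulting archive
    main="driver:main",  # module:callable entry point
)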