Checkpoints not cleaned using Spark streaming + watermarking + kafka

2017-09-21 Thread MathieuP
Hi Spark Users! :) I come to you with a question about checkpoints. I have a streaming application that consumes from and produces to Kafka. The computation requires a window and watermarking. Since this is a streaming application with a Kafka output, a checkpoint is expected. The application runs

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread ayan guha
The point here is that the Spark session is not available in executors. So you have to use appropriate storage clients. On Fri, Sep 22, 2017 at 1:44 AM, lucas.g...@gmail.com wrote: > I'm not sure what you're doing. But I have in the past used spark to > consume a manifest file

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread lucas.g...@gmail.com
I'm not sure what you're doing. But I have in the past used Spark to consume a manifest file and then execute a .mapPartitions on the result like this: def map_key_to_event(s3_events_data_lake): def _map_key_to_event(event_key_list, s3_client=test_stub): print("Events in list")
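
A minimal sketch of the pattern described in this thread, assuming the manifest is simply a list of file paths and each file holds one JSON document. The names make_partition_reader, open_client and local_file_client are hypothetical and do not come from the original messages; the local-file client stands in for whatever storage client (S3, HDFS, ...) the job really needs. The idea is that the function handed to mapPartitions never touches the Spark session and builds its own client once per partition on the executor.

```python
import json
from typing import Callable, Iterable, Iterator, Tuple

# A "client" here is just a callable that maps a path to the file's text.
Fetch = Callable[[str], str]


def local_file_client() -> Fetch:
    """Hypothetical stand-in for a real storage client (S3, HDFS, ...)."""
    def fetch(path: str) -> str:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    return fetch


def make_partition_reader(open_client: Callable[[], Fetch]):
    """Build a function that can be passed to RDD.mapPartitions.

    The returned function does not reference the SparkSession; it only
    uses the client factory captured in its closure.
    """
    def read_partition(paths: Iterable[str]) -> Iterator[Tuple[str, dict]]:
        fetch = open_client()  # created once per partition, on the executor
        for path in paths:
            yield path, json.loads(fetch(path))
    return read_partition


# Assumed usage on the driver (not executed here; `spark` and `paths`
# would have to exist in the calling program):
#   rdd = spark.sparkContext.parallelize(paths)
#   events = rdd.mapPartitions(make_partition_reader(local_file_client))
```

Passing the client factory as an argument, rather than a ready-made client, keeps the closure small and makes it easy to swap in a stub for tests, which is what the s3_client=test_stub default in the snippet above appears to be for.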

Re: How to pass sparkSession from driver to executor

2017-09-21 Thread Weichen Xu
Spark does not allow executor code to use `sparkSession`. But I think you can move all JSON files to one directory, and then run: ``` spark.read.json("/path/to/jsonFileDir") ``` But if you want to get the filename at the same time, you can use ```

How to pass sparkSession from driver to executor

2017-09-21 Thread Chackravarthy Esakkimuthu
Hi, I want to know how to pass sparkSession from driver to executor. I have a Spark program (batch job) which does the following: # val spark = SparkSession.builder().appName("SampleJob").config( "spark.master", "local") .getOrCreate() val df = this is dataframe which has list of

Re: for loops in pyspark

2017-09-21 Thread Alexander Czech
That is not really possible; the whole project is rather large and I would not like to release it before I have published the results. But if there are no known issues with running Spark in a for loop, I will look into other possible causes of memory leaks. Thanks On 20 Sep 2017 15:22, "Weichen Xu"