Hi Spark users! :)
I come to you with a question about checkpoints.
I have a streaming application that consumes and produces to Kafka.
The computation requires a window and watermarking.
Since this is a streaming application with a Kafka output, a checkpoint is
expected.
The application runs
The point here is that the SparkSession is not available on executors, so you
have to use the appropriate storage clients directly.
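In practice that means constructing the client inside the task, e.g. in a function passed to `foreachPartition` or `mapPartitions`, so each executor builds its own. Since those APIs just invoke your function with a plain iterator, the shape can be run locally; the client below is a stub standing in for a real storage SDK:

```python
class StubStorageClient:
    """Stand-in for e.g. a boto3 S3 client; built inside the task,
    never shipped from the driver."""
    def __init__(self):
        self.written = []

    def put(self, record):
        self.written.append(record)

def write_partition(records):
    # One client per partition, created on the executor.
    client = StubStorageClient()
    for r in records:
        client.put(r)
    return client.written

# foreachPartition/mapPartitions would pass an iterator of rows;
# the same function runs on a plain list:
written = write_partition(["r1", "r2", "r3"])
```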
On Fri, Sep 22, 2017 at 1:44 AM, lucas.g...@gmail.com wrote:
I'm not sure exactly what you're doing, but I have in the past used Spark to
consume a manifest file and then run a .mapPartitions on the result, like
this:
def map_key_to_event(s3_events_data_lake):
    # The inner function receives each partition's iterator of keys;
    # the `s3_client=test_stub` default lets tests inject a fake client.
    def _map_key_to_event(event_key_list, s3_client=test_stub):
        print("Events in list")
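Fleshing that pattern out into something runnable (all names here, including the stub, are stand-ins for the original code, not the author's actual implementation): the outer function captures only plain, picklable values, and the client is built via an injectable factory so a stub can replace boto3 in tests. Because `mapPartitions` just calls the closure with an iterator, the same call works on a plain list:

```python
class StubS3:
    """Stub standing in for a real S3 client in this sketch."""
    def get_object(self, bucket, key):
        return f"s3://{bucket}/{key}"

def map_key_to_event(bucket, s3_client_factory=StubS3):
    # `bucket` and the factory are picklable, so the closure can be shipped
    # to executors; a real factory might be `lambda: boto3.client("s3")`
    # (boto3 usage is an assumption, not from the original thread).
    def _map_key_to_event(event_key_list):
        client = s3_client_factory()      # built on the executor, per partition
        for key in event_key_list:
            yield client.get_object(bucket, key)
    return _map_key_to_event

# rdd.mapPartitions(map_key_to_event("events")) would stream each partition
# through the closure; calling it directly on a list shows the behavior:
events = list(map_key_to_event("events")(["k1", "k2"]))
```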
Spark does not allow executor code to use the `sparkSession`.
But I think you can move all the JSON files into one directory and then run:
```
spark.read.json("/path/to/jsonFileDir")
```
But if you want to get the filename at the same time, you can use
Hi,
I want to know how to pass the sparkSession from the driver to the executors.
I have a Spark program (a batch job) which does the following:
val spark = SparkSession.builder()
  .appName("SampleJob")
  .config("spark.master", "local")
  .getOrCreate()
val df = ... // this is a dataframe which has a list of
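You cannot pass the SparkSession itself: it only exists on the driver and is not serializable. What you ship to executors is plain data (paths, connection settings) plus a top-level function, and each task builds what it needs from that. A minimal plain-Python sketch of why (the names and values are illustrative; Spark itself uses cloudpickle, but the principle is the same):

```python
import pickle
import threading
from functools import partial

def process(records, cfg):
    # A top-level function plus a plain dict: this combination
    # serializes cleanly and so could be shipped to executors.
    return [f"{cfg['bucket']}:{r}" for r in records]

config = {"bucket": "events"}                  # illustrative config values
task = partial(process, cfg=config)
restored = pickle.loads(pickle.dumps(task))    # round-trips fine
result = restored(["e1", "e2"])

# An object wrapping a live resource (a lock here, standing in for a
# SparkSession) cannot be serialized at all:
try:
    pickle.dumps(threading.Lock())
    lock_picklable = True
except TypeError:
    lock_picklable = False
```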
That is not really possible; the whole project is rather large, and I would
not like to release it before I have published the results.
But if there are no known issues with running Spark in a for loop, I will
look into other possibilities for the memory leak.
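For what it's worth, a for loop as such is not the problem; leaks usually come from references that accumulate across iterations (e.g. cached DataFrames that are never unpersisted, or driver-side lists that keep growing). The effect can be shown with plain Python objects, where `FakeDataFrame` is just a stand-in:

```python
import gc
import weakref

class FakeDataFrame:
    """Stand-in object; in Spark this would be a cached DataFrame."""

leak_list = []   # simulates a reference retained across iterations
tracked = []

for i in range(3):
    df = FakeDataFrame()             # reassignment drops the previous object
    tracked.append(weakref.ref(df))
    if i == 0:
        leak_list.append(df)         # the retained one stays alive

df = None
gc.collect()
alive = sum(r() is not None for r in tracked)
# only the object still referenced by leak_list survives the loop
```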
Thanks
On 20 Sep 2017 15:22, "Weichen Xu" wrote: