Hello Ayan,
Thank you for the suggestion. However, I would lose the correlation of the JSON
file with the other identifier fields. Also, would it be an issue if there are
too many files? Plus, I may not have the same schema across all the files.
Hello,
Spark adds an entry to the pending microbatches queue at each batch interval.
Is there a config to set the maximum size of the pending microbatches queue?
Thanks
Another option is:
1. collect the dataframe with the file paths
2. create a list of paths
3. create a new dataframe with spark.read.json, passing the list of paths
This will save you a lot of headaches.
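Roughly like this (a minimal sketch in Scala; `df` and its `path` column are
illustrative placeholders for your dataframe holding the file paths):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// `df` is assumed to hold one file path per row in a column named "path".
val paths: Seq[String] = df.select("path").collect().map(_.getString(0)).toSeq

// Spark reads all files in one pass and infers a merged schema across them.
val jsonDf = spark.read.json(paths: _*)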
Ayan
On Wed, Jul 13, 2022 at 7:35 AM Enrico Minack wrote:
Hi,
how does RDD's mapPartitions make a difference regarding 1. and 2.,
compared to Dataset's mapPartitions / map function?
Enrico
On 12.07.22 at 22:13, Muthu Jayakumar wrote:
Hello Enrico,
Thanks for the reply. I found that I would have to use the `mapPartitions` API
of RDD to perform this safely, as I have to:
1. Read each file from GCS using the HDFS FileSystem API.
2. Parse each JSON record in a safe manner.
For (1) to work, I do have to broadcast the HadoopConfiguration from the driver.
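For what it's worth, a minimal sketch of that pattern (assuming Scala, an
existing SparkSession `spark`, an RDD[String] of file paths called `pathsRdd`,
and Jackson for the defensive parse; all names are illustrative):

import scala.collection.JavaConverters._
import scala.util.Try
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import com.fasterxml.jackson.databind.ObjectMapper

// Hadoop's Configuration is not serializable, so copy it into a plain Map
// on the driver, broadcast that, and rebuild the Configuration per partition.
val confMap = spark.sparkContext.hadoopConfiguration.asScala
  .map(e => e.getKey -> e.getValue).toMap
val bcConf = spark.sparkContext.broadcast(confMap)

val parsed = pathsRdd.mapPartitions { paths =>
  val conf = new Configuration(false)
  bcConf.value.foreach { case (k, v) => conf.set(k, v) }
  val mapper = new ObjectMapper()
  paths.flatMap { p =>
    val path = new Path(p)
    val fs = path.getFileSystem(conf) // resolves gs://, hdfs://, etc.
    // 1. read the file via the FileSystem API; 2. parse it defensively.
    // A failure at either step drops the file instead of failing the job.
    Try {
      val in = fs.open(path)
      try scala.io.Source.fromInputStream(in, "UTF-8").mkString
      finally in.close()
    }.flatMap(s => Try(mapper.readTree(s))).toOption.map(node => (p, node))
  }
}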
I have some problems, and I am trying to find out whether there is no solution
for them (due to the current implementation) or whether there is a way that I
was not aware of.
1)
Currently, we can enable and configure dynamic resource allocation based on the
documentation below.
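For context, enabling it typically looks something like this (a sketch with
illustrative values only; the full set of knobs is listed in the Spark
configuration documentation under "Dynamic Allocation"):

import org.apache.spark.sql.SparkSession

// Illustrative values; tune min/max executors and timeouts for your workload.
val spark = SparkSession.builder()
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()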