Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread Enrico Minack
You have to take each row and zip the lists; each element of the result becomes one new row. So write a method that turns Row(List("A","B","null"), List("C","D","null"), List("E","null","null")) into List(List("A","C","E"), List("B","D","null"), List("null","null","null")) and use flatMap.
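A minimal Scala sketch of that zip-and-flatMap approach, assuming a spark-shell session and three illustrative array<string> columns named col1, col2, col3 (none of these names come from the thread):

    import spark.implicits._

    // Illustrative input: three array<string> columns of equal length per row.
    val df = Seq((Seq("A", "B", "null"), Seq("C", "D", "null"), Seq("E", "null", "null")))
      .toDF("col1", "col2", "col3")

    // Zip the three lists element-wise and emit one output row per position.
    val exploded = df.flatMap { row =>
      val a = row.getSeq[String](0)
      val b = row.getSeq[String](1)
      val c = row.getSeq[String](2)
      a.zip(b).zip(c).map { case ((x, y), z) => (x, y, z) }
    }.toDF("col1", "col2", "col3")

    exploded.show()
    // rows: (A, C, E), (B, D, null), (null, null, null)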

Re: Running Spark on Kubernetes (GKE) - failing on spark-submit

2023-02-16 Thread Mich Talebzadeh
You can try this: gsutil cp src/StructuredStream-on-gke.py gs://codes/ where you create a bucket on GCS called codes. Then in your spark-submit do: spark-submit --verbose \ --master k8s://https://$KUBERNETES_MASTER_IP:443 \ --deploy-mode cluster \ --name

Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread Navneet
I am no expert, but maybe try if this works: in order to achieve the desired output using the explode() method in Java, you can create a User-Defined Function (UDF) that zips the lists in each row and returns the resulting list. Here's an example implementation:
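A Scala sketch of the same zip-UDF idea (the original suggestion was in Java; the DataFrame and column names below are the illustrative ones from the sketch further up, not from the thread):

    import org.apache.spark.sql.functions.{col, explode, udf}

    // UDF that zips the three arrays into one array of (col1, col2, col3) triples.
    val zip3 = udf((a: Seq[String], b: Seq[String], c: Seq[String]) =>
      a.zip(b).zip(c).map { case ((x, y), z) => (x, y, z) })

    // Explode the zipped array, then pull the tuple fields back out as columns.
    val result = df
      .withColumn("zipped", explode(zip3(col("col1"), col("col2"), col("col3"))))
      .select(col("zipped._1").as("col1"),
              col("zipped._2").as("col2"),
              col("zipped._3").as("col3"))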

How can I set a value of Location with CustomDataSource?

2023-02-16 Thread Zhuolin Ji
Hi all, sorry to bother you; I have a problem and hope to get your help. I want to use Spark to customize a data source based on the ParquetDataSourceV2 class in Spark v3.2.2, but I want to leave out the location field and then modify the table and partition paths in the code. How can I do that?

Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread Bjørn Jørgensen
Use explode_outer() when rows have null values. On Thu, 16 Feb 2023 at 16:48, Navneet wrote: > I am no expert, but maybe try if this works: > in order to achieve the desired output using the explode() method in > Java, you can create a User-Defined Function (UDF) that zips the lists > in each row a
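For reference, a tiny sketch of the difference (the data and column names are made up): explode() silently drops rows whose array is null or empty, while explode_outer() keeps them as a row with a null value:

    import org.apache.spark.sql.functions.{col, explode, explode_outer}

    val arrays = Seq((1, Some(Seq("A", "B"))), (2, None)).toDF("id", "arr")

    arrays.select(col("id"), explode(col("arr"))).show()        // id 2 disappears
    arrays.select(col("id"), explode_outer(col("arr"))).show()  // id 2 kept, value is null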

Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread sam smith
@Enrico Minack I used arrays_zip to merge values into one row, and then used toJSON() to export the data. @Bjørn explode_outer didn't yield the expected results. Thanks anyway. On Thu, 16 Feb 2023 at 09:06, Enrico Minack wrote: > You have to take each row and zip the lists; each element of
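A sketch of what that combination could look like on the illustrative df from the first sketch (arrays_zip merges the arrays into one array of structs per row, and toJSON turns each row into a JSON string for export):

    import org.apache.spark.sql.functions.{arrays_zip, col}

    // arrays_zip names the struct fields after the zipped columns (col1, col2, col3).
    val zipped = df.select(arrays_zip(col("col1"), col("col2"), col("col3")).as("zipped"))

    // One JSON string per input row, ready for export.
    zipped.toJSON.show(truncate = false)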

Re: How to explode array columns of a dataframe having the same length

2023-02-16 Thread Vikas Kumar
I think these 4 steps should help: use zip, explode, withColumn (get the element out of the array), then drop the array column. Thanks. On Thu, Feb 16, 2023, 2:18 PM sam smith wrote: > @Enrico Minack I used arrays_zip to merge values > into one row, and then used toJSON() to export the data. > @Bjørn explode_outer
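A sketch of those four steps on the same illustrative df, taking arrays_zip as the "zip" and struct-field access as the "get element of array" step (both are assumptions about what was meant):

    import org.apache.spark.sql.functions.{arrays_zip, col, explode}

    val result = df
      .withColumn("zipped", arrays_zip(col("col1"), col("col2"), col("col3")))  // 1. zip
      .withColumn("zipped", explode(col("zipped")))                             // 2. explode
      .withColumn("c1", col("zipped.col1"))                                     // 3. get elements
      .withColumn("c2", col("zipped.col2"))
      .withColumn("c3", col("zipped.col3"))
      .drop("col1", "col2", "col3", "zipped")                                   // 4. drop arrays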

[Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis?

2023-02-16 Thread hueiyuan su
*Component*: Spark Structured Streaming *Level*: Advanced *Scenario*: How-to *Problem Description*: I would like to writeStream data to AWS Kinesis with Spark Structured Streaming, but I cannot find a related connector jar that can be used. I want to check whether this is fully supported.

Re: [Spark Structured Streaming] Does Spark Structured Streaming currently support sinking to AWS Kinesis?

2023-02-16 Thread Vikas Kumar
This doesn't directly answer your question, but there are ways in Scala and PySpark - see if this helps: https://repost.aws/questions/QUP_OJomilTO6oIgvK00VHEA/writing-data-to-kinesis-stream-from-py-spark On Thu, Feb 16, 2023, 8:27 PM hueiyuan su wrote: > *Component*: Spark Structured Streaming > *Level*
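Open-source Structured Streaming ships no built-in Kinesis sink, so the usual workaround (roughly what the linked post describes for PySpark) is to write each micro-batch yourself via foreachBatch. A minimal Scala sketch, assuming the AWS SDK v2 Kinesis client is on the classpath, with a made-up stream name and a rate source standing in for the real input:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import software.amazon.awssdk.core.SdkBytes
    import software.amazon.awssdk.services.kinesis.KinesisClient
    import software.amazon.awssdk.services.kinesis.model.PutRecordRequest

    val spark = SparkSession.builder().appName("kinesis-sink-sketch").getOrCreate()
    val input = spark.readStream.format("rate").load()   // placeholder source

    // Serialize each row of the micro-batch to JSON and push it to Kinesis from the executors.
    val writeBatch = (batch: DataFrame, batchId: Long) => {
      batch.toJSON.rdd.foreachPartition { rows =>
        val client = KinesisClient.create()              // credentials come from the usual AWS chain
        rows.foreach { json =>
          client.putRecord(PutRecordRequest.builder()
            .streamName("my-stream")                     // illustrative stream name
            .partitionKey(java.util.UUID.randomUUID().toString)
            .data(SdkBytes.fromUtf8String(json))
            .build())
        }
        client.close()
      }
    }

    val query = input.writeStream.foreachBatch(writeBatch).start()
    query.awaitTermination()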