Re: How the data is distributed

2022-06-06 Thread Sean Owen
Data is not distributed to executors by anything. If you are processing data with Spark, Spark spawns tasks on executors to read chunks of data from wherever they are (S3, HDFS, etc.).

On Mon, Jun 6, 2022 at 4:07 PM Sid wrote:
> Hi experts,
>
> When we load any file, I know that based on the i
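To illustrate the point that nothing pushes data to executors, here is a minimal sketch in plain Java, assuming a much simpler model than Spark's real planner. The class `SplitPlanner`, the method `planSplits`, and the chunk size are all hypothetical names and values chosen for illustration; Spark's actual logic also accounts for file formats, compression, and other settings. The idea shown is only that the driver needs file metadata (the size) to describe chunks, and each task would then read its own byte range directly from storage.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Spark code: plans byte ranges for one file.
public class SplitPlanner {

    // Returns {offset, length} pairs covering the file. Only the file size is
    // needed here; no file contents pass through the planner.
    public static List<long[]> planSplits(long fileSize, long maxSplitBytes) {
        if (maxSplitBytes <= 0) {
            throw new IllegalArgumentException("maxSplitBytes must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += maxSplitBytes) {
            // The last chunk may be shorter than maxSplitBytes.
            long length = Math.min(maxSplitBytes, fileSize - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed example values: a 300 MB file and a 128 MB chunk size.
        for (long[] split : planSplits(300 * mb, 128 * mb)) {
            // In the model described above, one task per range would open the
            // file on S3/HDFS itself and read only these bytes.
            System.out.println("offset=" + split[0] + " length=" + split[1]);
        }
    }
}
```

In this simplified model, each planned range corresponds to one task, which is why the data never has to be loaded on the driver first.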

Re: How the data is distributed

2022-06-06 Thread Peyman Mohajerian
The latter.

On Mon, Jun 6, 2022 at 2:07 PM Sid wrote:
> Hi experts,
>
> When we load any file, I know that based on the information in the spark
> session about the executors' location, status, etc., the data is
> distributed among the worker nodes and executors.
>
> But I have one doubt. Is the

How the data is distributed

2022-06-06 Thread Sid
Hi experts,

When we load any file, I know that, based on the information in the Spark session about the executors' location, status, etc., the data is distributed among the worker nodes and executors.

But I have one doubt. Is the data initially loaded on the driver and then distributed or

Structured streaming with protobuf proto3 schema registry

2022-06-06 Thread Kiran Biswal
Hello Experts,

Has anyone been able to use a schema registry for Spark Structured Streaming where the input data is protobuf (proto3)? If the input is Avro, I believe a schema registry is doable. I am wondering about a protobuf schema registry:

kafka (protobuf proto3) -> schema published to registry -> spark structured

Re: How to convert a Dataset<Row> to a Dataset<String>?

2022-06-06 Thread Stelios Philippou
Hi All,

It is simple in Java as well. You can get the Dataset directly:

    Dataset<String> encodedString = df.select("Column")
        .where("")
        .as(Encoders.STRING());

On Mon, 6 Jun 2022 at 15:26, Christophe Préaud <christophe.pre...@kelkoogroup.com> wrote:
> Hi Marc,
>
> I'm not much familiar with Spark on J

Re: How to convert a Dataset<Row> to a Dataset<String>?

2022-06-06 Thread Christophe Préaud
Hi Marc,

I'm not very familiar with Spark on Java, but according to the doc, it should be:

    Encoder<String> stringEncoder = Encoders.STRING();
    dataset.as(stringEncoder);

For the record, it is much simpler in Scala: datase