Hi Enrico,

What a great answer. Thank you. It seems I just need to get comfortable with `struct`, and then I'll be golden.

Thank you again, friend.
Marco.

On Thu, May 4, 2023 at 3:00 AM Enrico Minack <enrico-min...@gmx.de> wrote:
> Hi,
>
> You could rearrange the DataFrame so that writing the DataFrame as-is
> produces your structure:
>
> df = spark.createDataFrame([(1, "a1"), (2, "a2"), (3, "a3")], "id int, datA string")
> +---+----+
> | id|datA|
> +---+----+
> |  1|  a1|
> |  2|  a2|
> |  3|  a3|
> +---+----+
>
> df2 = df.select(df.id, struct(df.datA).alias("stuff"))
> root
>  |-- id: integer (nullable = true)
>  |-- stuff: struct (nullable = false)
>  |    |-- datA: string (nullable = true)
> +---+-----+
> | id|stuff|
> +---+-----+
> |  1| {a1}|
> |  2| {a2}|
> |  3| {a3}|
> +---+-----+
>
> df2.write.json("data.json")
> {"id":1,"stuff":{"datA":"a1"}}
> {"id":2,"stuff":{"datA":"a2"}}
> {"id":3,"stuff":{"datA":"a3"}}
>
> Looks pretty much like what you described.
>
> Enrico
>
>
> On 04.05.23 at 06:37, Marco Costantini wrote:
> > Hello,
> >
> > Let's say I have a very simple DataFrame, as below.
> >
> > +---+----+
> > | id|datA|
> > +---+----+
> > |  1|  a1|
> > |  2|  a2|
> > |  3|  a3|
> > +---+----+
> >
> > Let's say I have a requirement to write this to a bizarre JSON
> > structure. For example:
> >
> > {
> >   "id": 1,
> >   "stuff": {
> >     "datA": "a1"
> >   }
> > }
> >
> > How can I achieve this with PySpark? I have only seen the following:
> > - writing the DataFrame as-is (doesn't meet requirement)
> > - using a UDF (seems frowned upon)
> >
> > What I have tried is to do this within a `foreach`. I have had some
> > success, but also some problems with other requirements (serializing
> > other things).
> >
> > Any advice? Please and thank you,
> > Marco.
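P.S. For anyone landing on this thread later: the nesting that `struct(...)` plus `write.json(...)` produces can be previewed in plain Python, without a Spark session. This is only a sketch of the target shape using the toy rows from the thread, not a substitute for the PySpark code above.

```python
import json

# Rows of the original flat DataFrame: (id, datA)
rows = [(1, "a1"), (2, "a2"), (3, "a3")]

# Mirror of df.select(df.id, struct(df.datA).alias("stuff")):
# each row becomes {"id": ..., "stuff": {"datA": ...}}
records = [{"id": i, "stuff": {"datA": a}} for i, a in rows]

# One compact JSON object per line, matching the df2.write.json output
lines = [json.dumps(r, separators=(",", ":")) for r in records]
print("\n".join(lines))
```

Running this prints the same three JSON lines shown in Enrico's reply, which makes it easy to sanity-check the structure before wiring it up in Spark.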