Hello,

Let's say I have a very simple DataFrame, as below.

+---+----+
| id|datA|
+---+----+
|  1|  a1|
|  2|  a2|
|  3|  a3|
+---+----+

Let's say I have a requirement to write this to a bizarre JSON structure.
For example:

{
  "id": 1,
  "stuff": {
    "datA": "a1"
  }
}

How can I achieve this with PySpark? The only approaches I have found so far are:
- writing the DataFrame out as-is (doesn't meet the requirement)
- using a UDF (which seems to be frowned upon)

What I have tried is building the JSON inside a `foreach`. I have had some
success with that, but also ran into problems with other requirements
(serializing other objects).
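To be concrete, the per-row reshaping I'm after is trivial in plain Python (a sketch using the example columns above; `row_to_nested` is just a name I made up, and this is roughly what I apply per row inside the `foreach`):

```python
import json

def row_to_nested(row):
    # Reshape a flat {"id": ..., "datA": ...} record so that "id" stays
    # top-level and "datA" is nested under a new "stuff" key.
    return {"id": row["id"], "stuff": {"datA": row["datA"]}}

rows = [{"id": 1, "datA": "a1"},
        {"id": 2, "datA": "a2"},
        {"id": 3, "datA": "a3"}]

for r in rows:
    print(json.dumps(row_to_nested(r)))
```

The question is how to express this reshaping idiomatically in PySpark itself, rather than row by row on the driver or executors.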

Any advice? Please and thank you,
Marco.
