Hello all, I am using PySpark Structured Streaming, and my timestamp fields arrive as plain longs (epoch milliseconds), so I need to convert them to the timestamp type.
A sample JSON object:

    {
      "id": {
        "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
        "type": "UUID",
        "system": "Test"
      },
      "status": "Active",
      "timingPeriod": {
        "startDateTime": 1611859271516,
        "endDateTime": null
      },
      "eventDateTime": 1611859272122,
      "isPrimary": true
    }

I want to convert "eventDateTime", "timingPeriod.startDateTime" and "timingPeriod.endDateTime" to the timestamp type, so I did the following:

    import pyspark.sql.functions as f

    def transform_date_col(date_col):
        # Epoch milliseconds -> seconds; null values stay null
        return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)

    df = (
        df.withColumn("eventDateTime", transform_date_col("eventDateTime").cast("timestamp"))
        .withColumn("timingPeriod.startDateTime", transform_date_col("timingPeriod.startDateTime").cast("timestamp"))
        .withColumn("timingPeriod.endDateTime", transform_date_col("timingPeriod.endDateTime").cast("timestamp"))
    )

After this, the converted timingPeriod fields are no longer inside the struct: they become two separate top-level columns literally named "timingPeriod.startDateTime" and "timingPeriod.endDateTime". How can I keep them in the struct as before?

More generally, is there a generic way to modify one or more properties of nested structs? I have hundreds of entities whose long fields need converting to timestamps, so a generic implementation would help my data ingestion pipeline a lot.

Regards,
Felix K Jose
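One generic approach, shown here only as a sketch under assumptions: walk the schema, and for every struct that contains a field to convert, rebuild that struct with Spark SQL's `named_struct`, then apply all columns in a single `selectExpr`. The helper names (`timestamp_select_exprs`, `_rewrite`, `_quote`) are hypothetical. The sketch reads the JSON form of the schema (the dict returned by `df.schema.jsonValue()`), converts only `long` leaves whose dotted path matches a predicate you pass in, and leaves arrays and maps untouched. It uses the same divide-by-1000-then-cast conversion as the question (which assumes Spark's default non-ANSI casting); `timestamp_millis({})` can be passed as the template on Spark 3.1+. It is plain Python and only builds the SQL strings.

```python
from typing import Callable, List, Optional

# Epoch milliseconds -> seconds (double) -> TIMESTAMP, the same cast as in the
# question. On Spark 3.1+ "timestamp_millis({})" is an alternative template.
DEFAULT_CONVERT = "CAST({} / 1000 AS TIMESTAMP)"


def _quote(path: List[str]) -> str:
    """Backtick-quote each path part so unusual names stay valid Spark SQL."""
    return ".".join("`" + part.replace("`", "``") + "`" for part in path)


def _rewrite(dtype, path: List[str], match: Callable[[str], bool],
             convert: str) -> Optional[str]:
    """SQL for the value at `path`, or None if nothing at or below it changes."""
    if dtype == "long" and match(".".join(path)):
        return convert.format(_quote(path))
    if isinstance(dtype, dict) and dtype.get("type") == "struct":
        parts, changed = [], False
        for field in dtype["fields"]:
            child_path = path + [field["name"]]
            child = _rewrite(field["type"], child_path, match, convert)
            changed = changed or child is not None
            name_literal = "'" + field["name"].replace("'", "\\'") + "'"
            # Unchanged children are copied by reference into the new struct.
            parts.append(name_literal + ", " + (child or _quote(child_path)))
        if not changed:
            return None
        # Rebuild the struct, but keep a NULL struct NULL.
        return "CASE WHEN {} IS NULL THEN NULL ELSE named_struct({}) END".format(
            _quote(path), ", ".join(parts))
    # Arrays, maps and other types are passed through unchanged in this sketch.
    return None


def timestamp_select_exprs(schema_json: dict, match: Callable[[str], bool],
                           convert: str = DEFAULT_CONVERT) -> List[str]:
    """One selectExpr string per top-level column, in the original order."""
    exprs = []
    for field in schema_json["fields"]:
        col = _quote([field["name"]])
        new = _rewrite(field["type"], [field["name"]], match, convert)
        exprs.append(col if new is None else new + " AS " + col)
    return exprs


# Hypothetical usage inside the streaming job:
#   exprs = timestamp_select_exprs(df.schema.jsonValue(),
#                                  lambda path: path.endswith("DateTime"))
#   df = df.selectExpr(*exprs)
```

Because the predicate works on dotted paths, one naming rule (here, "ends with DateTime") can cover many entities without listing fields by hand. If only a few known fields need changing and you are on Spark 3.1+, `Column.withField` is a lighter alternative, e.g. `f.col("timingPeriod").withField("startDateTime", ...)`, which replaces one field while keeping the struct.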