Wow, that's really great to know. Thank you so much Adam. Do you know when the 3.1 release is scheduled?
Regards, Felix K Jose On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote: > As of 3.0, the only way to do it is something that will recreate the whole > struct: > df.withColumn('timingPeriod', > f.struct(f.col('timingPeriod.start').cast('timestamp').alias('start'), > f.col('timingPeriod.end').cast('timestamp').alias('end'))) > > There's a new method coming in 3.1 on the column class called withField > which was designed for this purpose. I backported it to my personal 3.0 > build because of how useful it is. It works something like: > df.withColumn('timingPeriod', f.col('timingPeriod').withField('start', > f.col('timingPeriod.start').cast('timestamp')).withField('end', > f.col('timingPeriod.end'))) > > And it works on multiple levels of nesting which is nice. > > On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose < > felixkizhakkelj...@gmail.com> wrote: > >> Hello All, >> >> I am using pyspark structured streaming and I am getting timestamp fields >> as plain long (milliseconds), so I have to modify these fields into a >> timestamp type >> >> a sample json object object: >> >> { >> "id":{ >> "value": "f40b2e22-4003-4d90-afd3-557bc013b05e", >> "type": "UUID", >> "system": "Test" >> }, >> "status": "Active", >> "timingPeriod": { >> "startDateTime": 1611859271516, >> "endDateTime": null >> }, >> "eventDateTime": 1611859272122, >> "isPrimary": true, >> } >> >> Here I want to convert "eventDateTime" and "startDateTime" and >> "endDateTime" as timestamp types >> >> So I have done following, >> >> def transform_date_col(date_col): >> return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000) >> >> df.withColumn( >> "eventDateTime", >> transform_date_col("eventDateTime").cast("timestamp")).withColumn( >> "timingPeriod.start", >> transform_date_col("timingPeriod.start").cast("timestamp")).withColumn( >> "timingPeriod.end", >> transform_date_col("timingPeriod.end").cast("timestamp")) >> >> the timingPeriod fields are not a struct anymore rather they become two >> different fields with names "timingPeriod.start", "timingPeriod.end". >> >> How can I get them as a struct as before? >> Is there a generic way I can modify a single/multiple properties of >> nested structs? >> >> I have hundreds of entities where the long needs to convert to timestamp, >> so a generic implementation will help my data ingestion pipeline a lot. >> >> Regards, >> Felix K Jose >> >> > > -- > Adam Binford >