If you need to do this in 2.x, this library does the trick: https://github.com/fqaiser94/mse
On Fri, Jan 29, 2021 at 12:15 PM Adam Binford <adam...@gmail.com> wrote:

> I think they're voting on the next release candidate starting sometime
> next week. So hopefully, barring any other major hurdles, within the next
> few weeks.
>
> On Fri, Jan 29, 2021, 1:01 PM Felix Kizhakkel Jose <
> felixkizhakkelj...@gmail.com> wrote:
>
>> Wow, that's really great to know. Thank you so much, Adam. Do you know
>> when the 3.1 release is scheduled?
>>
>> Regards,
>> Felix K Jose
>>
>> On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:
>>
>>> As of 3.0, the only way to do it is something that recreates the
>>> whole struct:
>>>
>>> df.withColumn(
>>>     'timingPeriod',
>>>     f.struct(
>>>         f.col('timingPeriod.startDateTime').cast('timestamp').alias('startDateTime'),
>>>         f.col('timingPeriod.endDateTime').cast('timestamp').alias('endDateTime')))
>>>
>>> There's a new method coming in 3.1 on the Column class called withField,
>>> which was designed for this purpose. I backported it to my personal 3.0
>>> build because of how useful it is. It works something like:
>>>
>>> df.withColumn(
>>>     'timingPeriod',
>>>     f.col('timingPeriod')
>>>         .withField('startDateTime', f.col('timingPeriod.startDateTime').cast('timestamp'))
>>>         .withField('endDateTime', f.col('timingPeriod.endDateTime').cast('timestamp')))
>>>
>>> And it works on multiple levels of nesting, which is nice.
>>>
>>> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
>>> felixkizhakkelj...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I am using PySpark Structured Streaming, and I am getting the
>>>> timestamp fields as plain longs (milliseconds), so I have to convert
>>>> these fields into a timestamp type.
>>>>
>>>> A sample JSON object:
>>>>
>>>> {
>>>>     "id": {
>>>>         "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>>>         "type": "UUID",
>>>>         "system": "Test"
>>>>     },
>>>>     "status": "Active",
>>>>     "timingPeriod": {
>>>>         "startDateTime": 1611859271516,
>>>>         "endDateTime": null
>>>>     },
>>>>     "eventDateTime": 1611859272122,
>>>>     "isPrimary": true
>>>> }
>>>>
>>>> Here I want to convert "eventDateTime", "startDateTime", and
>>>> "endDateTime" to timestamp types, so I have done the following:
>>>>
>>>> def transform_date_col(date_col):
>>>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>>>
>>>> df.withColumn(
>>>>     "eventDateTime",
>>>>     transform_date_col("eventDateTime").cast("timestamp")
>>>> ).withColumn(
>>>>     "timingPeriod.startDateTime",
>>>>     transform_date_col("timingPeriod.startDateTime").cast("timestamp")
>>>> ).withColumn(
>>>>     "timingPeriod.endDateTime",
>>>>     transform_date_col("timingPeriod.endDateTime").cast("timestamp"))
>>>>
>>>> But afterwards the timingPeriod fields are no longer a struct; they
>>>> become two new top-level columns literally named
>>>> "timingPeriod.startDateTime" and "timingPeriod.endDateTime".
>>>>
>>>> How can I keep them in a struct as before?
>>>> Is there a generic way to modify single or multiple properties of
>>>> nested structs?
>>>>
>>>> I have hundreds of entities where longs need to be converted to
>>>> timestamps, so a generic implementation would help my data ingestion
>>>> pipeline a lot.
>>>>
>>>> Regards,
>>>> Felix K Jose
>>>
>>> --
>>> Adam Binford