Wow, that's really great to know. Thank you so much, Adam. Do you know when
the 3.1 release is scheduled?

Regards,
Felix K Jose

On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:

> As of 3.0, the only way to do it is to recreate the whole struct:
> df.withColumn('timingPeriod',
>     f.struct(f.col('timingPeriod.start').cast('timestamp').alias('start'),
>              f.col('timingPeriod.end').cast('timestamp').alias('end')))
>
> There's a new method coming in 3.1 on the Column class called withField,
> which was designed for this purpose. I backported it to my personal 3.0
> build because of how useful it is. It works something like:
> df.withColumn('timingPeriod',
>     f.col('timingPeriod')
>         .withField('start', f.col('timingPeriod.start').cast('timestamp'))
>         .withField('end', f.col('timingPeriod.end').cast('timestamp')))
>
> It also works on multiple levels of nesting, which is nice.
>
> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
> felixkizhakkelj...@gmail.com> wrote:
>
>> Hello All,
>>
>> I am using PySpark Structured Streaming, and my timestamp fields arrive as
>> plain longs (epoch milliseconds), so I have to convert them to a
>> timestamp type.
>>
>> A sample JSON object:
>>
>> {
>>   "id": {
>>     "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>     "type": "UUID",
>>     "system": "Test"
>>   },
>>   "status": "Active",
>>   "timingPeriod": {
>>     "startDateTime": 1611859271516,
>>     "endDateTime": null
>>   },
>>   "eventDateTime": 1611859272122,
>>   "isPrimary": true
>> }
>>
>> Here I want to convert "eventDateTime", "startDateTime", and
>> "endDateTime" to timestamp types.
>>
>> So I have done the following:
>>
>> from pyspark.sql import functions as f
>>
>> def transform_date_col(date_col):
>>     # Epoch milliseconds to seconds; null inputs stay null.
>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>
>> df.withColumn(
>>     "eventDateTime",
>>     transform_date_col("eventDateTime").cast("timestamp")
>> ).withColumn(
>>     "timingPeriod.start",
>>     transform_date_col("timingPeriod.start").cast("timestamp")
>> ).withColumn(
>>     "timingPeriod.end",
>>     transform_date_col("timingPeriod.end").cast("timestamp")
>> )
>>
>> The timingPeriod fields are no longer a struct; instead they become two
>> separate top-level columns named "timingPeriod.start" and "timingPeriod.end".
>>
>> How can I get them as a struct as before?
>> Is there a generic way to modify one or more properties of nested structs?
>>
>> I have hundreds of entities where longs need to be converted to timestamps,
>> so a generic implementation would help my data ingestion pipeline a lot.
>>
>> Regards,
>> Felix K Jose
>>
>>
>
> --
> Adam Binford
>
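
For the "generic way" asked about in the thread, one possible sketch that
does not depend on withField is to walk the schema and generate Spark SQL
expressions that rebuild each struct with named_struct, casting only the
listed fields. The helper names (build_select_exprs, _field_expr) and the
dotted-path list are hypothetical and not from the thread. The sketch
assumes the schema dict has the shape returned by df.schema.jsonValue(),
that field names are plain identifiers needing no backtick quoting, and that
only struct nesting (no arrays or maps) has to be traversed. Because each
struct is rebuilt field by field, a struct that was null may come back as a
struct of nulls.

```python
# Hypothetical helpers: a sketch under the assumptions stated above,
# not an API provided by Spark.

def _field_expr(field, path, ms_paths):
    """Return a Spark SQL expression string for the field at dotted `path`."""
    dtype = field["type"]
    if isinstance(dtype, dict) and dtype.get("type") == "struct":
        # Rebuild the struct field by field so the nesting is preserved.
        parts = []
        for child in dtype["fields"]:
            child_path = f"{path}.{child['name']}"
            child_expr = _field_expr(child, child_path, ms_paths)
            parts.append(f"'{child['name']}', {child_expr}")
        return f"named_struct({', '.join(parts)})"
    if path in ms_paths:
        # Epoch milliseconds to seconds, then cast; NULL stays NULL.
        return f"CAST({path} / 1000 AS TIMESTAMP)"
    # Any other field is passed through unchanged.
    return path


def build_select_exprs(schema_json, ms_paths):
    """Return one 'expr AS name' string per top-level column."""
    paths = set(ms_paths)
    return [
        f"{_field_expr(field, field['name'], paths)} AS {field['name']}"
        for field in schema_json["fields"]
    ]


# Schema dict written by hand here, mirroring the sample JSON in the thread.
schema_json = {
    "type": "struct",
    "fields": [
        {"name": "status", "type": "string", "nullable": True, "metadata": {}},
        {
            "name": "timingPeriod",
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "startDateTime", "type": "long",
                     "nullable": True, "metadata": {}},
                    {"name": "endDateTime", "type": "long",
                     "nullable": True, "metadata": {}},
                ],
            },
            "nullable": True,
            "metadata": {},
        },
        {"name": "eventDateTime", "type": "long",
         "nullable": True, "metadata": {}},
    ],
}

exprs = build_select_exprs(
    schema_json,
    [
        "eventDateTime",
        "timingPeriod.startDateTime",
        "timingPeriod.endDateTime",
    ],
)
for e in exprs:
    print(e)
```

With a real DataFrame, the generated strings could be passed as
df.selectExpr(*build_select_exprs(df.schema.jsonValue(), paths)), where
paths is the list of dotted field names to convert.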
