If you need to do this in 2.x, this library does the trick:
https://github.com/fqaiser94/mse

On Fri, Jan 29, 2021 at 12:15 PM Adam Binford <adam...@gmail.com> wrote:

> I think they're voting on the next release candidate starting sometime
> next week. So hopefully barring any other major hurdles within the next few
> weeks.
>
> On Fri, Jan 29, 2021, 1:01 PM Felix Kizhakkel Jose <
> felixkizhakkelj...@gmail.com> wrote:
>
>> Wow, that's really great to know. Thank you so much Adam. Do you know
>> when the 3.1 release is scheduled?
>>
>> Regards,
>> Felix K Jose
>>
>> On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:
>>
>>> As of 3.0, the only way to do it is something that will recreate the
>>> whole struct:
>>>
>>> df.withColumn('timingPeriod', f.struct(
>>>     f.col('timingPeriod.start').cast('timestamp').alias('start'),
>>>     f.col('timingPeriod.end').cast('timestamp').alias('end')))
>>>
>>> There's a new method coming in 3.1 on the Column class called withField,
>>> which was designed for this purpose. I backported it to my personal 3.0
>>> build because of how useful it is. It works something like:
>>>
>>> df.withColumn('timingPeriod', f.col('timingPeriod')
>>>     .withField('start', f.col('timingPeriod.start').cast('timestamp'))
>>>     .withField('end', f.col('timingPeriod.end').cast('timestamp')))
>>>
>>> And it works on multiple levels of nesting which is nice.
>>>
>>> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
>>> felixkizhakkelj...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>>
>>>> I am using pyspark structured streaming and I am getting timestamp
>>>> fields as plain longs (milliseconds), so I have to convert these fields
>>>> to a timestamp type.
>>>>
>>>> A sample JSON object:
>>>>
>>>> {
>>>>   "id":{
>>>>       "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>>>       "type": "UUID",
>>>>       "system": "Test"
>>>>     },
>>>>   "status": "Active",
>>>>   "timingPeriod": {
>>>>     "startDateTime": 1611859271516,
>>>>     "endDateTime": null
>>>>   },
>>>>   "eventDateTime": 1611859272122,
>>>>   "isPrimary": true
>>>> }
>>>>
>>>>   Here I want to convert "eventDateTime", "startDateTime", and
>>>> "endDateTime" to timestamp types.
>>>>
>>>> So I have done the following:
>>>>
>>>> def transform_date_col(date_col):
>>>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>>>
>>>> df.withColumn(
>>>>     "eventDateTime",
>>>>     transform_date_col("eventDateTime").cast("timestamp")
>>>> ).withColumn(
>>>>     "timingPeriod.start",
>>>>     transform_date_col("timingPeriod.start").cast("timestamp")
>>>> ).withColumn(
>>>>     "timingPeriod.end",
>>>>     transform_date_col("timingPeriod.end").cast("timestamp"))
>>>>
>>>> But the timingPeriod fields are no longer a struct; instead they become
>>>> two separate top-level columns named "timingPeriod.start" and
>>>> "timingPeriod.end".
>>>>
>>>> How can I get them as a struct as before?
>>>> Is there a generic way to modify single or multiple properties of
>>>> nested structs?
>>>>
>>>> I have hundreds of entities where long fields need to be converted to
>>>> timestamps, so a generic implementation would help my data ingestion
>>>> pipeline a lot.
>>>>
>>>> Regards,
>>>> Felix K Jose
>>>>
>>>>
>>>
>>> --
>>> Adam Binford
>>>
>>
