Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Bryan Jeffrey
che/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/types/SQLUserDefinedType.java>`
>>>> on the class definition. You don't seem to have done it, maybe that's the
>>>> reason?
>>>>
>>>> I would debug by printing the values in the serialize/d

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Jungtaek Lim
own to fail.
>>>
>>> TD
>>>
>>> On Fri, Feb 28, 2020 at 2:45 PM Bryan Jeffrey
>>> wrote:
>>>
>>>> Tathagata,
>>>>
>>>> The difference is more than hours off. In this instance it's different

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Jungtaek Lim
of years (and other
>>> smaller durations).
>>>
>>> We've considered moving to storage as longs, but this makes code much
>>> less readable and harder to maintain. The UDT serialization bug also causes
>>> issues outside of stateful streaming, as when e

Fwd: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
s when executing a simple group by.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>> --
>> *From:* Tathagata Das
>> *Sent:* Friday, February 28, 2020 4:56:07 PM

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
lto:bryan.jeff...@gmail.com>>
Cc: user <mailto:user@spark.apache.org>
Subject: Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Als

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Tathagata Das
as when executing a simple group by.
>
> Regards,
>
> Bryan Jeffrey
>
> --
> *From:* Tathagata Das
> *Sent:* Friday, February 28, 2020 4:56:07 PM
> *To:* Bryan Jeffrey
> *Cc:* user

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
To: Bryan Jeffrey
Cc: user
Subject: Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Also, if you can encode it using just a long, then I

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Tathagata Das
You are deserializing by explicitly specifying the UTC timezone, but when serializing you are not specifying it. Maybe that is the reason? Also, if you can encode it using just a long, then I recommend saving the value as a long and eliminating some of the serialization overhead. Spark will probably bet
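
The suggestion above (store the timestamp as a plain long and use one timezone on both sides) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's `java.time` as a stand-in for Joda-Time, it is not Spark's UDT API, and the class and method names (`EpochMillisRoundTrip`, `serialize`, `deserialize`) are hypothetical. It does not claim to reproduce or explain the bug reported in this thread.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, not part of Spark or Joda-Time.
class EpochMillisRoundTrip {

    // Serialize to epoch milliseconds. The value is the same whatever
    // zone the input is expressed in, so no zone needs to be stored.
    static long serialize(ZonedDateTime value) {
        return value.toInstant().toEpochMilli();
    }

    // Deserialize by always rebuilding the value in UTC, so that the
    // read side and the write side agree on the zone.
    static ZonedDateTime deserialize(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        ZonedDateTime original = ZonedDateTime.of(
                2020, 2, 28, 16, 56, 7, 0, ZoneId.of("America/New_York"));
        ZonedDateTime restored = deserialize(serialize(original));

        // The instant on the timeline survives the round trip.
        System.out.println(original.toInstant().equals(restored.toInstant())); // true

        // The zoned values are not equal, because the zone was normalized to UTC.
        System.out.println(original.equals(restored)); // false
    }
}
```

One consequence worth noting: if the application compares zoned values with `equals` after a round trip, it may see mismatches even though the instants agree, so comparing instants (or normalizing to UTC before writing) is the safer choice in a sketch like this.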

Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
Hello. I'm running Scala 2.11 w/ Spark 2.3.0. I've encountered a problem with mapGroupsWithState, and was wondering if anyone had insight. We use Joda time in a number of data structures, and so we've generated a custom serializer for Joda. This works well in most dataset/dataframe structured s