Hi, I have written a UDF to do the same in a PySpark DataFrame, since some of my dates are before the Unix epoch of 1/1/1970. I have more than 250 columns and apply a custom date_format UDF to more than 50 of them. I am getting OOM errors and poor performance because of the UDF.
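Since `strptime`/`strftime` operate on calendar dates directly, pre-1970 values need no special casing. Below is a minimal pure-Python sketch of the per-value conversion such a UDF performs (the function name `reformat_date` is my own, not from the thread). The usual performance fix is to replace the Python UDF with built-in column expressions such as `unix_timestamp`/`from_unixtime`, which run in the JVM instead of serializing every row out to Python.

```python
from datetime import datetime

def reformat_date(s):
    """Convert a 'dd/MM/yyyy' string to 'yyyy-MM-dd'.

    datetime handles pre-1970 dates directly, so no Unix-epoch
    round-trip is needed for values before 1/1/1970.
    """
    if s is None:
        return None
    return datetime.strptime(s, "%d/%m/%Y").strftime("%Y-%m-%d")

print(reformat_date("10/02/2014"))  # 2014-02-10
print(reformat_date("25/12/1969"))  # 1969-12-25, before the epoch
```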
What's your data size, and how is the performance? Thanks, Bijay

On Thu, Mar 24, 2016 at 10:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Minor correction: the UK date format is dd/MM/yyyy
>
> scala> sql("select paymentdate, TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd')) AS newdate from tmp").first
> res47: org.apache.spark.sql.Row = [10/02/2014,2014-02-10]
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 24 March 2016 at 17:09, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Thanks everyone. Appreciated.
>>
>> sql("select paymentdate, TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'MM/dd/yyyy'),'yyyy-MM-dd')) from tmp").first
>> res45: org.apache.spark.sql.Row = [10/02/2014,2014-10-02]
>>
>> Breaking a nut with a sledgehammer :)
>>
>> On 24 March 2016 at 17:03, Kasinathan, Prabhu <pkasinat...@paypal.com> wrote:
>>
>>> Can you try this one?
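The res45 and res47 results above differ only in how the same string is interpreted, which is why the dd/MM vs MM/dd correction matters. A small sketch of the ambiguity (plain Python rather than Spark; variable names mine):

```python
from datetime import datetime

s = "10/02/2014"
uk = datetime.strptime(s, "%d/%m/%Y")  # day first: 10 February 2014
us = datetime.strptime(s, "%m/%d/%Y")  # month first: 2 October 2014

print(uk.strftime("%Y-%m-%d"))  # 2014-02-10
print(us.strftime("%Y-%m-%d"))  # 2014-10-02
```

Both parses succeed silently, so a wrong pattern yields plausible-looking but incorrect dates rather than nulls.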
>>> spark-sql> select paymentdate, TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'MM/dd/yyyy'),'yyyy-MM-dd')) from tmp;
>>> 10/02/2014 2014-10-02
>>> spark-sql>
>>>
>>> From: Tamas Szuromi <tamas.szur...@odigeo.com.INVALID>
>>> Date: Thursday, March 24, 2016 at 9:35 AM
>>> To: Mich Talebzadeh <mich.talebza...@gmail.com>
>>> Cc: Ajay Chander <itsche...@gmail.com>, Tamas Szuromi <tamas.szur...@odigeo.com.INVALID>, "user @spark" <user@spark.apache.org>
>>> Subject: Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql
>>>
>>> Actually, you should run sql("select paymentdate, unix_timestamp(paymentdate, 'dd/MM/yyyy') from tmp").first
>>>
>>> But keep in mind you will get a Unix timestamp!
>>>
>>> On 24 March 2016 at 17:29, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Thanks guys.
>>>>
>>>> Unfortunately, neither is working:
>>>>
>>>> sql("select paymentdate, unix_timestamp(paymentdate) from tmp").first
>>>> res28: org.apache.spark.sql.Row = [10/02/2014,null]
>>>>
>>>> On 24 March 2016 at 14:23, Ajay Chander <itsche...@gmail.com> wrote:
>>>>
>>>>> Mich,
>>>>>
>>>>> Can you try changing the value of paymentdate to this format, paymentdate='2015-01-01 23:59:59', then to_date(paymentdate), and see if it helps.
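The `unix_timestamp` round-trip Tamas warns about can be sketched in plain Python (the helper name `to_unix` is mine; note that Spark evaluates these functions in the session time zone, while this sketch uses UTC):

```python
import calendar
from datetime import datetime, timezone

def to_unix(s, fmt="%d/%m/%Y"):
    # UNIX_TIMESTAMP analogue: seconds since 1970-01-01 (UTC here;
    # Spark uses the session/JVM time zone instead)
    return calendar.timegm(datetime.strptime(s, fmt).timetuple())

ts = to_unix("10/02/2014")
print(ts)  # 1391990400, a plain integer, not a date

# FROM_UNIXTIME analogue: format the epoch seconds back into a date string
print(datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d"))  # 2014-02-10

print(to_unix("25/12/1969"))  # negative: one week before the epoch
```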
>>>>> On Thursday, March 24, 2016, Tamas Szuromi <tamas.szur...@odigeo.com.invalid> wrote:
>>>>>
>>>>>> Hi Mich,
>>>>>>
>>>>>> Take a look at
>>>>>> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column,%20java.lang.String)
>>>>>>
>>>>>> Cheers,
>>>>>> Tamas
>>>>>>
>>>>>> On 24 March 2016 at 14:29, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to convert a date in a Spark temporary table.
>>>>>>>
>>>>>>> I tried a few approaches:
>>>>>>>
>>>>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp")
>>>>>>> res21: org.apache.spark.sql.DataFrame = [paymentdate: string, _c1: date]
>>>>>>>
>>>>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp").first
>>>>>>> res22: org.apache.spark.sql.Row = [10/02/2014,null]
>>>>>>>
>>>>>>> My date is stored as a String in dd/MM/yyyy format, as shown above. However, to_date() returns null!
>>>>>>>
>>>>>>> Thanks
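The null in res22 is consistent with `to_date()` expecting a 'yyyy-MM-dd'-style string, which is what Ajay's suggestion implies. A rough pure-Python model of that behavior (the helper name `to_date_default` is mine, and this only approximates the Hive/Spark 1.x semantics):

```python
from datetime import datetime

def to_date_default(s):
    # Models to_date(): accept 'yyyy-MM-dd' (optionally with a time
    # part) and return the date; anything else becomes null (None).
    try:
        return datetime.strptime(s[:10], "%Y-%m-%d").date().isoformat()
    except ValueError:
        return None

print(to_date_default("2015-01-01 23:59:59"))  # 2015-01-01
print(to_date_default("10/02/2014"))           # None -> null in Spark
```

This is why the thread converges on `TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(col,'dd/MM/yyyy'),'yyyy-MM-dd'))`: the inner pair rewrites the string into the one layout `to_date()` accepts.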