Hi,

I have written a UDF to do the same in a PySpark DataFrame, since some of my
dates are before the Unix epoch of 1/1/1970. I have more than 250 columns and
apply a custom date_format UDF to more than 50 of them. I am getting OOM
errors and poor performance because of the UDF.
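For what it's worth, here is a plain-Python sketch of the conversion itself (the function name and sample values are my own, purely for illustration): datetime parses pre-1970 dates without going through a Unix timestamp at all, which is roughly what Spark's built-in unix_timestamp/from_unixtime route does for you without the per-row Python UDF overhead.

```python
from datetime import datetime

def reformat_date(s):
    """Convert a 'dd/MM/yyyy' string to 'yyyy-MM-dd'; None stays None."""
    if s is None:
        return None
    # strptime/strftime work on calendar dates directly, so dates
    # before the 1/1/1970 epoch are handled the same as any other.
    return datetime.strptime(s, "%d/%m/%Y").strftime("%Y-%m-%d")

print(reformat_date("10/02/2014"))  # 2014-02-10 (UK dd/MM/yyyy)
print(reformat_date("25/12/1950"))  # 1950-12-25, before the epoch
```

In Spark itself the equivalent built-in expression would be the one quoted below in this thread; the point is that the built-ins run inside the JVM, so a Python UDF should not be needed for this.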

What's your data size, and how is the performance?

Thanks,
Bijay

On Thu, Mar 24, 2016 at 10:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com
> wrote:

> Minor correction: the UK date format is dd/MM/yyyy
>
> scala> sql("select paymentdate,
> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd'))
> AS newdate from tmp").first
> res47: org.apache.spark.sql.Row = [10/02/2014,2014-02-10]
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 24 March 2016 at 17:09, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks everyone. Appreciated
>>
>> sql("select paymentdate,
>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'MM/dd/yyyy'),'yyyy-MM-dd'))
>> from tmp").first
>> res45: org.apache.spark.sql.Row = [10/02/2014,2014-10-02]
>>
>> Cracking a nut with a sledgehammer :)
>>
>> On 24 March 2016 at 17:03, Kasinathan, Prabhu <pkasinat...@paypal.com>
>> wrote:
>>
>>> Can you try this one?
>>>
>>> spark-sql> select paymentdate,
>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'MM/dd/yyyy'),'yyyy-MM-dd'))
>>> from tmp;
>>> 10/02/2014 2014-10-02
>>> spark-sql>
>>>
>>>
>>> From: Tamas Szuromi <tamas.szur...@odigeo.com.INVALID>
>>> Date: Thursday, March 24, 2016 at 9:35 AM
>>> To: Mich Talebzadeh <mich.talebza...@gmail.com>
>>> Cc: Ajay Chander <itsche...@gmail.com>, Tamas Szuromi <
>>> tamas.szur...@odigeo.com.INVALID>, "user @spark" <user@spark.apache.org>
>>> Subject: Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql
>>>
>>> Actually, you should run sql("select paymentdate,
>>> unix_timestamp(paymentdate, 'dd/MM/yyyy') from tmp").first
>>>
>>>
>>> But keep in mind you will get a unix timestamp!
>>>
>>>
>>> On 24 March 2016 at 17:29, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Thanks guys.
>>>>
>>>> Unfortunately, neither is working:
>>>>
>>>>  sql("select paymentdate, unix_timestamp(paymentdate) from tmp").first
>>>> res28: org.apache.spark.sql.Row = [10/02/2014,null]
>>>>
>>>>
>>>> On 24 March 2016 at 14:23, Ajay Chander <itsche...@gmail.com> wrote:
>>>>
>>>>> Mich,
>>>>>
>>>>> Can you try changing the value of paymentdate to this
>>>>> format, paymentdate='2015-01-01 23:59:59', then to_date(paymentdate), and
>>>>> see if it helps.
>>>>>
>>>>>
>>>>> On Thursday, March 24, 2016, Tamas Szuromi <
>>>>> tamas.szur...@odigeo.com.invalid> wrote:
>>>>>
>>>>>> Hi Mich,
>>>>>>
>>>>>> Take a look
>>>>>> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.html#unix_timestamp(org.apache.spark.sql.Column,%20java.lang.String)
>>>>>>
>>>>>> cheers,
>>>>>> Tamas
>>>>>>
>>>>>>
>>>>>> On 24 March 2016 at 14:29, Mich Talebzadeh <mich.talebza...@gmail.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to convert a date in a Spark temporary table.
>>>>>>>
>>>>>>> I have tried a few approaches.
>>>>>>>
>>>>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp")
>>>>>>> res21: org.apache.spark.sql.DataFrame = [paymentdate: string, _c1:
>>>>>>> date]
>>>>>>>
>>>>>>>
>>>>>>> scala> sql("select paymentdate, to_date(paymentdate) from tmp").first
>>>>>>> *res22: org.apache.spark.sql.Row = [10/02/2014,null]*
>>>>>>>
>>>>>>> My date is stored as a String in dd/MM/yyyy format, as shown above.
>>>>>>> However, to_date() returns null!
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
