Re: Use Arrow instead of Pickle without pandas_udf

Hichame El Khalfi Wed, 25 Jul 2018 15:41:02 -0700

Hey Holden,
Thanks for your reply,

We currently using a python function that produces a Row(TS=LongType(), 
bin=BinaryType()).
We use this function like this 
dataframe.rdd.map(my_function).toDF().write.parquet()


To reuse it in pandas_udf, we changes the return type to 
StructType(StructField(Long), StructField(BinaryType).

1)But we face an issue that StructType is not supported by pandas_udf.

So I was wondering to still continue to reuse dataftame.rdd.map but get an 
improvement in serialization by using ArrowFormat instead of Pickle.

From: hol...@pigscanfly.ca
Sent: July 25, 2018 4:41 PM
To: hich...@elkhalfi.com
Cc: user@spark.apache.org
Subject: Re: Use Arrow instead of Pickle without pandas_udf


Not currently. What's the problem with pandas_udf for your use case?

On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi 
<hich...@elkhalfi.com<mailto:hich...@elkhalfi.com>> wrote:

Hi There,


Is there a way to use Arrow format instead of Pickle but without using 
pandas_udf ?


Thank for your help,


Hichame



--
Twitter: https://twitter.com/holdenkarau

Re: Use Arrow instead of Pickle without pandas_udf

Reply via email to