I'm looking at the Tuning Guide's suggestion to use Kryo instead of the
default serialization. My questions:

Does PySpark use Java serialization by default, the way Scala Spark does?
If so, can I use Kryo with PySpark instead? The instructions say I should
register my classes with the Kryo serializer.
Kryo won't make a major impact on PySpark, because PySpark just stores data
as byte[] objects, which are fast to serialize even with Java. But it may be
worth a try: you would just set spark.serializer and not try to register any
classes. What might make more impact is storing data as MEMORY_ONLY_SER and
turning on spark.rdd.compress, which will compress the cached bytes.
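
For reference, a minimal PySpark sketch of that advice (the app name and the
toy dataset are placeholders, not from the thread):

    from pyspark import SparkConf, SparkContext, StorageLevel

    # Switch the JVM-side serializer to Kryo; no class registration is
    # attempted from Python, since PySpark ships records as opaque bytes.
    conf = (SparkConf()
            .setAppName("kryo-test")  # placeholder name
            .set("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer")
            # Compress serialized, cached partitions.
            .set("spark.rdd.compress", "true"))
    sc = SparkContext(conf=conf)

    rdd = sc.parallelize(range(1000000))  # toy data

    # Cache partitions as serialized byte buffers; this is where the
    # serializer and spark.rdd.compress can shrink the footprint.
    rdd.persist(StorageLevel.MEMORY_ONLY_SER)
    rdd.count()  # materialize; compare sizes in the web UI's Storage tab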
Hi Patrick,
Thanks for your reply.
I am guessing that even array types will be registered automatically. Is
this correct?
Thanks,
Pradeep
We are trying to use Kryo serialization, but with Kryo serialization on, the
memory consumption does not change. We have tried this on multiple sets of
data. We have also checked the logs and confirmed that Kryo is being used.
Can somebody please help us?
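
One possible explanation, consistent with the earlier reply in this thread
(a hedged sketch; the dataset is a stand-in): the serializer only affects
partitions that are cached in serialized form, so a plain MEMORY_ONLY cache
stores deserialized JVM objects and reports the same size under Java or Kryo.

    from pyspark import SparkConf, SparkContext, StorageLevel

    conf = (SparkConf()
            .setAppName("kryo-memory-check")  # placeholder name
            .set("spark.serializer",
                 "org.apache.spark.serializer.KryoSerializer"))
    sc = SparkContext(conf=conf)

    # Confirm which serializer this application is actually using.
    print(conf.get("spark.serializer"))

    rows = sc.parallelize([(i, "row-%d" % i) for i in range(500000)])

    # MEMORY_ONLY would cache deserialized JVM objects, whose size does
    # not depend on the serializer. MEMORY_ONLY_SER caches serialized
    # bytes, where switching to Kryo can actually change the number
    # shown in the web UI's Storage tab.
    rows.persist(StorageLevel.MEMORY_ONLY_SER)
    rows.count()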