Hi,
my team is setting up a machine-learning framework based on Spark's MLlib
that currently uses logistic regression. I enabled Kryo serialization and
enforced class registration, so I know that all the serialized classes are
registered. However, the running times with Kryo serialization enabled are
consistently longer. This is true both when running locally on a smaller
sample (1.6 minutes vs. 1.3 minutes) and when running with a larger sample
on AWS with two worker nodes (2h30 vs. 1h50).
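
For reference, the kind of setup described above looks roughly like the
sketch below. The application name and the list of registered classes are
placeholders for illustration, not our actual configuration, and the master
URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector}
import org.apache.spark.mllib.regression.LabeledPoint

val conf = new SparkConf()
  .setAppName("kryo-lr-example") // hypothetical application name
  // Switch from the default Java serializer to Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Throw an error if an unregistered class is serialized.
  .set("spark.kryo.registrationRequired", "true")
  // Placeholder class list; a real job registers whatever it serializes.
  .registerKryoClasses(Array(
    classOf[LabeledPoint],
    classOf[DenseVector],
    classOf[SparseVector]
  ))

val sc = new SparkContext(conf)
```

Removing the two `set` calls and the `registerKryoClasses` call gives the
default Java-serialization run used for comparison.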

The monitoring tools suggest that Task Deserialization Times are similar
(although perhaps slightly longer for Kryo), but Task Durations and even
Scheduler Delays increase significantly.

There is also a significant difference in memory usage: with Kryo the number
of stored RDDs is higher (much more so on the local sample: 40 vs. 4).
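
One way to compare what is actually cached in the two runs would be to print
the persisted RDDs from the driver. This is only a minimal sketch, assuming a
live SparkContext; the helper name `describePersistedRdds` is made up for
illustration.

```scala
import org.apache.spark.SparkContext

// Returns one line per RDD currently marked as persistent on this context,
// ordered by RDD id, with its name (may be null) and storage level.
def describePersistedRdds(sc: SparkContext): Seq[String] =
  sc.getPersistentRDDs.toSeq.sortBy(_._1).map { case (id, rdd) =>
    s"RDD $id name=${rdd.name} level=${rdd.getStorageLevel.description}"
  }

// Usage, given an existing context `sc`:
// describePersistedRdds(sc).foreach(println)
```

Running this at the same point in both the Kryo and the non-Kryo job should
show which RDDs account for the difference in count.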

Does anyone have an idea of what could be going on, or where I should focus
to find out?


