Hi,
I am seeing different shuffle write sizes when using SchemaRDD (versus a
normal RDD). I'm doing the following:
case class DomainObj(a: String, b: String, c: String, d: String)
val logs: RDD[String] = sc.textFile(...)
val filtered: RDD[String] = logs.filter(...)
val myDomainObjects: RDD[DomainObj] = filtered.map(...)
Spark SQL always uses a custom configuration of Kryo under the hood to
improve shuffle performance:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer.scala
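That serializer is internal to Spark SQL, but a plain RDD job can opt into Kryo itself through the standard configuration, which may narrow the shuffle-size gap. A minimal sketch (`DomainObj` is the case class from the question above; `registerKryoClasses` assumes Spark 1.2 or later):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("kryo-shuffle-example")
  // Replace the default Java serializer with Kryo for shuffle data.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes up front lets Kryo write a small numeric ID
  // instead of the full class name with every record.
  .registerKryoClasses(Array(classOf[DomainObj]))

val sc = new SparkContext(conf)
```

This only changes how records are serialized; the logical job stays the same.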
Michael
On Sun, Sep 21, 2014 at 9:04 AM, Grega Kešpret gr...@celtra.com