It's my fault! I uploaded the wrong jar when I changed the number of partitions, and now it works fine :)
The size of word_mapping is 2,444,185 entries. Would serializing an object of that size really take that long? I don't think two million entries is very large; serializing a structure of that size locally typically costs less than a second. Thanks for the help :)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelize-for-a-large-Seq-is-extreamly-slow-tp4801p4914.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
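For what it's worth, the "less than a second locally" claim is easy to sanity-check outside Spark with plain Java serialization. This is just a sketch: the real word_mapping's key/value types aren't shown in the thread, so a `HashMap<String, Integer>` of about 2.4 million synthetic entries stands in for it here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class SerializationTiming {
    // Serialize the map with plain Java serialization and return the byte count.
    static int serializedSize(HashMap<String, Integer> map) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(map);
        }
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for word_mapping: ~2.4 million entries.
        HashMap<String, Integer> wordMapping = new HashMap<>();
        for (int i = 0; i < 2_444_185; i++) {
            wordMapping.put("word" + i, i);
        }

        long start = System.nanoTime();
        int bytes = serializedSize(wordMapping);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("serialized " + bytes + " bytes in " + elapsedMs + " ms");
    }
}
```

On my understanding, the driver-side cost of parallelize is dominated by exactly this kind of serialization plus the network transfer of the resulting bytes, so if the local timing is small, the slowness seen in the thread would have to come from somewhere else (e.g. the wrong jar, as it turned out).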