Hello, I am facing an issue with partitionBy, and it is not clear whether it is a problem with my code or with my Spark setup. I am using Spark 1.1 in standalone mode, and my other Spark projects work fine.
I need to repartition a relatively large file (about 70 million lines). Here is a minimal version of what is failing:

    val myRDD = sc.textFile("...").map { line => (extractKey(line), line) }
    val myRepartitionedRDD = myRDD.partitionBy(new HashPartitioner(100))
    myRepartitionedRDD.saveAsTextFile("...")

It runs for quite some time, until I get errors and tasks are retried. The errors look like:

    FetchFailed(BlockManagerId(3, myWorker2, 52082, 0), shuffleId=1, mapId=1, reduceId=5)

The logs are not much more informative; there I get:

    java.io.IOException: sendMessageReliability failed because ack was not received within 60 sec

I get similar errors from all my workers. Do you have some kind of explanation for this behaviour? What could be wrong?

Thanks,
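For completeness, here is a self-contained sketch of the job as I could reproduce it. Note that extractKey here is only a stand-in (it takes the first tab-separated field); my real key logic and the input/output paths are omitted:

    import org.apache.spark.{SparkConf, SparkContext, HashPartitioner}

    object RepartitionJob {
      // Stand-in for the real key extraction: first tab-separated field.
      def extractKey(line: String): String = line.split('\t')(0)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("repartition-test"))
        // Key each line, then shuffle all ~70M lines into 100 hash partitions.
        val myRDD = sc.textFile(args(0)).map(line => (extractKey(line), line))
        val myRepartitionedRDD = myRDD.partitionBy(new HashPartitioner(100))
        myRepartitionedRDD.saveAsTextFile(args(1))
        sc.stop()
      }
    }

The failure appears during the shuffle fetch phase, i.e. after the map side has written its output, so the keying itself seems to complete.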