Hello,
I am facing an issue with partitionBy; it is not clear whether the problem
is in my code or in my Spark setup. I am using Spark 1.1 in standalone
mode, and my other Spark jobs run fine.
I need to repartition a relatively large file (about 70 million lines).
Here is a minimal version of the code that fails:
import org.apache.spark.HashPartitioner

val myRDD = sc.textFile("...").map { line => (extractKey(line), line) }
val myRepartitionedRDD = myRDD.partitionBy(new HashPartitioner(100))
myRepartitionedRDD.saveAsTextFile(...)
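For reference, my understanding of what HashPartitioner(100) does with each
key is roughly the following (a standalone sketch of the non-negative-modulo
scheme, not Spark's actual code; partitionFor is just an illustrative name):

```scala
// Sketch: a hash partitioner with N partitions maps a key to
// key.hashCode modulo N, forced into the range [0, N).
def partitionFor(key: Any, numPartitions: Int): Int = {
  val rawMod = key.hashCode % numPartitions
  // hashCode can be negative, so shift negative remainders up by N
  rawMod + (if (rawMod < 0) numPartitions else 0)
}

// All records sharing a key land in the same partition.
println(partitionFor("someKey", 100))
```

If that is right, each of the 100 reduce tasks has to fetch shuffle blocks
from every map task, which is presumably where the FetchFailed below comes in.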
It runs for quite some time, then tasks start failing and are retried.
The errors look like:
FetchFailed(BlockManagerId(3,myWorker2, 52082,0),
shuffleId=1,mapId=1,reduceId=5)
The logs are not much more informative; I see:
java.io.IOException: sendMessageReliably failed because ack was not
received within 60 sec
I get similar errors from all of my workers.
Do you have some kind of explanation for this behaviour? What could be
wrong?
Thanks,