Hello,

I am facing an issue with partitionBy; it is not clear whether the
problem is in my code or in my Spark setup. I am using Spark 1.1 in
standalone mode, and my other Spark projects work fine.

I need to repartition a relatively large file (about 70 million lines).
Here is a minimal version of the code that fails:

import org.apache.spark.HashPartitioner

// key each line, then redistribute across 100 partitions and save
val myRDD = sc.textFile("...").map { line => (extractKey(line), line) }
val myRepartitionedRDD = myRDD.partitionBy(new HashPartitioner(100))
myRepartitionedRDD.saveAsTextFile("...")
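For reference, my understanding is that HashPartitioner assigns each key to a bucket by taking a non-negative modulo of the key's hashCode. Here is a simplified, self-contained sketch (not the actual Spark source; the object name is my own):

import scala.Predef.assert

// Simplified sketch of HashPartitioner's partition assignment:
// a non-negative modulo of the key's hashCode over numPartitions.
object HashPartitionSketch {
  // Modulo that always returns a value in [0, mod), even for negative x.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // null keys go to partition 0; everything else hashes.
  def getPartition(key: Any, numPartitions: Int): Int =
    key match {
      case null => 0
      case _    => nonNegativeMod(key.hashCode, numPartitions)
    }
}

So with 100 partitions, every line whose extracted key hashes to the same bucket ends up in the same output partition, which is the behaviour I am relying on.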

It runs for quite some time until I get some errors and it retries. The errors are:

FetchFailed(BlockManagerId(3,myWorker2, 52082,0),
shuffleId=1,mapId=1,reduceId=5)

The logs are not much more informative. I get:

java.io.IOException: sendMessageReliably failed because ack was not
received within 60 sec

I get similar errors with all my workers.

Do you have an explanation for this behaviour? What could be wrong?

Thanks,
