Protobuf involvement makes me more suspicious that this is possibly a
corruption or an issue with serialization as well. Perhaps if you can
share some stack traces, people can help better. If it is reliably
reproducible, then I'd also check for the count of records until after
this occurs, and see
There is no trace to check on the task, I get n/a instead of the links to the
traces in the web.
Some of the maps are retried successfully while others fail again until one of
them fails four times in a row and the job is automatically terminated. Is this
compatible with protobuf corruption? In
Harsh,
Could IsolationRunner be used here. I'd put up a patch for HADOOP-8765,
after applying which IsolationRunner works for me. Maybe we could use it to
re-run the map task that's failing and debug.
Thanks
hemanth
On Thu, Sep 6, 2012 at 9:42 PM, Harsh J ha...@cloudera.com wrote:
Protobuf