I am running logistic regression with SGD on a problem with about 19M parameters (the kdda dataset from the libsvm library).
I consistently see the nodes on my computer get disconnected, and soon the whole job grinds to a halt:

14/07/12 03:05:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 2 on pachy4 remote Akka client disassociated

Does this have anything to do with the akka.frame_size? I have tried values up to 1024 MB and I still get the same error. The logs contain no further information about why the clients are getting disconnected.

Any thoughts?

Regards,
Krishna