We have a Spark job, neither very complex nor very large, that keeps dying with this error.
I have researched it and have not seen any convincing explanation of why this happens. I am not using a shuffle service. Which server is the one that is refusing the connection?

If I go to the server that is reported in the error message, I see a lot of these errors towards the end (they may or may not be related to the problem at all):

java.io.FileNotFoundException: D:\data\yarnnm\local\usercache\hadoop\appcache\application_1500970459432_1024\blockmgr-7f3a1abc-2b8b-4e51-9072-8c12495ec563\0e\shuffle_0_4107_0.index

Examining this machine further, there are also FetchFailedException errors resulting from other machines, and so on.

This is Spark 1.6 on Yarn-master.

Could anyone provide some insight or a solution to this? Thanks.
