Public bug reported:

When I run TeraSort with Hadoop 2.7.1 on Ubuntu 16.04, on a cluster of 3 slaves and 1 master with 500000000 records, some of the Ubuntu slave nodes become unreachable in the middle of the MapReduce job. Once this happens, we cannot open an SSH connection to those slave nodes (connection refused).
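To tell a "connection refused" failure apart from a node that is down or unroutable, the SSH port can be probed directly. The following is a minimal sketch and not part of the original report: the function name `probe_tcp_port` and the host names `slave1` to `slave3` are hypothetical placeholders, and it assumes sshd listens on the default port 22.

```python
import socket


def probe_tcp_port(host, port=22, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'unreachable'."""
    try:
        # create_connection performs name resolution and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port.
        return "refused"
    except OSError:
        # Timeouts, DNS failures, and routing errors all end up here.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical host names; replace them with the real slave nodes.
    for node in ("slave1", "slave2", "slave3"):
        print(node, probe_tcp_port(node))
```

A result of "refused" suggests the node is still up at the network level while sshd itself has stopped, whereas "unreachable" points to the node or the network being down.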
If we log in to an affected slave node, we find:

1. dmesg shows that systemd-journald received SIGTERM;
2. /var/log/syslog contains several errors; iscsid reports "semop down failed 22".

This is the Terasort output:

16/06/10 03:39:25 INFO terasort.TeraSort: starting
16/06/10 03:39:27 INFO input.FileInputFormat: Total input paths to process : 2
Spent 336ms computing base-splits.
Spent 9ms computing TeraScheduler splits.
Computing input splits took 348ms
Sampling 10 splits of 38
Making 7 from 100000 sampled records
Computing parititions took 1396ms
Spent 1749ms computing partitions.
16/06/10 03:39:29 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.85:8032
16/06/10 03:39:30 INFO mapreduce.JobSubmitter: number of splits:38
16/06/10 03:39:30 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/06/10 03:39:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1465554943455_0002
16/06/10 03:39:30 INFO impl.YarnClientImpl: Submitted application application_1465554943455_0002
16/06/10 03:39:30 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1465554943455_0002/
16/06/10 03:39:30 INFO mapreduce.Job: Running job: job_1465554943455_0002
16/06/10 03:40:03 INFO mapreduce.Job: Job job_1465554943455_0002 running in uber mode : false
16/06/10 03:40:03 INFO mapreduce.Job: map 0% reduce 0%
16/06/10 03:44:54 INFO mapreduce.Job: map 1% reduce 0%
16/06/10 03:45:12 INFO mapreduce.Job: map 2% reduce 0%
..................
16/05/25 00:35:53 INFO mapreduce.Job: map 69% reduce 0%
16/05/25 00:35:54 INFO mapreduce.Job: map 75% reduce 0%
16/05/25 00:35:56 INFO mapreduce.Job: map 88% reduce 0%
16/05/25 00:35:57 INFO ipc.Client: Retrying connect to server: ubuntubm10/192.168.1.85:38381. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)

** Affects: ubuntu
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1594534

Title:
  Terasort (hadoop 2.7.1) failed on Ubuntu 1604

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+bug/1594534/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs