Can you get the logs from the individual map tasks? It looks like some map tasks are failing, and Hadoop will by default retry each failed task (four attempts per task in total, controlled by mapreduce.map.maxattempts) before failing the whole job, which would explain the jumps from 100% back to 75%.
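
While the job is still running, you can click through to the individual map attempts in the ResourceManager web UI (port 8088 by default) and inspect their syslog/stderr. Once the job has finished or been killed, and assuming log aggregation is enabled on your cluster (yarn.log-aggregation-enable=true), something along these lines should collect everything in one place (the application ID below is just a placeholder; yours is printed near the top of the Sqoop output):

  # list applications to find the ID of the import job
  yarn application -list -appStates ALL

  # dump the aggregated logs of all task attempts into a single file
  yarn logs -applicationId application_1460000000000_0001 > import-task-logs.txt

The per-task attempt counters, and the exceptions that killed the failed attempts, should also be visible in the JobHistory UI.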
Jarcec

> On Apr 9, 2016, at 2:33 PM, Tobias Feldhaus <[email protected]> wrote:
>
> Hi,
>
> I'm running Apache Sqoop (v1.4.6 from the Cloudera distribution). I
> have a cluster on AWS with 1 master node, 1 name node, and 5 worker
> nodes (m4.4xlarge), and I'm trying to import a MySQL (v5.6) table via
> the following command:
>
> sqoop import -direct --table ads --connect jdbc:mysql://10.0.0.125:8500/db
> --password XXX --username XXX
>
> The command executes, but the job never completes; for the past 24 hours
> the output has been:
>
> (...)
> 16/04/09 10:52:54 INFO mapreduce.Job: map 100% reduce 0%
> 16/04/09 13:06:44 INFO mapreduce.Job: map 75% reduce 0%
> 16/04/09 13:06:54 INFO mapreduce.Job: map 100% reduce 0%
> 16/04/09 15:24:43 INFO mapreduce.Job: map 75% reduce 0%
> (the alternating 100%/75% lines continue from here on)
>
> The nodes are almost idle, the disks are not full, and network traffic is
> about 3-5 MByte/s (in/out).
>
> Question: Is this normal behavior, or do I have a deadlock here? What
> would be the next step for the investigation? The table size is estimated
> at around 60 GiB; load average, disk usage, and physical memory
> utilization are all low and can be seen here (http://puu.sh/ocfDq/f4c3592530.png).
> The full log can be found here
> (https://gist.github.com/james-woods/b0745c96e0ef31e954d038de256a5b83).
>
> Thanks for any advice,
> Tobi
