Hello, I'm trying to optimize Nutch crawling performance. Right now I'm testing on a small Hadoop cluster: two nodes, each with 32 GB RAM and an Intel Xeon E3-1245 v2 (4 cores / 8 threads). My Nutch config is here: http://pastebin.com/bBRHpFuq

The problem: the fetch job is badly balanced. Some reduce tasks get only about 4k pages to fetch, while others get about 1M. For an example, see this screenshot: https://docs.google.com/file/d/0B98dgNxOqKMvT1doOVVPUU1PNXM/edit Some reduce tasks finish in 10 minutes, but one has been running for 11 hours and is still going, so it becomes a bottleneck: I have 24 reduce tasks, but effectively only one is doing the work.

Could anyone give some advice, or point me to links where I can read about this problem?
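One thing I'm wondering about is whether capping the per-host fetchlist size would spread the load better. A sketch of what I mean in nutch-site.xml (I haven't applied this yet, and I'm going from the Nutch 1.x default property names as I understand them):

```xml
<!-- Sketch only: settings I'm considering, not part of my current config
     linked above. The idea is to stop one huge host from dominating a
     single fetch queue / reduce task. -->
<property>
  <name>generate.max.count</name>
  <value>10000</value>
  <description>Maximum number of URLs per host (or domain, depending on
  generate.count.mode) allowed into a single fetchlist.</description>
</property>
<property>
  <name>generate.count.mode</name>
  <value>host</value>
  <description>Count URLs per host when applying generate.max.count.</description>
</property>
```

Would something like this be the right direction, or is the skew caused by something else?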
Big thanks for any help,
Sergey

