That protocol is buggy. I have had a similar problem with protocol-httpclient. Let's switch to protocol-http.
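
For what it's worth, here is a minimal sketch (not the actual Nutch code) of the wait loop feng lu quoted further down in this thread. I think it explains the repeating log line: the loop only ends when activeThreads drops to 0, or when no request has started within the timeout. If two threads hang inside a request, the status line keeps printing until that timeout fires. The class name FetcherWaitSketch, the Outcome enum, and the timeoutMs/pollMs parameters are made up for illustration; only activeThreads and lastRequestStart come from the quoted code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class FetcherWaitSketch {

    /** Hypothetical result type; the quoted Nutch loop simply returns. */
    enum Outcome { ALL_THREADS_EXITED, ABORTED_HUNG_THREADS, INTERRUPTED }

    /**
     * Polls every pollMs milliseconds until no fetcher thread is active,
     * or until the last request started more than timeoutMs milliseconds ago.
     */
    static Outcome waitForThreads(AtomicInteger activeThreads,
                                  AtomicLong lastRequestStart,
                                  long timeoutMs,
                                  long pollMs) {
        do {
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing it.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }

            // Same shape as the status line that keeps repeating in the log.
            System.out.println("-activeThreads=" + activeThreads.get());

            // Some requests hang: give up once nothing has started for too long.
            if (System.currentTimeMillis() - lastRequestStart.get() > timeoutMs) {
                return Outcome.ABORTED_HUNG_THREADS;
            }
        } while (activeThreads.get() > 0);

        return Outcome.ALL_THREADS_EXITED;
    }

    public static void main(String[] args) {
        // Two threads that never finish, as in the log above.
        AtomicInteger active = new AtomicInteger(2);
        AtomicLong lastStart = new AtomicLong(System.currentTimeMillis());
        System.out.println(waitForThreads(active, lastStart, 300, 50));
    }
}
```

With two threads that never decrement activeThreads, the sketch prints the status line on every poll and leaves the loop only through the timeout branch, which matches what the mappers appear to be doing.
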
On Thu, Dec 5, 2013 at 6:53 PM, Amit Sela <[email protected]> wrote:

> In the plugin.includes ? protocol-httpclient
>
> On Thu, Dec 5, 2013 at 12:08 PM, Nguyen Manh Tien <[email protected]> wrote:
>
>> Which protocol are you using Amit?
>>
>> On Wed, Dec 4, 2013 at 10:46 PM, Amit Sela <[email protected]> wrote:
>>
>> > In my case, the fetch got to that point in 45 minutes and is stuck
>> > another 75 minutes with those mappers.
>> > The log just keeps printing:
>> >
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> >
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> >
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> >
>> > ....
>> >
>> > On Wed, Dec 4, 2013 at 4:31 PM, feng lu <[email protected]> wrote:
>> >
>> > > I see that it use a while loop to wait for threads to exit and will
>> > > wait 1 second between each check. so even if fetcher thread was
>> > > finished, the whole fetcher process will take little longer to exit.
>> > >
>> > > code structure like this.
>> > >
>> > > do { // wait for threads to exit
>> > >   pagesLastSec = pages.get();
>> > >   bytesLastSec = (int)bytes.get();
>> > >
>> > >   try {
>> > >     Thread.sleep(1000);
>> > >   } catch (InterruptedException e) {}
>> > >
>> > >   ....
>> > >   reportStatus(pagesLastSec, bytesLastSec); // your print output is coming here
>> > >
>> > >   LOG.info("-activeThreads=" + activeThreads + ", spinWaiting=" + spinWaiting.get()
>> > >       + ", fetchQueues.totalSize=" + fetchQueues.getTotalSize());
>> > >
>> > >   if (!feeder.isAlive() && fetchQueues.getTotalSize() < 5) {
>> > >     fetchQueues.dump();
>> > >   }
>> > >   ....
>> > >   // check timelimit
>> > >   if (!feeder.isAlive()) {
>> > >     int hitByTimeLimit = fetchQueues.checkTimelimit();
>> > >     if (hitByTimeLimit != 0) reporter.incrCounter("FetcherStatus",
>> > >         "hitByTimeLimit", hitByTimeLimit);
>> > >   }
>> > >
>> > >   // some requests seem to hang, despite all intentions
>> > >   if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
>> > >     if (LOG.isWarnEnabled()) {
>> > >       LOG.warn("Aborting with "+activeThreads+" hung threads.");
>> > >     }
>> > >     return;
>> > >   }
>> > >
>> > > } while (activeThreads.get() > 0);
>> > >
>> > > On Wed, Dec 4, 2013 at 7:57 PM, Amit Sela <[email protected]> wrote:
>> > >
>> > > > In the fetch phase, I notice that some of the mappers take much
>> > > > longer to finish.
>> > > > In the running task mapreduce admin screen it shows
>> > > >
>> > > > *1 threads, 1 queues, 0 URLs queued, *
>> > > >
>> > > > So why those tasks are not complete ?
>> > >
>> > > --
>> > > Don't Grow Old, Grow Up... :-)

