That protocol is buggy; I had a similar problem with protocol-httpclient. Let's
switch to protocol-http.


On Thu, Dec 5, 2013 at 6:53 PM, Amit Sela <[email protected]> wrote:

> In plugin.includes: protocol-httpclient
>
>
> On Thu, Dec 5, 2013 at 12:08 PM, Nguyen Manh Tien <[email protected]> wrote:
>
>> Which protocol are you using Amit?
>>
>>
>> On Wed, Dec 4, 2013 at 10:46 PM, Amit Sela <[email protected]> wrote:
>>
>> > In my case, the fetch got to that point in 45 minutes and has been stuck
>> > for another 75 minutes with those mappers.
>> > The log just keeps printing:
>> >
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> > org.apache.nutch.fetcher.Fetcher: -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
>> > ....
>> >
>> >
>> >
>> > On Wed, Dec 4, 2013 at 4:31 PM, feng lu <[email protected]> wrote:
>> >
>> > > I see that it uses a while loop to wait for the threads to exit, and it
>> > > waits 1 second between checks. So even after the fetcher threads have
>> > > finished, the whole fetcher process takes a little longer to exit.
>> > >
>> > > The code is structured like this:
>> > >
>> > >     do {                                          // wait for threads to exit
>> > >       pagesLastSec = pages.get();
>> > >       bytesLastSec = (int)bytes.get();
>> > >
>> > >       try {
>> > >         Thread.sleep(1000);
>> > >       } catch (InterruptedException e) {}
>> > >
>> > >       ....
>> > >       reportStatus(pagesLastSec, bytesLastSec);   // your print output comes from here
>> > >
>> > >       LOG.info("-activeThreads=" + activeThreads + ", spinWaiting="
>> > >           + spinWaiting.get() + ", fetchQueues.totalSize=" + fetchQueues.getTotalSize());
>> > >
>> > >       if (!feeder.isAlive() && fetchQueues.getTotalSize() < 5) {
>> > >         fetchQueues.dump();
>> > >       }
>> > >       ....
>> > >       // check timelimit
>> > >       if (!feeder.isAlive()) {
>> > >         int hitByTimeLimit = fetchQueues.checkTimelimit();
>> > >         if (hitByTimeLimit != 0) reporter.incrCounter("FetcherStatus",
>> > >             "hitByTimeLimit", hitByTimeLimit);
>> > >       }
>> > >
>> > >       // some requests seem to hang, despite all intentions
>> > >       if ((System.currentTimeMillis() - lastRequestStart.get()) > timeout) {
>> > >         if (LOG.isWarnEnabled()) {
>> > >           LOG.warn("Aborting with "+activeThreads+" hung threads.");
>> > >         }
>> > >         return;
>> > >       }
>> > >
>> > >     } while (activeThreads.get() > 0);
>> > >
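
To make the loop above concrete, here is a minimal, self-contained sketch (not Nutch code) of a monitor with the same shape. It sleeps between checks, keeps looping while activeThreads is above zero, and aborts once no request has started within a timeout. FetchMonitorSketch, waitForThreads, startWorker and the millisecond values are hypothetical. The worker's Thread.sleep stands in for a blocking protocol call, such as an HTTP request that never returns.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the monitor loop's shape; not the Nutch Fetcher itself.
public class FetchMonitorSketch {

    // Returns true if every worker exited, false if the monitor gave up
    // because no request had started within timeoutMs (or it was interrupted).
    static boolean waitForThreads(AtomicInteger activeThreads, AtomicLong lastRequestStart,
                                  long pollMs, long timeoutMs) {
        do {                                        // wait for threads to exit
            try {
                Thread.sleep(pollMs);               // the quoted excerpt sleeps 1000 ms here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            System.out.println("-activeThreads=" + activeThreads);

            // A thread stuck inside a request never decrements activeThreads,
            // so apart from interruption only this timeout check ends the loop.
            if (System.currentTimeMillis() - lastRequestStart.get() > timeoutMs) {
                System.out.println("Aborting with " + activeThreads + " hung threads.");
                return false;
            }
        } while (activeThreads.get() > 0);
        return true;
    }

    // Hypothetical helper: starts a daemon worker whose single "request" blocks for workMs.
    static Thread startWorker(AtomicInteger activeThreads, AtomicLong lastRequestStart, long workMs) {
        activeThreads.incrementAndGet();
        lastRequestStart.set(System.currentTimeMillis()); // recorded before the thread runs
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(workMs);               // stands in for a blocking protocol call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                activeThreads.decrementAndGet();
            }
        });
        worker.setDaemon(true);                     // a hung worker won't keep the JVM alive
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        AtomicInteger active = new AtomicInteger();
        AtomicLong lastStart = new AtomicLong();
        startWorker(active, lastStart, 50);
        System.out.println("fast worker exited cleanly: "
                + waitForThreads(active, lastStart, 20, 1000));

        AtomicInteger hungActive = new AtomicInteger();
        AtomicLong hungStart = new AtomicLong();
        startWorker(hungActive, hungStart, 10_000);
        System.out.println("hung worker exited cleanly: "
                + waitForThreads(hungActive, hungStart, 20, 200));
    }
}
```

With a worker that finishes in 50 ms, waitForThreads returns true on the first poll after it exits. With a worker that never returns, the loop keeps logging -activeThreads=1 and only returns false once the timeout passes. That pattern is consistent with (though not proof of) the repeated -activeThreads=2 lines with an empty queue reported earlier in this thread.
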
>> > >
>> > > On Wed, Dec 4, 2013 at 7:57 PM, Amit Sela <[email protected]> wrote:
>> > >
>> > > > In the fetch phase, I notice that some of the mappers take much longer
>> > > > to finish.
>> > > > In the MapReduce admin screen, the running task shows
>> > > >
>> > > > *1 threads, 1 queues, 0 URLs queued, *
>> > > >
>> > > > So why are those tasks not completing?
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>
>
