It's a config specific to Hadoop. But when added to nutch-site.xml, Hadoop
will pick it up just fine.
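
For example, something along these lines should work. This is only a sketch: it assumes the usual conf/nutch-site.xml file, and the 120000 value is simply the two minutes suggested earlier in this thread, not a recommended setting.

```xml
<!-- conf/nutch-site.xml: place inside the existing <configuration> element -->
<property>
  <name>mapreduce.task.timeout</name>
  <!-- Value is in milliseconds; 120000 ms = 2 minutes (illustrative value) -->
  <value>120000</value>
</property>
```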

On Tue, 7 Jan 2025 at 15:01, Raj Chidara <[email protected]> wrote:

> In conf, I did not find the property mapreduce.task.timeout.
>
> Thanks and Regards
> Raj Chidara
>
>
>
> ---- On Tue, 07 Jan 2025 18:20:26 +0530 *Markus Jelsma
> <[email protected]>* wrote ---
>
> I'm not sure if you are allowed to set -D options via bin/crawl like that.
> You can add it to commands such as bin/nutch fetch -D blabla.
>
> Try setting it in the conf itself.
>
> On Tue, 7 Jan 2025 at 14:44, Raj Chidara <[email protected]> wrote:
>
> Hi Markus,
>   I increased the timeout to 2 minutes. However, the problem is the same.
> This is the command I used. Please correct me if anything is wrong in the
> syntax.
>
> bin/crawl -s urls -D mapreduce.task.timeout=120000  --size-fetchlist 100
> --num-tasks 1 crawl 3
>
> Thanks and Regards
> Raj Chidara
>
>
>
> ---- On Tue, 07 Jan 2025 17:38:28 +0530 *Markus Jelsma
> <[email protected]>* wrote ---
>
> Aah, 10000 is too short, as the value is in milliseconds (so only 10
> seconds). With that, fetchers will probably also hang on sites without
> issues. Try setting it to two minutes instead.
>
>
>
> On Thu, 2 Jan 2025 at 17:12, Raj Chidara <[email protected]> wrote:
>
> > Hi Markus,
> > Thanks for the response. I did not find any GC issues. I also set
> > mapred.task.timeout to 10000. I still have the same issue.
> >
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task
> > Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1,
> > fetchQueues.getQueueCount=1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map
> > Task Executor #0] * queue: www.titck.gov.tr
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] maxThreads = 1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] inProgress = 1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] crawlDelay = 5000
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] minCrawlDelay = 0
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] nextFetchTime = 1735828612457
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] now = 1735828613481
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> > Task Executor #0] 0. https://www.titck.gov.tr/
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task
> > Executor #0] Aborting with 1 hung threads.
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task
> > Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/
> > 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl
> > [pool-55-thread-1] JobTracker metrics system already initialized!
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] map 100% reduce 100%
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job
> > job_local1014979377_0001 completed successfully
> > 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31
> > File System Counters
> > FILE: Number of bytes read=1717876
> > FILE: Number of bytes written=3144478
> > FILE: Number of read operations=0
> > FILE: Number of large read operations=0
> > FILE: Number of write operations=0
> > Map-Reduce Framework
> > Map input records=1
> > Map output records=0
> > Map output bytes=0
> > Map output materialized bytes=14
> > Input split bytes=162
> > Combine input records=0
> > Combine output records=0
> > Reduce input groups=0
> > Reduce shuffle bytes=14
> > Reduce input records=0
> > Reduce output records=0
> > Spilled Records=0
> > Shuffled Maps =1
> > Failed Shuffles=0
> > Merged Map outputs=1
> > GC time elapsed (ms)=0
> > Total committed heap usage (bytes)=4299161600
> > FetcherStatus
> > bytes_downloaded=0
> > Shuffle Errors
> > BAD_ID=0
> > CONNECTION=0
> > IO_ERROR=0
> > WRONG_LENGTH=0
> > WRONG_MAP=0
> > WRONG_REDUCE=0
> > File Input Format Counters
> > Bytes Read=182
> > File Output Format Counters
> > Bytes Written=564
> > 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at
> > 2025-01-02 14:36:54, elapsed: 00:05:03
> >
> >
> > Thanks and Regards
> > Raj Chidara
> >
> >
> >
> > ---- On Thu, 02 Jan 2025 18:10:05 +0530 *Markus Jelsma
> > <[email protected]>* wrote ---
> >
> > Hi Raj,
> >
> > I can't seem to find an issue crawling that site, but maybe your parser
> > is hanging. It is usually the case when 'hanging' threads are detected.
> > You can also increase -Dmapred.task.timeout=, it controls how long it
> > waits before giving up on hanging threads.
> >
> > Also check your logs, there can be a hint in there, such as a GC issue,
> > or whatever.
> >
> > Regards,
> > Markus
> >
> > On Wed, 1 Jan 2025 at 15:26, Raj Chidara <[email protected]> wrote:
>
> >
> > > Hi
> > >
> > > I have a problem crawling and fetching this site:
> > > https://www.titck.gov.tr/ . It either crawls the same page again and
> > > again, or sometimes I get an error that Thread #0 hung while processing
> > > https://www.titck.gov.tr/. Can you please help me out?
> > >
> > > Thanks and Regards
> > >
> > > Raj Chidara
> > >
> > >
> > >
> > >
> > > Global Locations:
> > >
> > > USA | UK | India | Singapore | Japan
> > >
> > > *ISO 9001, 27001, 13485 Compliant
> > >
> > > www.DDIsmart.com
> > >
> > > About Us | Awards | Blog | News | Contact Us
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > DISCLAIMER: This message is intended solely for the use of the
> > > individual or entity to which it is addressed. If you are not the
> > > intended recipient, you should not use, copy, alter, or disclose the
> > > contents of this message. All information or opinions expressed in this
> > > message and/or any attachments are those of the author and are not
> > > necessarily those of the group companies.
> > >
> > >
> > >
> > >
> > >
