I have added that to nutch-site. However, I am still getting the same
hung-thread issue.

Thanks and Regards
Raj Chidara

---- On Tue, 07 Jan 2025 18:36:55 +0530 Markus Jelsma 
<[email protected]> wrote ---



It's a config specific to Hadoop. But when added to nutch-site, Hadoop will 
pick it up just fine. 
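
For reference, a minimal sketch of what such an entry could look like in
conf/nutch-site.xml. It assumes the two-minute value (120000 ms) discussed
further down this thread; the description text is illustrative only, and the
property is meant to sit inside the file's existing <configuration> element.

```xml
<!-- Sketch: place inside the existing <configuration> element of conf/nutch-site.xml -->
<property>
  <name>mapreduce.task.timeout</name>
  <!-- The value is in milliseconds: 2 * 60 * 1000 = 120000 (two minutes) -->
  <value>120000</value>
  <description>Illustrative entry: Hadoop task timeout set via nutch-site.</description>
</property>
```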
 
On Tue, 7 Jan 2025 at 15:01, Raj Chidara 
<mailto:[email protected]> wrote: 
 
> In conf, I did not find the property mapreduce.task.timeout. 
> 
> Thanks and Regards 
> Raj Chidara 
> 
> 
> 
> ---- On Tue, 07 Jan 2025 18:20:26 +0530 Markus Jelsma <[email protected]> wrote --- 
> 
> I'm not sure if you are allowed to set -D options via bin/crawl like that. 
> You can add it to commands such as bin/nutch fetch -D blabla. 
> 
> Try setting it in the conf itself. 
> 
> On Tue, 7 Jan 2025 at 14:44, Raj Chidara 
> <mailto:[email protected]> wrote: 
> 
> Hi Markus 
> I increased the timeout to 2 minutes, but the problem is the same. This is 
> the command I used. Please correct me if anything is wrong in the syntax: 
> 
> bin/crawl -s urls -D mapreduce.task.timeout=120000  --size-fetchlist 100 
> --num-tasks 1 crawl 3 
> 
> Thanks and Regards 
> Raj Chidara 
> 
> 
> 
> ---- On Tue, 07 Jan 2025 17:38:28 +0530 Markus Jelsma <[email protected]> wrote --- 
> 
> Aah, 10000 is too short, as it is in milliseconds (that is only 10 seconds). 
> Fetchers will probably also hang on sites without issues. Try setting it to 
> two minutes instead. 
> 
> 
> 
> On Thu, 2 Jan 2025 at 17:12, Raj Chidara 
> <mailto:[email protected]> wrote: 
> 
> > Hi Markus 
> > Thanks for the response. I did not find any GC issues. I also increased 
> > mapred.task.timeout to 10000. I still have the same issue. 
> > 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task 
> > Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1, 
> > fetchQueues.getQueueCount=1 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map 
> > Task Executor #0] * queue: www.titck.gov.tr 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] maxThreads = 1 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] inProgress = 1 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] crawlDelay = 5000 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] minCrawlDelay = 0 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] nextFetchTime = 1735828612457 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] now = 1735828613481 
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map 
> > Task Executor #0] 0. https://www.titck.gov.tr/ 
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task 
> > Executor #0] Aborting with 1 hung threads. 
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task 
> > Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/ 
> > 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl 
> > [pool-55-thread-1] JobTracker metrics system already initialized! 
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] map 100% reduce 100% 
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job 
> > job_local1014979377_0001 completed successfully 
> > 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31 
> > File System Counters 
> > FILE: Number of bytes read=1717876 
> > FILE: Number of bytes written=3144478 
> > FILE: Number of read operations=0 
> > FILE: Number of large read operations=0 
> > FILE: Number of write operations=0 
> > Map-Reduce Framework 
> > Map input records=1 
> > Map output records=0 
> > Map output bytes=0 
> > Map output materialized bytes=14 
> > Input split bytes=162 
> > Combine input records=0 
> > Combine output records=0 
> > Reduce input groups=0 
> > Reduce shuffle bytes=14 
> > Reduce input records=0 
> > Reduce output records=0 
> > Spilled Records=0 
> > Shuffled Maps =1 
> > Failed Shuffles=0 
> > Merged Map outputs=1 
> > GC time elapsed (ms)=0 
> > Total committed heap usage (bytes)=4299161600 
> > FetcherStatus 
> > bytes_downloaded=0 
> > Shuffle Errors 
> > BAD_ID=0 
> > CONNECTION=0 
> > IO_ERROR=0 
> > WRONG_LENGTH=0 
> > WRONG_MAP=0 
> > WRONG_REDUCE=0 
> > File Input Format Counters 
> > Bytes Read=182 
> > File Output Format Counters 
> > Bytes Written=564 
> > 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at 
> > 2025-01-02 14:36:54, elapsed: 00:05:03 
> > 
> > 
> > Thanks and Regards 
> > Raj Chidara 
> > 
> > 
> > 
> > ---- On Thu, 02 Jan 2025 18:10:05 +0530 Markus Jelsma <[email protected]> wrote --- 
> > 
> > Hi Raj, 
> > 
> > I can't seem to find an issue crawling that site, but maybe your parser 
> > is hanging. It is usually the case when 'hanging' threads are detected. 
> > You can also increase -Dmapred.task.timeout=, it controls how long it 
> > waits before giving up on hanging threads. 
> > 
> > Also check your logs, there can be a hint in there, such as a GC issue, 
> > or whatever. 
> > 
> > Regards, 
> > Markus 
> > 
> > On Wed, 1 Jan 2025 at 15:26, Raj Chidara 
> > <mailto:[email protected]> wrote: 
> > 
> > > Hi 
> > > 
> > > I have a problem crawling and fetching this site: 
> > > https://www.titck.gov.tr/ . It either crawls the same page again and 
> > > again, or sometimes I get an error that Thread #0 hung while 
> > > processing https://www.titck.gov.tr/. Can you please help me out? 
> > > 
> > > Thanks and Regards 
> > > 
> > > Raj Chidara 
> > > 
> > > 
> > > 
> > > 
> > > Global Locations: 
> > > 
> > > USA | UK | India | Singapore | Japan 
> > > 
> > > *ISO 9001, 27001, 13485 Compliant 
> > > 
> > > www.DDIsmart.com 
> > > 
> > > About Us | Awards | Blog | News | Contact Us 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > DISCLAIMER: This message is intended solely for the use of the 
> > > individual or entity to which it is addressed. If you are not the 
> > > intended recipient, you should not use, copy, alter, or disclose the 
> > > contents of this message. All information or opinions expressed in 
> > > this message and/or any attachments are those of the author and are 
> > > not necessarily those of the group companies. 
> > > 
> > > 
> > > 
> > > 
> > > 

 
 
 