It's a Hadoop-specific config, which is why it isn't in Nutch's conf. But when added to nutch-site.xml, Hadoop will pick it up just fine.
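A minimal sketch of such an entry, assuming the standard Hadoop-style `<property>` layout that nutch-site.xml uses and the two-minute value suggested earlier in the thread:

```xml
<!-- In Nutch's conf/nutch-site.xml: put the <property> inside the existing <configuration> element -->
<configuration>
  <property>
    <!-- Hadoop property; it is not defined in Nutch's own conf files -->
    <name>mapreduce.task.timeout</name>
    <!-- Milliseconds: 2 min x 60 s x 1000 ms = 120000 -->
    <value>120000</value>
  </property>
</configuration>
```

Since the value is in milliseconds, the `10000` tried earlier is only 10 seconds. The older name `mapred.task.timeout`, also used in this thread, is the deprecated Hadoop 1 alias for the same property.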
On Tue, 7 Jan 2025 at 15:01, Raj Chidara <[email protected]> wrote:
> In conf, I did not find the property mapreduce.task.timeout.
>
> Thanks and Regards
> Raj Chidara
>
> ---- On Tue, 07 Jan 2025 18:20:26 +0530 Markus Jelsma <[email protected]> wrote ---
>
> I'm not sure if you are allowed to set -D options via bin/crawl like that.
> You can add it to commands such as bin/nutch fetch -D blabla.
>
> Try setting it in the conf itself.
>
> On Tue, 7 Jan 2025 at 14:44, Raj Chidara <[email protected]> wrote:
>
> Hi Markus,
> I increased the timeout to 2 minutes. However, the problem is the same. This is
> the command I used. Please correct me if I am doing anything wrong in the syntax:
>
> bin/crawl -s urls -D mapreduce.task.timeout=120000 --size-fetchlist 100 --num-tasks 1 crawl 3
>
> Thanks and Regards
> Raj Chidara
>
> ---- On Tue, 07 Jan 2025 17:38:28 +0530 Markus Jelsma <[email protected]> wrote ---
>
> Aah, 10000 is too short, as it is in milliseconds. Fetchers will probably
> also hang on sites that have no issues. Try setting it to two minutes instead.
>
> On Thu, 2 Jan 2025 at 17:12, Raj Chidara <[email protected]> wrote:
> > Hi Markus,
> > Thanks for the response. I did not find any GC issues. I also increased
> > mapred.task.timeout to 10000. Still, I have the same issue.
> >
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map Task Executor #0] * queue: www.titck.gov.tr
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] maxThreads = 1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] inProgress = 1
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] crawlDelay = 5000
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] minCrawlDelay = 0
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] nextFetchTime = 1735828612457
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] now = 1735828613481
> > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] 0. https://www.titck.gov.tr/
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Aborting with 1 hung threads.
> > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/
> > 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl [pool-55-thread-1] JobTracker metrics system already initialized!
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] map 100% reduce 100%
> > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job job_local1014979377_0001 completed successfully
> > 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31
> >     File System Counters
> >         FILE: Number of bytes read=1717876
> >         FILE: Number of bytes written=3144478
> >         FILE: Number of read operations=0
> >         FILE: Number of large read operations=0
> >         FILE: Number of write operations=0
> >     Map-Reduce Framework
> >         Map input records=1
> >         Map output records=0
> >         Map output bytes=0
> >         Map output materialized bytes=14
> >         Input split bytes=162
> >         Combine input records=0
> >         Combine output records=0
> >         Reduce input groups=0
> >         Reduce shuffle bytes=14
> >         Reduce input records=0
> >         Reduce output records=0
> >         Spilled Records=0
> >         Shuffled Maps =1
> >         Failed Shuffles=0
> >         Merged Map outputs=1
> >         GC time elapsed (ms)=0
> >         Total committed heap usage (bytes)=4299161600
> >     FetcherStatus
> >         bytes_downloaded=0
> >     Shuffle Errors
> >         BAD_ID=0
> >         CONNECTION=0
> >         IO_ERROR=0
> >         WRONG_LENGTH=0
> >         WRONG_MAP=0
> >         WRONG_REDUCE=0
> >     File Input Format Counters
> >         Bytes Read=182
> >     File Output Format Counters
> >         Bytes Written=564
> > 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at 2025-01-02 14:36:54, elapsed: 00:05:03
> >
> > Thanks and Regards
> > Raj Chidara
> >
> > ---- On Thu, 02 Jan 2025 18:10:05 +0530 Markus Jelsma <[email protected]> wrote ---
> >
> > Hi Raj,
> >
> > I can't seem to find an issue crawling that site, but maybe your parser is
> > hanging. That is usually the case when 'hanging' threads are detected. You
> > can also increase -Dmapred.task.timeout=; it controls how long it waits
> > before giving up on hanging threads.
> >
> > Also check your logs; there may be a hint in there, such as a GC issue,
> > or whatever.
> >
> > Regards,
> > Markus
> >
> > On Wed, 1 Jan 2025 at 15:26, Raj Chidara <[email protected]> wrote:
> > >
> > > Hi
> > >
> > > I have a problem crawling and fetching this site:
> > > https://www.titck.gov.tr/. It either crawls the same page again and
> > > again, or sometimes I get the error "Thread #0 hung while processing
> > > https://www.titck.gov.tr/". Can you please help me out?
> > >
> > > Thanks and Regards
> > >
> > > Raj Chidara
> > >
> > > Global Locations:
> > > USA | UK | India | Singapore | Japan
> > > ISO 9001, 27001, 13485 Compliant
> > > www.DDIsmart.com
> > > About Us | Awards | Blog | News | Contact Us
> > >
> > > DISCLAIMER: This message is intended solely for the use of the individual
> > > or entity to which it is addressed. If you are not the intended recipient,
> > > you should not use, copy, alter, or disclose the contents of this message.
> > > All information or opinions expressed in this message and/or any
> > > attachments are those of the author and are not necessarily those of the
> > > group companies.

