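Ah, 10000 is too short, because the value is in milliseconds, so that is only 10 seconds. With a timeout that low, fetcher threads will probably also be reported as hung on sites that have no issues at all. Try setting it to two minutes (120000) instead.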
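To double-check the units, here is a minimal sketch of the conversion. The class and method names are made up for illustration; only the property name mapred.task.timeout comes from this thread, and the sketch just builds the command-line argument as a string, it does not talk to Nutch or Hadoop.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
class TimeoutSketch {

    // Converts a timeout given in minutes to milliseconds,
    // the unit the property value is interpreted in.
    static long minutesToMillis(long minutes) {
        return Duration.ofMinutes(minutes).toMillis();
    }

    // Builds the -D argument using the property name quoted in this thread.
    static String asHadoopArg(long millis) {
        return "-Dmapred.task.timeout=" + millis;
    }

    public static void main(String[] args) {
        long tried = 10_000L;                 // the value from the earlier mail
        long suggested = minutesToMillis(2);  // two minutes, as suggested above

        System.out.println(tried + " ms is " + Duration.ofMillis(tried).getSeconds() + " seconds");
        System.out.println(suggested + " ms is " + Duration.ofMillis(suggested).toMinutes() + " minutes");
        System.out.println(asHadoopArg(suggested));
    }
}
```

The last line printed is the argument to pass in place of the 10000 value.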

On Thu, 2 Jan 2025 at 17:12, Raj Chidara <[email protected]> wrote:

> Hi Markus
>
> Thanks for the response. I did not find any GC issues. I also increased
> mapred.task.timeout to 10000. Still I have same issue.
>
> 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map Task Executor #0] * queue: www.titck.gov.tr
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] maxThreads = 1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] inProgress = 1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] crawlDelay = 5000
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] minCrawlDelay = 0
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] nextFetchTime = 1735828612457
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] now = 1735828613481
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] 0. https://www.titck.gov.tr/
> 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Aborting with 1 hung threads.
> 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/
> 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl [pool-55-thread-1] JobTracker metrics system already initialized!
> 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] map 100% reduce 100%
> 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job job_local1014979377_0001 completed successfully
> 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31
>   File System Counters
>     FILE: Number of bytes read=1717876
>     FILE: Number of bytes written=3144478
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>   Map-Reduce Framework
>     Map input records=1
>     Map output records=0
>     Map output bytes=0
>     Map output materialized bytes=14
>     Input split bytes=162
>     Combine input records=0
>     Combine output records=0
>     Reduce input groups=0
>     Reduce shuffle bytes=14
>     Reduce input records=0
>     Reduce output records=0
>     Spilled Records=0
>     Shuffled Maps =1
>     Failed Shuffles=0
>     Merged Map outputs=1
>     GC time elapsed (ms)=0
>     Total committed heap usage (bytes)=4299161600
>   FetcherStatus
>     bytes_downloaded=0
>   Shuffle Errors
>     BAD_ID=0
>     CONNECTION=0
>     IO_ERROR=0
>     WRONG_LENGTH=0
>     WRONG_MAP=0
>     WRONG_REDUCE=0
>   File Input Format Counters
>     Bytes Read=182
>   File Output Format Counters
>     Bytes Written=564
> 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at 2025-01-02 14:36:54, elapsed: 00:05:03
>
> Thanks and Regards
> Raj Chidara
>
> ---- On Thu, 02 Jan 2025 18:10:05 +0530 Markus Jelsma <[email protected]> wrote ---
>
> Hi Raj,
>
> I can't seem to find an issue crawling that site, but maybe your parser is
> hanging. It is usually the case when 'hanging' threads are detected. You
> can also increase -Dmapred.task.timeout=, it controls how long it waits
> before giving up on hanging threads.
>
> Also check your logs, there can be a hint in there, such as a GC issue, or
> whatever.
>
> Regards,
> Markus
>
> On Wed, 1 Jan 2025 at 15:26, Raj Chidara <[email protected]> wrote:
>
> > Hi
> >
> > I have problem in crawling and fetching this site
> > https://www.titck.gov.tr/ .
> > It is either crawling same page again and again and some times I get an
> > error that Thread #0 hung while processing https://www.titck.gov.tr/.
> > Can you please help me out.
> >
> > Thanks and Regards
> >
> > Raj Chidara

