Aah, 10000 is too short, since the value is in milliseconds (10 seconds). With
a timeout that low, fetcher threads will probably be reported as hung even on
sites without issues. Try setting it to two minutes (120000) instead.
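For example, assuming you run the fetch step through bin/nutch (the segment
path below is only a placeholder, substitute your own):

  bin/nutch fetch -Dmapred.task.timeout=120000 crawl/segments/<segment>

Or set it once in conf/nutch-site.xml so every job picks it up:

  <property>
    <name>mapred.task.timeout</name>
    <value>120000</value>
  </property>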



On Thu, 2 Jan 2025 at 17:12, Raj Chidara <[email protected]> wrote:

> Hi Markus
> Thanks for the response. I did not find any GC issues. I also increased
> mapred.task.timeout to 10000, but I still have the same issue.
>
> 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task
> Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1,
> fetchQueues.getQueueCount=1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map
> Task Executor #0] * queue: www.titck.gov.tr
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   maxThreads    = 1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   inProgress    = 1
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   crawlDelay    = 5000
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   minCrawlDelay = 0
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   nextFetchTime = 1735828612457
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   now           = 1735828613481
> 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map
> Task Executor #0]   0. https://www.titck.gov.tr/
> 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task
> Executor #0] Aborting with 1 hung threads.
> 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task
> Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/
> 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl
> [pool-55-thread-1] JobTracker metrics system already initialized!
> 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main]  map 100% reduce 100%
> 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job
> job_local1014979377_0001 completed successfully
> 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31
> File System Counters
> FILE: Number of bytes read=1717876
> FILE: Number of bytes written=3144478
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> Map-Reduce Framework
> Map input records=1
> Map output records=0
> Map output bytes=0
> Map output materialized bytes=14
> Input split bytes=162
> Combine input records=0
> Combine output records=0
> Reduce input groups=0
> Reduce shuffle bytes=14
> Reduce input records=0
> Reduce output records=0
> Spilled Records=0
> Shuffled Maps =1
> Failed Shuffles=0
> Merged Map outputs=1
> GC time elapsed (ms)=0
> Total committed heap usage (bytes)=4299161600
> FetcherStatus
> bytes_downloaded=0
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=182
> File Output Format Counters
> Bytes Written=564
> 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at
> 2025-01-02 14:36:54, elapsed: 00:05:03
>
>
> Thanks and Regards
> Raj Chidara
>
>
>
> ---- On Thu, 02 Jan 2025 18:10:05 +0530 Markus Jelsma
> <[email protected]> wrote ---
>
> Hi Raj,
>
> I can't seem to find an issue crawling that site, but maybe your parser is
> hanging. That is usually the case when 'hung' threads are detected. You can
> also increase -Dmapred.task.timeout=; it controls how long the fetcher waits
> before giving up on hung threads.
>
> Also check your logs; there may be a hint in there, such as a GC issue or
> something similar.
>
> Regards,
> Markus
>
> On Wed, 1 Jan 2025 at 15:26, Raj Chidara <[email protected]> wrote:
>
> > Hi
> >
> > I have a problem crawling and fetching this site,
> > https://www.titck.gov.tr/ . It either crawls the same page again and
> > again, or sometimes I get an error that Thread #0 hung while processing
> > https://www.titck.gov.tr/. Can you please help me out?
> >
> > Thanks and Regards
> >
> > Raj Chidara
> >
>
