OK, then try passing it through the parsechecker tool:
bin/nutch parsechecker https://www.titck.gov.tr/
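This is a minimal sketch of running that check under a wall-clock limit, so a parser hang shows up as a timeout instead of a terminal that never returns. It assumes GNU coreutils `timeout` is installed (it exits with status 124 when the limit is hit). The `check_parse` function, the `NUTCH_BIN` variable and the 300-second default are illustrative choices, not part of Nutch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run ParserChecker with a time limit so a hanging
# parser is reported instead of blocking the terminal indefinitely.
# Assumes GNU coreutils `timeout`; NUTCH_BIN and the default limit are
# illustrative, not Nutch settings.
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

check_parse() {
  local url="$1"
  local limit="${2:-300}"   # seconds
  local rc=0
  timeout "$limit" "$NUTCH_BIN" parsechecker "$url" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "parsechecker did not finish within ${limit}s for $url (possible parser hang)" >&2
  fi
  return "$rc"
}

# Usage, from the Nutch runtime directory:
#   check_parse https://www.titck.gov.tr/
```

If the call returns 124 while a plain fetch of the same URL completes quickly, that points towards the parser rather than the network.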

On Wed, 8 Jan 2025 at 12:53, Raj Chidara <[email protected]> wrote:

> I have added that to nutch-site.xml. However, I am still getting the same
> hung-thread issue.
>
> Thanks and Regards
> Raj Chidara
>
>
>
> ---- On Tue, 07 Jan 2025 18:36:55 +0530 Markus Jelsma
> <[email protected]> wrote ---
>
> It's a Hadoop-specific setting. But when added to nutch-site.xml, Hadoop
> will pick it up just fine.
>
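To confirm that the value really ends up in the file Nutch reads, a quick line-based check like the one below may help. It is a sketch that assumes the usual layout, where each `<name>` and its `<value>` sit inside one `<property>` block with the name on or before the value's line. `get_prop` is a hypothetical helper, not a Nutch or Hadoop tool, and it is not a full XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of a named property from a
# Hadoop/Nutch-style XML config file such as conf/nutch-site.xml.
# Line-based sketch, not an XML parser: it expects the <name> line to come
# before (or on the same line as) the matching <value> inside <property>.
get_prop() {
  local file="$1" name="$2"
  awk -v n="$name" '
    /<property>/ { inprop = 1; found = 0 }
    inprop && index($0, "<name>" n "</name>") { found = 1 }
    inprop && found && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-char "<value>" and 8-char "</value>" tags
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
    /<\/property>/ { inprop = 0; found = 0 }
  ' "$file"
}

# Usage (point it at the nutch-site.xml in the conf directory your bin/crawl reads):
#   get_prop conf/nutch-site.xml mapreduce.task.timeout
```

An empty result means the property is not set in that file, so the Hadoop default applies.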
> On Tue, 7 Jan 2025 at 15:01, Raj Chidara <[email protected]> wrote:
>
> > I did not find the property mapreduce.task.timeout anywhere in conf.
> >
> > Thanks and Regards
> > Raj Chidara
> >
> >
> >
> > ---- On Tue, 07 Jan 2025 18:20:26 +0530 Markus Jelsma
> > <[email protected]> wrote ---
> >
> > I'm not sure you are allowed to set -D options via bin/crawl like that.
> > You can add them to commands such as bin/nutch fetch -D name=value.
> >
> > Try setting it in the conf itself.
> >
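A minimal sketch of the per-command route mentioned above: passing the timeout as a Hadoop generic option to the fetch step. It assumes `bin/nutch fetch` accepts `-D name=value` placed before the segment path (Hadoop generic options are normally parsed ahead of command-specific arguments). `fetch_segment`, `NUTCH_BIN` and the segment path in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one fetch step with a per-job task timeout.
# The -D generic option goes before the segment path.
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

fetch_segment() {
  local segment="$1"
  local timeout_ms="${2:-120000}"   # mapreduce.task.timeout is in milliseconds
  "$NUTCH_BIN" fetch -D "mapreduce.task.timeout=${timeout_ms}" "$segment"
}

# Usage (segment name is illustrative):
#   fetch_segment crawl/segments/20250107123456 120000
```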
> > On Tue, 7 Jan 2025 at 14:44, Raj Chidara <[email protected]> wrote:
> >
> > Hi Markus
> > I increased the timeout to 2 minutes. However, the problem is the same.
> > This is the command I used. Please correct me if I am doing anything
> > wrong in the syntax:
> >
> > bin/crawl -s urls -D mapreduce.task.timeout=120000 --size-fetchlist 100 --num-tasks 1 crawl 3
> >
> > Thanks and Regards
> > Raj Chidara
> >
> >
> >
> > ---- On Tue, 07 Jan 2025 17:38:28 +0530 Markus Jelsma
> > <[email protected]> wrote ---
> >
> > Ah, 10000 is too short, as the value is in milliseconds (10 seconds).
> > With such a short timeout, fetcher threads will probably be reported as
> > hung even on sites without issues. Try setting it to two minutes
> > (120000) instead.
> >
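Since `mapreduce.task.timeout` (and its older, deprecated name `mapred.task.timeout`) is expressed in milliseconds, the arithmetic behind this advice is easy to check. The two helper functions below are purely illustrative.

```shell
#!/usr/bin/env bash
# Illustrative helpers for converting between minutes, seconds and the
# millisecond values that mapreduce.task.timeout expects.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

ms_to_seconds() {
  echo $(( $1 / 1000 ))
}

minutes_to_ms 2      # prints 120000: the two-minute value suggested above
ms_to_seconds 10000  # prints 10: the earlier setting allowed only ten seconds
```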
> >
> >
> > On Thu, 2 Jan 2025 at 17:12, Raj Chidara <[email protected]> wrote:
> >
> > > Hi Markus
> > > Thanks for the response. I did not find any GC issues. I also set
> > > mapred.task.timeout to 10000, but I still have the same issue.
> > >
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueues [LocalJobRunner Map Task Executor #0] * queue: www.titck.gov.tr
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] maxThreads = 1
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] inProgress = 1
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] crawlDelay = 5000
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] minCrawlDelay = 0
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] nextFetchTime = 1735828612457
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] now = 1735828613481
> > > 2025-01-02 14:36:53,481 INFO o.a.n.f.FetchItemQueue [LocalJobRunner Map Task Executor #0] 0. https://www.titck.gov.tr/
> > > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Aborting with 1 hung threads.
> > > 2025-01-02 14:36:53,481 WARN o.a.n.f.Fetcher [LocalJobRunner Map Task Executor #0] Thread #0 hung while processing https://www.titck.gov.tr/
> > > 2025-01-02 14:36:53,536 WARN o.a.h.m.i.MetricsSystemImpl [pool-55-thread-1] JobTracker metrics system already initialized!
> > > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] map 100% reduce 100%
> > > 2025-01-02 14:36:54,389 INFO o.a.h.m.Job [main] Job job_local1014979377_0001 completed successfully
> > > 2025-01-02 14:36:54,397 INFO o.a.h.m.Job [main] Counters: 31
> > >     File System Counters
> > >         FILE: Number of bytes read=1717876
> > >         FILE: Number of bytes written=3144478
> > >         FILE: Number of read operations=0
> > >         FILE: Number of large read operations=0
> > >         FILE: Number of write operations=0
> > >     Map-Reduce Framework
> > >         Map input records=1
> > >         Map output records=0
> > >         Map output bytes=0
> > >         Map output materialized bytes=14
> > >         Input split bytes=162
> > >         Combine input records=0
> > >         Combine output records=0
> > >         Reduce input groups=0
> > >         Reduce shuffle bytes=14
> > >         Reduce input records=0
> > >         Reduce output records=0
> > >         Spilled Records=0
> > >         Shuffled Maps =1
> > >         Failed Shuffles=0
> > >         Merged Map outputs=1
> > >         GC time elapsed (ms)=0
> > >         Total committed heap usage (bytes)=4299161600
> > >     FetcherStatus
> > >         bytes_downloaded=0
> > >     Shuffle Errors
> > >         BAD_ID=0
> > >         CONNECTION=0
> > >         IO_ERROR=0
> > >         WRONG_LENGTH=0
> > >         WRONG_MAP=0
> > >         WRONG_REDUCE=0
> > >     File Input Format Counters
> > >         Bytes Read=182
> > >     File Output Format Counters
> > >         Bytes Written=564
> > > 2025-01-02 14:36:54,397 INFO o.a.n.f.Fetcher [main] Fetcher: finished at 2025-01-02 14:36:54, elapsed: 00:05:03
> > >
> > >
> > > Thanks and Regards
> > > Raj Chidara
> > >
> > >
> > >
> > > ---- On Thu, 02 Jan 2025 18:10:05 +0530 Markus Jelsma
> > > <[email protected]> wrote ---
> > >
> > > Hi Raj,
> > >
> > > I can't find any issue crawling that site myself, but maybe your parser
> > > is hanging; that is usually the cause when 'hanging' threads are
> > > detected. You can also increase -Dmapred.task.timeout=<milliseconds>,
> > > which controls how long the fetcher waits before giving up on hanging
> > > threads.
> > >
> > > Also check your logs; there may be a hint in there, such as a GC issue,
> > > or something else.
> > >
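As a starting point for that log check, a sketch like the one below pulls out hung-thread warnings (the message text is taken from the fetcher log quoted in this thread) together with common JVM memory errors. It assumes a local-mode run that writes to `logs/hadoop.log`; the path, the `scan_log` name and the exact patterns are assumptions to adjust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log lines that hint at hung fetcher threads or
# JVM memory pressure. The default log path assumes a local-mode run.
scan_log() {
  local log="${1:-logs/hadoop.log}"
  grep -nE 'hung while processing|Aborting with [0-9]+ hung threads|OutOfMemoryError|GC overhead limit exceeded' "$log"
}

# Usage:
#   scan_log                        # default: logs/hadoop.log
#   scan_log /path/to/hadoop.log
```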
> > > Regards,
> > > Markus
> > >
> > > On Wed, 1 Jan 2025 at 15:26, Raj Chidara <[email protected]> wrote:
> > >
> > > > Hi
> > > >
> > > > I have a problem crawling and fetching the site
> > > > https://www.titck.gov.tr/ . Either it crawls the same page again and
> > > > again, or sometimes I get the error "Thread #0 hung while processing
> > > > https://www.titck.gov.tr/". Can you please help me out?
> > > >
> > > > Thanks and Regards
> > > >
> > > > Raj Chidara
> > > >
> > > >
> > > >
> > > >
> > > > Global Locations:
> > > >
> > > > USA | UK | India | Singapore | Japan
> > > >
> > > > *ISO 9001, 27001, 13485 Compliant
> > > >
> > > > www.DDIsmart.com
> > > >
> > > > About Us | Awards | Blog | News | Contact Us
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > DISCLAIMER: This message is intended solely for the use of the
> > > individual
> > > > or entity to which it is addressed. If you are not the intended
> > > recipient,
> > > > you should not use, copy, alter, or disclose the contents of this
> > > message.
> > > > All information or opinions expressed in this message and/or any
> > > > attachments are those of the author and are not necessarily those of
> > the
> > > > group companies.
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
