I agree with Sebastian. It was a crawl in local mode, not over a cluster.
The intended crawl volume is huge, and if we don't override the default
heap size with some decent value, there is a high chance of facing an OOM.
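
For reference, a minimal sketch of how the heap can be raised for a
local-mode crawl, assuming the stock bin/nutch script and the crawl script
setting quoted below (the 4000m figure is only an example, not a tuned
recommendation):

  # bin/nutch reads NUTCH_HEAPSIZE (in MB) and passes it to the local
  # JVM as -Xmx; the default is 1000 MB
  export NUTCH_HEAPSIZE=4000
  bin/nutch parse ...

  # likewise, the child JVM heap in the crawl script can be raised from
  # the current 1000m:
  mapred.child.java.opts=-Xmx4000m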


On Sun, Mar 3, 2013 at 1:04 PM, kiran chitturi <chitturikira...@gmail.com> wrote:

> > If you find the time, you should trace the process.
> > Seems to be either a misconfiguration or even a bug.
> >
> I will try to track this down soon with the previous configuration.
> Right now, I am just trying to get data crawled by Monday.
>
> Kiran.
>
>
> > >> Luckily, you should be able to retry via "bin/nutch parse ..."
> > >> Then trace the system and the Java process to catch the reason.
> > >>
> > >> Sebastian
> > >>
> > >> On 03/02/2013 08:13 PM, kiran chitturi wrote:
> > >>> Sorry, I am looking to crawl 400k documents with the crawl. I said
> > >>> 400 in my last message.
> > >>>
> > >>>
> > >>> On Sat, Mar 2, 2013 at 2:12 PM, kiran chitturi <chitturikira...@gmail.com> wrote:
> > >>>
> > >>>> Hi!
> > >>>>
> > >>>> I am running Nutch 1.6 on a Mac OS desktop with 4 GB of RAM and a 2.8 GHz Core i5.
> > >>>>
> > >>>> Last night I started a crawl in local mode for 5 seeds with the
> > >>>> config given below. If the crawl goes well, it should fetch a total
> > >>>> of 400 documents. The crawling is done on a single host that we own.
> > >>>>
> > >>>> Config
> > >>>> ---------------------
> > >>>>
> > >>>> fetcher.threads.per.queue - 2
> > >>>> fetcher.server.delay - 1
> > >>>> fetcher.throughput.threshold.pages - -1
> > >>>>
> > >>>> crawl script settings
> > >>>> ----------------------------
> > >>>> timeLimitFetch - 30
> > >>>> numThreads - 5
> > >>>> topN - 10000
> > >>>> mapred.child.java.opts=-Xmx1000m
> > >>>>
> > >>>>
> > >>>> I noticed today that the crawl has stopped due to an error, and I
> > >>>> have found the error below in the logs.
> > >>>>
> > >>>>> 2013-03-01 21:45:03,767 INFO  parse.ParseSegment - Parsed (0ms):
> > >>>>> http://scholar.lib.vt.edu/ejournals/JARS/v33n3/v33n3-letcher.htm
> > >>>>> 2013-03-01 21:45:03,790 WARN  mapred.LocalJobRunner - job_local_0001
> > >>>>> java.lang.OutOfMemoryError: unable to create new native thread
> > >>>>>         at java.lang.Thread.start0(Native Method)
> > >>>>>         at java.lang.Thread.start(Thread.java:658)
> > >>>>>         at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
> > >>>>>         at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
> > >>>>>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
> > >>>>>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
> > >>>>>         at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
> > >>>>>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
> > >>>>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
> > >>>>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
> > >>>>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > >>>>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> > >>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> > >>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> > >>>>
> > >>>>
> > >>>>
> > >>>> Did anyone run into the same issue? I am not sure why the new
> > >>>> native thread is not being created. The link [0] says that it might
> > >>>> be due to a limit on the number of processes in my OS. Will
> > >>>> increasing it solve the issue?
> > >>>>
> > >>>>
> > >>>> [0] - http://ww2.cs.fsu.edu/~czhang/errors.html
> > >>>>
> > >>>> Thanks!
> > >>>>
> > >>>> --
> > >>>> Kiran Chitturi
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> >
> >
>
>
> --
> Kiran Chitturi
>
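
On the process-limit question in the quoted message: since the error
complains about creating a native thread rather than about heap space,
the per-user process limits are worth checking alongside the heap. A rough
sketch for Mac OS follows; the sysctl names are what I'd expect on that
platform, and the value in the last line is only illustrative:

  # shell limit on user processes for the current session
  ulimit -u

  # system-wide and per-user process ceilings
  sysctl kern.maxproc kern.maxprocperuid

  # raise the per-user ceiling (needs root)
  sudo sysctl -w kern.maxprocperuid=1024

If the parser keeps spawning threads that are never cleaned up, raising
the limit only delays the error, so counting threads during the parse
(for example with jstack, or "ps -M <pid> | wc -l") would confirm the
cause.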
