Hi,

The JAVA_HEAP_MAX value can be modified in the bin/nutch script.
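
As a rough sketch of what that looks like: the snippet below assumes your copy of bin/nutch follows the stock script, where JAVA_HEAP_MAX defaults to -Xmx1000m and an optional NUTCH_HEAPSIZE environment variable (in MB) overrides it. Both the default and the 4000 MB figure are assumptions for illustration, so check them against your own script and available RAM.

```shell
# Sketch of the heap-size logic, assuming bin/nutch matches the stock script.
# Default heap flag (assumed stock value; verify in your copy of bin/nutch).
JAVA_HEAP_MAX=-Xmx1000m

# Illustrative override: ask for a 4000 MB heap. The number is only an example.
NUTCH_HEAPSIZE=4000

# When NUTCH_HEAPSIZE is set, it replaces the default -Xmx flag.
if [ -n "$NUTCH_HEAPSIZE" ]; then
  JAVA_HEAP_MAX="-Xmx${NUTCH_HEAPSIZE}m"
fi

echo "$JAVA_HEAP_MAX"
```

Note that this only takes effect when the crawl is started through bin/nutch. If the job is launched from Eclipse, the equivalent would be an -Xmx value in the VM arguments of the run configuration.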

Remi


On Thu, Mar 20, 2014 at 11:11 PM, Vangelis karv <[email protected]> wrote:

> I managed to crawl again but I have something else now:
>
> https://www.dropbox.com/s/853xf1evi8sb51v/error .
>
> Also, I found this :
> 2014-03-20 14:04:33,885 INFO  mapreduce.GoraRecordWriter - Flushing the
> datastore after 20000 records.
>
> Thank you in advance!
>
> From: [email protected]
> To: [email protected]
> Subject: Java Heap Space error
> Date: Thu, 20 Mar 2014 10:59:27 +0200
>
>
>
>
> Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN
> 120000. In the middle of the 5th depth I got this error:
>
> 2014-03-19 19:16:11,608 WARN  fetcher.FetcherJob - fetch of
> http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:11,608 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error:
> java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:24,677 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:24,677 WARN  fetcher.FetcherJob - fetch of
> http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error:
> java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:35,568 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:35,568 WARN  fetcher.FetcherJob - fetch of
> http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error:
> java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:43,535 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:43,535 WARN  fetcher.FetcherJob - fetch of
> http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error:
> java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:50,888 WARN  fetcher.FetcherJob - fetch of
> http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:51,580 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error:
> 2014-03-19 19:16:53,711 INFO  fetcher.FetcherJob - fetching
> http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue
> crawl delay=0ms)
> 2014-03-19 19:16:54,659 INFO  fetcher.FetcherJob - -finishing thread
> FetcherThread20, activeThreads=46
> 2014-03-19 19:17:06,734 INFO  fetcher.FetcherJob - -finishing thread
> FetcherThread48, activeThreads=44
> 2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error:
> java.lang.OutOfMemoryError: Java heap space
>
> As you can see, I have problems with the Java heap space. I ran this crawl
> using Nutch 2.2.1, Eclipse and MySQL.
>
> Any ideas on how to solve this thing?
> Recently, I changed the metadata field from blob to longblob and set
> http.content.limit to -1 (neither has caused any trouble so far, though).
>
>
>
