Hi, the JAVA_HEAP_MAX value (the -Xmx option passed to the JVM) can be modified in the bin/nutch script.
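There are two ways to raise the heap. You can edit JAVA_HEAP_MAX in bin/nutch. If the crawl is started from Eclipse, as in the message below, bin/nutch is presumably not involved, so you would add an -Xmx option to the launch configuration's VM arguments instead. Either way, it helps to confirm that the running JVM actually picked up the new value. Here is a minimal sketch that relies only on the standard Runtime.maxMemory() call. The class name HeapCheck is hypothetical.

```java
/** Hypothetical helper: reports the maximum heap the running JVM will use. */
public class HeapCheck {

    /** Returns the JVM's maximum heap in mebibytes, as reported by Runtime.maxMemory(). */
    static long maxHeapMb() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. `java -Xmx2000m HeapCheck`; the printed value should be close to 2000
        // (some garbage collectors report slightly less than the -Xmx value).
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

If the printed number still matches the old setting, the new -Xmx value never reached the JVM that runs the fetcher.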
Remi

On Thu, Mar 20, 2014 at 11:11 PM, Vangelis karv <[email protected]> wrote:

> I managed to crawl again, but now I have something else:
>
> https://www.dropbox.com/s/853xf1evi8sb51v/error
>
> Also, I found this:
> 2014-03-20 14:04:33,885 INFO mapreduce.GoraRecordWriter - Flushing the datastore after 20000 records.
>
> Thank you in advance!
>
> From: [email protected]
> To: [email protected]
> Subject: Java Heap Space error
> Date: Thu, 20 Mar 2014 10:59:27 +0200
>
> Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN
> 120000. In the middle of the 5th depth I got this error:
>
> 2014-03-19 19:16:11,608 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:11,608 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue crawl delay=0ms)
> 2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:24,677 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue crawl delay=0ms)
> 2014-03-19 19:16:24,677 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:35,568 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue crawl delay=0ms)
> 2014-03-19 19:16:35,568 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:43,535 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue crawl delay=0ms)
> 2014-03-19 19:16:43,535 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:50,888 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed with: java.lang.OutOfMemoryError: Java heap space
> 2014-03-19 19:16:51,580 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue crawl delay=0ms)
> 2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error:
> 2014-03-19 19:16:53,711 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue crawl delay=0ms)
> 2014-03-19 19:16:54,659 INFO fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=46
> 2014-03-19 19:17:06,734 INFO fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=44
> 2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error: java.lang.OutOfMemoryError: Java heap space
>
> As you can see, I have problems with the Java heap space. I ran this crawl
> using Nutch 2.2.1, Eclipse and MySQL.
>
> Any ideas on how to solve this?
> Recently, I changed the metadata field from blob to longblob and set
> http.content.limit to -1 (neither change has caused any trouble so far, though).
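Setting http.content.limit to -1 removes the per-page size cap. The shipped default in nutch-default.xml is 65536 bytes. With no cap, each fetcher thread may hold an arbitrarily large page body in memory, and all threads can do so at the same time. The sketch below is not Nutch's code. It only illustrates what a content limit does, using a hypothetical readCapped helper that reads at most `limit` bytes from a stream and discards the rest, which keeps the buffer per page bounded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration of a per-page content cap; not Nutch's actual implementation. */
public class CappedReader {

    /**
     * Reads from {@code in} until end of stream or until {@code limit} bytes have been read,
     * whichever comes first. A negative limit means "no cap", like -1 for http.content.limit.
     */
    static byte[] readCapped(InputStream in, int limit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        try {
            while (limit < 0 || total < limit) {
                // Never request more bytes than the remaining allowance.
                int want = (limit < 0) ? buf.length : Math.min(buf.length, limit - total);
                int n = in.read(buf, 0, want);
                if (n == -1) {
                    break; // end of stream reached before the cap
                }
                out.write(buf, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] page = new byte[200_000]; // stand-in for one large fetched page
        System.out.println(readCapped(new ByteArrayInputStream(page), 65536).length); // 65536
        System.out.println(readCapped(new ByteArrayInputStream(page), -1).length);    // 200000
    }
}
```

Here is a rough worked bound. Under the default limit, the 46 active fetcher threads in the log can hold at most 46 × 65536 = 3,014,656 bytes (about 2.9 MiB) of raw page content at once. With -1 there is no such ceiling. Actual heap use also depends on parsing, fetch queues and other buffers, so this is one plausible contributor to the error, not a confirmed cause.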

