Thanks for your answer, Remi! That is another issue, though: I run my crawls through Eclipse and not through the standard script. I opened Run Configurations and, in the Arguments tab under VM arguments, added this: -Xms512M -Xmx2048M.
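To confirm that Eclipse really passes those VM arguments to the JVM that runs the crawl, a small check along these lines might help. It is only a sketch: the class name HeapCheck and the helper toMiB are made up for illustration and are not part of Nutch; the only real APIs used are java.lang.Runtime and the standard RuntimeMXBean.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of Nutch: prints the heap limits the
// running JVM actually uses, so the effect of -Xms/-Xmx can be verified.
public class HeapCheck {

    // Converts a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Upper bound the heap may grow to; governed by -Xmx.
        System.out.println("max heap   (MiB): " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM; starts near -Xms.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused part of the currently reserved heap.
        System.out.println("free heap  (MiB): " + toMiB(rt.freeMemory()));

        // Flags the JVM was started with, e.g. -Xms512M and -Xmx2048M.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
    }
}
```

Run it with the same Run Configuration as the crawl. If the flags are picked up, they should appear in the printed argument list. The reported maximum can be somewhat below 2048 MiB depending on the garbage collector in use.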
> Date: Fri, 21 Mar 2014 17:12:21 +0800
> Subject: Re: Java Heap Space error
> From: [email protected]
> To: [email protected]
>
> Hi,
>
> JAVA_HEAP_MAX value can be modified in the bin/nutch script
>
> Remi
>
>
> On Thu, Mar 20, 2014 at 11:11 PM, Vangelis karv
> <[email protected]> wrote:
>
> > I managed to crawl again but I have something else now:
> >
> > https://www.dropbox.com/s/853xf1evi8sb51v/error .
> >
> > Also, I found this :
> > 2014-03-20 14:04:33,885 INFO mapreduce.GoraRecordWriter - Flushing the datastore after 20000 records.
> >
> > Thank you in advance!
> >
> > From: [email protected]
> > To: [email protected]
> > Subject: Java Heap Space error
> > Date: Thu, 20 Mar 2014 10:59:27 +0200
> >
> > Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN
> > 120000. In the middle of the 5th depth I got this error:
> >
> > 2014-03-19 19:16:11,608 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed with: java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:11,608 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue crawl delay=0ms)
> > 2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error:
> > java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:24,677 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue crawl delay=0ms)
> > 2014-03-19 19:16:24,677 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed with: java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error:
> > java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:35,568 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue crawl delay=0ms)
> > 2014-03-19 19:16:35,568 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed with: java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error:
> > java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:43,535 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue crawl delay=0ms)
> > 2014-03-19 19:16:43,535 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed with: java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error:
> > java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:50,888 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed with: java.lang.OutOfMemoryError: Java heap space
> > 2014-03-19 19:16:51,580 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue crawl delay=0ms)
> > 2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error:
> > 2014-03-19 19:16:53,711 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue crawl delay=0ms)
> > 2014-03-19 19:16:54,659 INFO fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=46
> > 2014-03-19 19:17:06,734 INFO fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=44
> > 2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error:
> > java.lang.OutOfMemoryError: Java heap space
> >
> > As you can see, I have problems with the Java heap space. I ran this crawl
> > using Nutch 2.2.1, Eclipse and MySQL.
> >
> > Any ideas on how to solve this thing?
> > Recently, I changed metadata field from blob to longblob and put
> > http.content.limit to -1 (None of them caused any trouble so far though).

