Hi,
I wanted to find out how to correct the issue below and would appreciate any help. I am trying to upgrade to Nutch 1.12 and am using Solr 5.3.1. The reasons I am upgrading are:

1. HTTPS crawling
2. Boilerplate (Canola) extraction through Tika; the nutch-site.xml settings I plan to use are pasted at the bottom of this mail.

The only problem I am having so far is an IOException (exception from the logs is below). I searched and found an existing JIRA issue, NUTCH-2269 ("Clean not working after crawl"): https://issues.apache.org/jira/browse/NUTCH-2269

From that issue it seems the database used for cleaning can only be the crawldb, yet a couple of the bundled versions that can be found online use the linkdb as the default. I get the same error if I try to clean via the old command (my understanding of the newer, generic clean invocation is also pasted at the bottom of this mail):

bin/nutch solrclean crawl-adc/crawldb http://localhost:8983/solr/nutch

But cleaning through the linkdb worked, as stated in the JIRA issue, i.e.:

bin/nutch solrclean crawl-adc/linkdb http://localhost:8983/solr/nutch

I just want to know whether there is a fix or an alternate way of cleaning, whether cleaning via the linkdb is okay, and what the repercussions of cleaning via the linkdb would be.

Exception from logs:

java.lang.Exception: java.lang.IllegalStateException: Connection pool shut down
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.IllegalStateException: Connection pool shut down
        at org.apache.http.util.Asserts.check(Asserts.java:34)
        at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
        at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
        at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
        at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
        at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
        at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
        at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2016-08-16 15:27:47,794 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
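
For reference, these are the nutch-site.xml settings I am planning to use for the boilerplate (Canola) extraction. The property names are taken from my reading of nutch-default.xml in 1.12, so please correct me if I have them wrong:

  <!-- Enable Boilerpipe-based text extraction in the Tika parser
       (my understanding of the 1.12 nutch-default.xml property). -->
  <property>
    <name>tika.extractor</name>
    <value>boilerpipe</value>
  </property>

  <!-- Which Boilerpipe algorithm to use; CanolaExtractor is the one I am after.
       DefaultExtractor and ArticleExtractor appear to be the other documented values. -->
  <property>
    <name>tika.extractor.boilerpipe.algorithm</name>
    <value>CanolaExtractor</value>
  </property>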
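
And in case it matters, my understanding of the current (non-deprecated) clean invocation, with the Solr URL passed in as a property, is roughly the following; this is only my reading of the usage output, so it may not be exact:

  # CleaningJob invoked via the generic clean command, pointed at the crawldb
  bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl-adc/crawldb

Thanks in advance for any help.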