Eyeris Rodriguez Rueda created NUTCH-2387: ---------------------------------------------
Summary: Nutch should not index document with "noindex" meta
Key: NUTCH-2387
URL: https://issues.apache.org/jira/browse/NUTCH-2387
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.13
Environment: Linux Mint 18
Reporter: Eyeris Rodriguez Rueda
Fix For: 1.14

I'm using Nutch 1.12 in local mode with Solr 4.10.3. I have found that Nutch indexes documents that carry a "noindex" robots meta tag.

I ran a complete crawl cycle with the Nutch crawl script:

bin/crawl -i urls/ crawl/ -2

using this URL: https://humanos.uci.cu/category/humanos/comparte-tu-software/page/3/

After repeated testing the problem persists: approximately 200 documents with this robots meta tag are indexed incorrectly.

I have read the configure method in the IndexerMapReduce.java class. It contains a line that reads this property, but the setting does not seem to take effect:

this.deleteRobotsNoIndex = job.getBoolean(INDEXER_DELETE_ROBOTS_NOINDEX,false); (line 97)

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
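The line quoted from configure() reads a boolean with a default of false, which suggests that "noindex" pages are only dropped when the property is explicitly enabled. The sketch below illustrates that gating under stated assumptions; it is not Nutch code. The property string `indexer.delete.robots.noindex`, the class name `RobotsNoIndexSketch`, and the helpers `getBoolean` (a stand-in for the job's getBoolean that mimics its fall-back-to-default behavior) and `shouldDelete` are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of how a boolean job setting can gate "noindex" handling. */
public class RobotsNoIndexSketch {

  // Assumed value of the INDEXER_DELETE_ROBOTS_NOINDEX constant.
  static final String INDEXER_DELETE_ROBOTS_NOINDEX = "indexer.delete.robots.noindex";

  // Hypothetical stand-in for job.getBoolean(key, defaultValue):
  // unset or unrecognized values fall back to the default.
  static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
    String value = conf.get(key);
    if (value == null) {
      return defaultValue;
    }
    value = value.trim();
    if (value.equalsIgnoreCase("true")) {
      return true;
    }
    if (value.equalsIgnoreCase("false")) {
      return false;
    }
    return defaultValue;
  }

  // Hypothetical check: true when the page should be deleted instead of indexed.
  static boolean shouldDelete(boolean deleteRobotsNoIndex, String robotsMeta) {
    if (!deleteRobotsNoIndex) {
      return false; // flag off (the default): the robots meta is never inspected
    }
    return robotsMeta != null
        && robotsMeta.toLowerCase(Locale.ROOT).contains("noindex");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    String robotsMeta = "noindex, follow";

    // Property unset: getBoolean returns false, so the page is kept.
    boolean flag = getBoolean(conf, INDEXER_DELETE_ROBOTS_NOINDEX, false);
    System.out.println("default -> delete? " + shouldDelete(flag, robotsMeta));

    // Property set to true: the same page is dropped.
    conf.put(INDEXER_DELETE_ROBOTS_NOINDEX, "true");
    flag = getBoolean(conf, INDEXER_DELETE_ROBOTS_NOINDEX, false);
    System.out.println("enabled -> delete? " + shouldDelete(flag, robotsMeta));
  }
}
```

Running main prints "default -> delete? false" followed by "enabled -> delete? true". If the real indexer behaves like this sketch, the reported behavior would follow from the property being left at its default rather than from the check itself misbehaving.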