Hi, I am using Apache Nutch 1.7 for crawling and Apache Solr 4.7.2 for indexing. In my tests there is a gap between the number of pages Nutch fetches and the number of documents indexed in Solr. For example, one crawl successfully fetched 23343 pages and 1146 images, while Solr indexed only 19250 docs, 500 of which are image URLs.
My question is: what kinds of pages are indexed in Solr, and why? Does Solr also index pages with other statuses, or not? What kinds of images does Solr index? Thanks.