I have written my own plugin for Apache Nutch 2.2.1 to crawl images, videos
and podcasts from selected sites (I have 180 URLs in my seed list). I write
this metadata to an HBase store and now I want to save it to the index
(Solr). I have a lot of metadata to index (webpages + images + videos +
podcasts).

I am using the Nutch script bin/crawl for the whole process (inject,
generate, fetch, parse... and finally solrindex and dedup), but I have one
problem.
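
For reference, I invoke it roughly like this (the seed directory, crawl ID
and Solr URL below are illustrative placeholders, not my exact values):

    # usage: bin/crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
    bin/crawl urls/ myCrawl http://localhost:8983/solr/ 1
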
When I run this script for the first time, approximately 6,000 documents are
stored in the index (let's say 3,700 docs for images, 1,700 for webpages,
and the rest for videos and podcasts). That is OK...

but...

When I run the script a second time, a third time, and so on... the number
of documents in the index does not increase (there are still 6,000
documents), but the number of rows in the HBase table keeps growing (there
are 97,383 rows now)...
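
In case it matters, this is how I am checking both counts (collection1 and
webpage are the default Solr core and HBase table names; adjust if yours
differ):

    # documents in the Solr index -- look at numFound in the response
    curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=json"

    # rows in the HBase table used by Nutch 2.x
    # (may be <crawlId>_webpage if a crawl ID is used)
    echo "count 'webpage'" | hbase shell
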

Do you know where the problem is, please? I have been fighting this problem
for a really long time and I can't figure it out... If it helps, this is my
solrconfig.xml configuration: http://pastebin.com/uxMW2nuq and this is my
nutch-site.xml: http://pastebin.com/4bj1wdmT


