On Thursday 22 March 2012 14:10:41 webdev1977 wrote:
> Thanks for the quick response Markus!
>
> How would that fit into this continuous crawling scenario? (I am trying to
> get the updates as quickly as possible into Solr :-)
>
> If I am doing the generate --> fetch $SEGMENT --> parse $SEGMENT -->
> updatedb crawldb $SEGMENT --> solrindex --> solrdedup cycle, and I am
> generating an "on the fly" segment and I just happen to be generating it
> (and not done) when the updatedb command runs (changing it to the -dir
> option), isn't that bad?
You can just fetch and parse that tiny segment and have it updated in the
crawldb together with another segment; you don't have to update with only
one segment. -dir is ok, but you can also list the segments (rough sketch
at the bottom of this mail).

> Has anyone tested the mergedb command with potentially hundreds and
> hundreds of dbs to merge (one per changed url)?

I wouldn't try that. More scripting and locking horror, and it's an I/O
consumer.

> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/crawl-and-update-one-url-already-in-crawldb-tp3848358p3848423.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex
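
For what it's worth, one pass of such a loop could look roughly like this.
This is an untested sketch: it assumes a Nutch 1.x local runtime, the paths
and the Solr URL are placeholders, and the exact solrindex arguments differ
between versions.

  #!/bin/sh
  # Untested sketch of one pass of the loop (Nutch 1.x local runtime).
  # Paths and the Solr URL are placeholders.
  CRAWLDB=crawl/crawldb
  SEGMENTS=crawl/segments
  SOLR=http://localhost:8983/solr

  bin/nutch generate $CRAWLDB $SEGMENTS -topN 1000
  SEGMENT=$SEGMENTS/`ls $SEGMENTS | sort | tail -1`   # newest segment dir

  bin/nutch fetch $SEGMENT
  bin/nutch parse $SEGMENT

  # updatedb accepts several segments, either listed or via -dir, so a tiny
  # "on the fly" segment can be folded into the crawldb together with others
  bin/nutch updatedb $CRAWLDB -dir $SEGMENTS

  bin/nutch invertlinks crawl/linkdb -dir $SEGMENTS
  # exact solrindex arguments differ between Nutch versions
  bin/nutch solrindex $SOLR $CRAWLDB crawl/linkdb $SEGMENT
  bin/nutch solrdedup $SOLR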