On Tuesday 15 May 2012 17:39:31 Vikas Hazrati wrote:
> So once the crawl (which abstracts iterative crawls till the depth is
> reached) is finished, is there a way to trigger a recrawl as well as a part
> of some command line option so that Nutch continues to run as a daemon or
> is shell script the way out?

Shell scripting is the way to go. Nutch will automatically recrawl pages that are due to be refetched.

> Regards | Vikas
>
> On Fri, May 11, 2012 at 8:26 PM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> > If you would like I could add you to the moderators group and you can
> > word it how you wish.
> >
> > Please sign up to Jira, give me your Jira username on this page, and I
> > will happily add you to the group.
> >
> > On the other hand, if you don't wish to do this, then please reply
> > here with your suggestion and I'll make sure something gets changed to
> > accommodate your suggestions.
> >
> > Thanks
> >
> > On Fri, May 11, 2012 at 2:52 PM, Matthias Paul <magethle.nu...@gmail.com> wrote:
> > > I was confused by this tutorial:
> > > http://wiki.apache.org/nutch/NutchTutorial
> > >
> > > Reading this page one might get to the conclusion that the crawl tool
> > > can't do iterative crawling, because under "3.2 Using Individual
> > > Commands for Whole-Web Crawling" there's the sentence "This also
> > > permits ... incremental crawling", as if the crawl command described
> > > before (3.1 Using the Crawl Command) couldn't do that.
> > >
> > > Could someone perhaps improve this part of the tutorial?
> > >
> > > Matthias
> > >
> > > On Thu, May 10, 2012 at 8:39 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >> By default each crawl is iterative. The crawl command is nothing more
> > >> than a wrapper around the individual crawl cycle commands. The depth
> > >> parameter is nothing more than executing a single crawl cycle multiple
> > >> times. This is, if I am not mistaken, also true for older releases,
> > >> certainly 1.2 and above.
> > >>
> > >> On Thu, 10 May 2012 19:31:27 +0100, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
> > >>> For the record, there is a patch pending review for Nutchgora which
> > >>> will sort part of this for you as well.
> > >>>
> > >>> https://issues.apache.org/jira/browse/NUTCH-1301
> > >>>
> > >>> Susam Pal also contributed a patch for Nutchgora regarding incremental
> > >>> indexing but I can't find it just now, sorry.
> > >>>
> > >>> Lewis
> > >>>
> > >>> On Thu, May 10, 2012 at 5:18 PM, Matthias Paul <magethle.nu...@gmail.com> wrote:
> > >>>> Hi all,
> > >>>>
> > >>>> can the crawl-command also be used for iterative crawls?
> > >>>> In older Nutch-versions this was not possible but in 1.5 it seems to
> > >>>> work?
> > >>>>
> > >>>> Thanks
> > >>>> Matthias
> > >>
> > >> --
> > >> Markus Jelsma - CTO - Openindex
> >
> > --
> > Lewis

--
Markus Jelsma - CTO - Openindex
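
For readers following the thread, the shell-script approach recommended above (running the individual crawl cycle commands repeatedly, which is what the depth parameter amounts to) could be sketched roughly as below. This is only a minimal sketch, not a script from the Nutch distribution. The `crawl/crawldb` and `crawl/segments` paths, the `NUTCH` variable, and the function names `crawl_cycle` and `run_cycles` are assumptions chosen for illustration. It also assumes that seed URLs have already been injected, that segment directories are named by timestamp (so they start with `2`), and that `generate` exits non-zero when no URLs are due.

```shell
#!/bin/sh
# Sketch of repeated Nutch 1.x crawl cycles. All paths and names below are
# illustrative assumptions, not values taken from the thread.
NUTCH="${NUTCH:-bin/nutch}"            # path to the Nutch launcher (assumed)
CRAWLDB="${CRAWLDB:-crawl/crawldb}"    # crawl database directory (assumed)
SEGMENTS="${SEGMENTS:-crawl/segments}" # segments directory (assumed)

# One crawl cycle: generate -> fetch -> parse -> updatedb.
crawl_cycle() {
    # Select the URLs that are due for (re)fetching into a new segment.
    # Assumption: a non-zero exit status means nothing is due.
    $NUTCH generate "$CRAWLDB" "$SEGMENTS" || return 1

    # Assumption: segment names are timestamps, so the last one listed
    # is the segment that was just generated.
    segment=$(ls -d "$SEGMENTS"/2* | tail -1)

    $NUTCH fetch "$segment" &&
    $NUTCH parse "$segment" &&
    $NUTCH updatedb "$CRAWLDB" "$segment"
}

# Run up to $1 cycles, stopping early if a cycle fails or nothing is due.
run_cycles() {
    depth=$1
    i=0
    while [ "$i" -lt "$depth" ]; do
        crawl_cycle || break
        i=$((i + 1))
    done
}
```

Calling `run_cycles 3` from a cron job or a wrapper loop would then stand in for the daemon asked about in the thread: each run fetches only what `generate` selects as due, so pages are refetched once their fetch interval has expired.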