On Fri, Jun 21, 2013 at 7:07 PM, Joe Zhang <smartag...@gmail.com> wrote:
> Sorry, Nutch is certainly aware of page modification, and it does capture
> lastModified.

Nutch does capture the "last modified" field, but I am not sure whether its
value is used later on. I remember that it was not used for any logic in
older versions, but I need to confirm whether the code has since been changed
to take it into account.

> The real question is, can Nutch get lastModified of a page
> before fetching, and use it to make fetching decisions (e.g., whether or
> not to override the default interval)?

No. Nutch won't look up the lastModified of a page before fetching its
content.

> On Fri, Jun 21, 2013 at 6:27 PM, Joe Zhang <smartag...@gmail.com> wrote:
>
> > If I don't change the default value of db.fetch.interval.default, which is
> > 30 days, does it mean that the URL in the db won't be refetched before the
> > due time even if it has been modified? In other words, is Nutch aware of
> > page modification?
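
For reference, here is a minimal sketch of how the interval mentioned above could be overridden in conf/nutch-site.xml. It assumes the value is given in seconds, as in nutch-default.xml, where 30 days is 2592000. The 7-day value below is only an illustration, not a recommendation.

```xml
<!-- conf/nutch-site.xml: overrides the value from nutch-default.xml -->
<property>
  <name>db.fetch.interval.default</name>
  <!-- Illustrative value: 7 days * 24 h * 3600 s = 604800 seconds -->
  <value>604800</value>
  <description>Default number of seconds between re-fetches of a page.</description>
</property>
```

This only shortens the time until a URL becomes due again. It does not make Nutch check for modification before fetching.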