Going through the different URLs handled by protocol plugins, there are 'directory' links and 'document' links. Think of a filesystem consisting of folders and files, or of an IMAP server that organizes emails (= the documents) in folders.
Now Nutch seems to recrawl URLs every 30 days. That may be good enough for big remote sites that change only now and then, and in my case the documents themselves do not change often either, so 30 days would be fine for them. But when new files are added I would not like to wait up to a month for them to show up in the index. For email in particular I'd like to check for new messages every hour or so, although individual emails hardly ever change, so refetching each one every 30 days or longer is absolutely fine.

How can a protocol plugin define the delay that should be applied before a URL is recrawled? And how can a protocol plugin find out when a URL was last fetched, so it can avoid a new fetch if the resource has not been modified since?
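To make the second question more concrete, here is a rough sketch of what I imagine a filesystem protocol plugin could look like, assuming the Nutch 1.x Protocol interface. The class name LocalFileProtocol and the URL-to-path mapping are made up, and the actual file reading is elided:

import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.protocol.Content;
import org.apache.nutch.protocol.Protocol;
import org.apache.nutch.protocol.ProtocolOutput;
import org.apache.nutch.protocol.ProtocolStatus;

import crawlercommons.robots.BaseRobotRules;

public class LocalFileProtocol implements Protocol {

  private Configuration conf;

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }

  public ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum) {
    // Made-up mapping from the URL to a local file.
    File file = new File(url.toString().replaceFirst("^file:(//)?", ""));

    // The CrawlDatum should carry the modified time recorded at the
    // previous fetch; if the file has not changed since, skip the fetch.
    if (datum.getModifiedTime() > 0
        && file.lastModified() <= datum.getModifiedTime()) {
      return new ProtocolOutput(null, ProtocolStatus.STATUS_NOTMODIFIED);
    }

    // Otherwise the file would be read here; the bytes are left empty
    // in this sketch.
    byte[] bytes = new byte[0];
    Content content = new Content(url.toString(), url.toString(), bytes,
        "application/octet-stream", new Metadata(), conf);
    return new ProtocolOutput(content, ProtocolStatus.STATUS_SUCCESS);
  }

  public BaseRobotRules getRobotRules(Text url, CrawlDatum datum) {
    return null; // no robots.txt for a local filesystem
  }
}

If I understand the API correctly, the CrawlDatum passed to getProtocolOutput carries the fetch and modified times from the previous crawl, so something like the above should be possible. What I don't see is where the per-URL recrawl delay (the 30 days) could be influenced from within the plugin.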

