[
https://issues.apache.org/jira/browse/NUTCH-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353191#comment-14353191
]
Mo Omer commented on NUTCH-1948:
Yo Lewis,
In addition to being able to configure the
[
https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353537#comment-14353537
]
Ashwini Tokekar commented on NUTCH-1936:
Hi Lewis,
I am interested in this
Dear Wiki user,
You have subscribed to a wiki page or wiki category on Nutch Wiki for change
notification.
The CommonCrawlDataDumper page has been changed by GiuseppeTotaro:
https://wiki.apache.org/nutch/CommonCrawlDataDumper
New page:
The CommonCrawlDataDumper is a Nutch tool able to dump out
Recently, in the search app we are working on, we've encountered a lot of
websites that send a wrong, invalid date in the Last-Modified HTTP header.
For instance, an article posted on a news site back in 2010 has a
Last-Modified header of just a few days back. This could be for any