Thank you, Sebastian. I added the run-time parameters and the output is identical. I am not seeing the HTTP status codes, though.
The log file shows:

2019-12-17 15:37:36,602 INFO parse.ParserChecker - fetching: https://www.avalonpontoons.com/
2019-12-17 15:37:36,872 INFO protocol.RobotRulesParser - robots.txt whitelist not configured.
2019-12-17 15:37:36,872 INFO http.Http - http.proxy.host = null
2019-12-17 15:37:36,872 INFO http.Http - http.proxy.port = 8080
2019-12-17 15:37:36,873 INFO http.Http - http.proxy.exception.list = false
2019-12-17 15:37:36,873 INFO http.Http - http.timeout = 10000
2019-12-17 15:37:36,873 INFO http.Http - http.content.limit = -1
2019-12-17 15:37:36,873 INFO http.Http - http.agent = FFDevBot/Nutch-1.14 ( fourfront.us)
2019-12-17 15:37:36,873 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2019-12-17 15:37:36,873 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2019-12-17 15:37:36,873 INFO http.Http - http.enable.cookie.header = true

The command line shows:

>$NUTCHl/bin/nutch parsechecker -Dstore.http.headers=true -Dstore.http.request=true https://www.avalonpontoons.com/
fetching: https://www.avalonpontoons.com/
robots.txt whitelist not configured.
Fetch failed with protocol status: gone(11), lastModified=0: https://www.avalonpontoons.com/

On Tue, Dec 17, 2019 at 11:53 AM Sebastian Nagel <[email protected]> wrote:

> Hi Bob,
>
> the relevant Javadoc comment stands before the declaration of a variable
> (here a constant):
>   /** Resource is gone. */
>   public static final int GONE = 11;
>
> More detailed, GONE results from one of the following HTTP status codes:
>   400 Bad request
>   401 Unauthorized
>   410 Gone (*forever* gone, opposed to 404 Not Found)
> See
> src/plugin/lib-http/src/java/org/apache/nutch/protocol/http/api/HttpBase.java
>
> My guess would be that "www.sitename.com" requires authentication.
>
> Just repeat the request as
>   bin/nutch parsechecker \
>      -Dstore.http.headers=true \
>      -Dstore.http.request=true \
>      ... <url>
>
> (I guess you're already using parsechecker or indexchecker)
> This will show the HTTP headers where you'll find the exact HTTP status
> code.
>
> Best,
> Sebastian
>
>
>
> On 12/17/19 4:36 PM, Robert Scavilla wrote:
> > Hi again, and thank in advance for your kind help.
> >
> > Nutch 1.14
> >
> > I am getting the following error message when crawling a site:
> > *Fetch failed with protocol status: gone(11), lastModified=0:
> > https://www.sitename.com <https://www.sitename.com>*
> >
> > The only documentation I can find says:
> >
> >> public static final int GONE = 11;
> >> /** Resource has moved permanently. New url should be found in args. */
> >>
> > I'm not sure what this means. When I load the page in my browser it shows
> > status codes 200 or 304 for all resources.
> >
> > The problem only exists on a single site - other sites crawl fine.
> >
> > I saved a page from the site locally and that page fetches successfully.
> >
> > Can you please steer my in the right direction. Many Thanks,
> > ...bob
> >
> >
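
For reference, the mapping described in Sebastian's reply (HTTP 400, 401 and 410 all ending up as gone(11)) can be sketched roughly as below. This is only an illustration of that description, not the code from HttpBase.java: the class name GoneStatusSketch and the method resultsInGone are made up for this example, and the only values taken from the thread are GONE = 11 and the three status codes. The actual mapping in a given Nutch version should be checked in HttpBase.java itself.

```java
// Illustrative sketch only; NOT the actual Nutch implementation.
// GoneStatusSketch and resultsInGone are hypothetical names.
class GoneStatusSketch {

    /** Resource is gone. (Value as quoted in the thread.) */
    static final int GONE = 11;

    /**
     * Returns true if, according to the list given in the thread,
     * the HTTP status code is reported as protocol status gone(11).
     */
    static boolean resultsInGone(int httpStatus) {
        switch (httpStatus) {
            case 400: // Bad request
            case 401: // Unauthorized
            case 410: // Gone (forever gone, as opposed to 404 Not Found)
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Print which of a few common status codes fall under gone(11).
        int[] codes = {200, 304, 400, 401, 404, 410};
        for (int code : codes) {
            System.out.println(code + " -> "
                + (resultsInGone(code) ? "gone(" + GONE + ")" : "not gone"));
        }
    }
}
```

This is why gone(11) alone does not identify the cause: three different server responses are folded into the same protocol status, so the exact HTTP status code has to come from the response headers.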

