Hi Michael,

the only differences in the protocol-httpclient plugin between Nutch 1.11
and 1.13 are
- NUTCH-2280 [1], which allows configuring the cookie policy
- NUTCH-2355 [2], which allows setting an explicit cookie for a request URL

Could this be related? Are there any useful hints in the log messages as to
what the reason could be if you set

  log4j.logger.org.apache.nutch.protocol.httpclient=TRACE

in log4j.properties?

Best,
Sebastian

[1] https://issues.apache.org/jira/browse/NUTCH-2280
[2] https://issues.apache.org/jira/browse/NUTCH-2355

On 5/9/22 15:30, Fritsch, Michael wrote:
> Hello,
>
> I used nutch 1.11 to crawl pages behind a login page.
> The http-auth configuration looked like this:
>
> ---------------------------------------------------------------------------
> <?xml version="1.0"?>
> <auth-configuration>
>   <credentials authMethod="formAuth"
>     loginUrl=loginURL<https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&locale_id=1&return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&timestamp=1464019963>
>     loginFormId="loginForm"
>     loginRedirect="true">
>     <loginPostData>
>       <field name="user[email]"
>              value="username"/>
>       <field name="user[password]"
>              value="password"/>
>     </loginPostData>
>     <additionalPostHeaders>
>     </additionalPostHeaders>
>   </credentials>
> </auth-configuration>
> --------------------------------------------------------------------
>
> Everything worked fine.
> Then I updated to 1.13 (I also tried 1.18) and
> changed the configuration as described in the http-auth.xml file:
>
> -----------------------------------------------------------------------------
>
> <auth-configuration>
>   <credentials authMethod="formAuth"
>     loginUrl=loginURL<https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&locale_id=1&return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&timestamp=1464019963>
>     loginFormId="loginForm"
>     loginRedirect="true">
>     <loginPostData>
>       <field name="user[email]"
>              value="username"/>
>       <field name="user[password]"
>              value="password"/>
>     </loginPostData>
>     <additionalPostHeaders>
>     </additionalPostHeaders>
>     <removedFormFields>
>     </removedFormFields>
>     <loginCookie>
>       <policy>BROWSER_COMPATIBILITY</policy>
>     </loginCookie>
>   </credentials>
>
> </auth-configuration>
>
> -----------------------------------------------
>
> Now, the login did not work anymore. After some redirects, it gives an HTTP
> 403 response. I tried all loginCookie policy entries, but nothing worked.
> The login is to a Zendesk support system with Atlassian Crowd as a login
> provider. Has anything changed between 1.11 and 1.13? Is something more
> strict than before?
>
>
> I found a very similar question in this mailing list
> (https://www.mail-archive.com/user@nutch.apache.org/msg15746.html) from
> 2017, which has no solutions.
>
> I would appreciate any help!
>
> Best regards
>
> Michael
>
>
> Dr. Michael Fritsch
> Technical Editor
>
> T: +49.40.325587.214
> E: michael.frit...@coremedia.com
>
> CoreMedia GmbH - Be iconic
> Ludwig-Erhard-Str. 18
> 20459 Hamburg, Germany
> www.coremedia.com
> ------------------------------------------------------------
> Managing Director: Sören Stamer
> Commercial Register: Amtsgericht Hamburg, HR B 162480
> ----------------------------------------------------------------------
> Stay up to date and follow us on
> LinkedIn <https://www.linkedin.com/company/coremedia-corp> or
> Twitter <https://twitter.com/contentcloud>