Hello all! I have a problem with Nutch 1.13 failing to log in to our intranet using form authentication.
This is my httpclient-auth.xml for Nutch 1.13:

--------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&_pageLabel=P1460018791401275675293"
               loginFormId="flamma-login-frm"
               loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
    <loginCookie>BROWSER_COMPATIBILITY</loginCookie>
  </credentials>
</auth-configuration>
--------------------------------------------------------------------

Which has been adapted from this one in our production environment running some kind of mix between Nutch 1.9 and 1.10:

--------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&_pageLabel=P1460018791401275675293"
               loginFormName="flamma-login-frm"
               loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
  </credentials>
</auth-configuration>
--------------------------------------------------------------------

Our production Nutch 1.9/1.10 with the above httpclient-auth.xml DOES WORK and successfully authenticates. Our development environment with Nutch 1.13, with the first config listed, fails to authenticate.
I turned on debug logging in the Nutch 1.13 dev environment for the httpclient in log4j.properties, and this is what I find in hadoop.log:

--------------------------------------------------------------------
2017-09-15 14:39:37,632 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(120)) - FormAuth: set cookie policy
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(126)) - rspCode: 200
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(127)) - Sending 'POST' request to URL : https://flamma.helsinki.fi/portal/home/login?_nfpb=true&_pageLabel=P1460018791401275675293
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(129)) - Post parameters : [name=tz, value=, name=, value=Kirjaudu, name=password, value=REDACTED, name=username, value=REDACTED]
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(130)) - Response Code : 200
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : User-Agent: HY_crawler/Nutch-1.13 (Crawler for University of Helsinki)
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Connection: keep-alive
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Content-Type: application/x-www-form-urlencoded
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Cookie:
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Encoding: x-gzip, gzip, deflate
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Host: flamma.helsinki.fi
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Cookie: $Version=0; JSESSIONID=vPaFVexlwJ4m1Iq94kSw_jSTkmPnkysAT9jA6UHBy3CfVShafxjG!-573369189!NONE; $Path=/
2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Content-Length: 63
2017-09-15 14:39:37,796 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(136)) - login post result:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="fi"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="alternate" hreflang="fi-fi" href="/portal/home/login?hl=fi"/>
<link rel="alternate" hreflang="sv-fi" href="/portal/home/login?hl=sv"/>
<link rel="alternate" hreflang="en-fi" href="/portal/home/login?hl=en"/>
<link rel="alternate" hreflang="x-default" href="https://flamma.helsinki.fi"/>
<meta name="AUTHOR" content="Helsingin yliopisto"/>
-------------------------------SNIP---------------------------------
<div class="wrapper">
  <form method="post" name="flamma-login-frm" class="loginform" action="/portal/home/login?_nfpb=true&_windowLabel=T2180228791401275715061&_pageLabel=P1460018791401275675293">
    <div>
      <label for="T2180228791401275715061username">Käyttäjätunnus:</label>
      <input class="rounded" id="T2180228791401275715061username" type="text" size=15 name="username" >
    </div>
    <div>
      <label for="T2180228791401275715061password">Salasana:</label>
      <input class="rounded" id="T2180228791401275715061password" type="password" size=15 name="password" autocomplete="off">
    </div>
    <div>
      <input type="hidden" name="tz" value="" id="T2180228791401275715061tz" />
      <input type="submit" value="Kirjaudu">
    </div>
  </form>
</div>
-------------------------------SNIP---------------------------------

Nutch seems to attempt to authenticate through this form before each fetch request, because this exchange is repeated many many times in hadoop.log. And each time, the server responds with the same login form again, instead of redirecting to the main page of the intranet portal. There is not even an error message complaining about incorrect username or password, as we would expect on such failure.

Any advice on why form authentication through the same form on the same intranet fails on Nutch 1.13 with very similar configuration file to the older one in production?

Thanks.

-- 
Ronja Koistinen
University of Helsinki