Hello all!

I have a problem with Nutch 1.13 failing to log in to our intranet using
form authentication.

This is my httpclient-auth.xml for Nutch 1.13:

--------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
    loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&amp;_pageLabel=P1460018791401275675293"
    loginFormId="flamma-login-frm"
    loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
    <loginCookie>BROWSER_COMPATIBILITY</loginCookie>
  </credentials>
</auth-configuration>
--------------------------------------------------------------------

It was adapted from the following file, which is in use in our production
environment running a mix of Nutch 1.9 and 1.10:

--------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
    loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&amp;_pageLabel=P1460018791401275675293"
    loginFormName="flamma-login-frm" loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
  </credentials>
</auth-configuration>
--------------------------------------------------------------------

Our production Nutch 1.9/1.10 with the above httpclient-auth.xml DOES
WORK and successfully authenticates.

Our development environment, running Nutch 1.13 with the first config
listed, fails to authenticate.
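To make the comparison between the two files concrete, here is a minimal Python sketch (standard library only) that parses both `<credentials>` elements, inlined as well-formed XML with the passwords still redacted, and lists which attributes and child elements differ. The names `PROD_XML`, `DEV_XML` and `diff_credentials` are hypothetical and only for illustration; this is not how Nutch itself reads the file.

```python
import xml.etree.ElementTree as ET

# Production (Nutch 1.9/1.10) configuration, as quoted above.
PROD_XML = """<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
    loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&amp;_pageLabel=P1460018791401275675293"
    loginFormName="flamma-login-frm" loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
  </credentials>
</auth-configuration>"""

# Development (Nutch 1.13) configuration, as quoted above.
DEV_XML = """<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
    loginUrl="https://flamma.helsinki.fi/portal/home/login?_nfpb=true&amp;_pageLabel=P1460018791401275675293"
    loginFormId="flamma-login-frm"
    loginRedirect="false">
    <loginPostData>
      <field name="username" value="REDACTED"/>
      <field name="password" value="REDACTED"/>
    </loginPostData>
    <loginCookie>BROWSER_COMPATIBILITY</loginCookie>
  </credentials>
</auth-configuration>"""


def diff_credentials(old_xml: str, new_xml: str) -> dict:
    """Compare the <credentials> elements of two httpclient-auth.xml documents."""
    old = ET.fromstring(old_xml).find("credentials")
    new = ET.fromstring(new_xml).find("credentials")
    old_attrs, new_attrs = dict(old.attrib), dict(new.attrib)
    shared = set(old_attrs) & set(new_attrs)
    old_children = {child.tag for child in old}
    new_children = {child.tag for child in new}
    return {
        # Attributes present in only one of the two files.
        "removed_attrs": sorted(set(old_attrs) - set(new_attrs)),
        "added_attrs": sorted(set(new_attrs) - set(old_attrs)),
        # Attributes present in both files but with different values.
        "changed_attrs": sorted(k for k in shared if old_attrs[k] != new_attrs[k]),
        # Direct child elements (loginPostData, loginCookie, ...) compared by tag.
        "removed_children": sorted(old_children - new_children),
        "added_children": sorted(new_children - old_children),
    }


if __name__ == "__main__":
    for key, value in diff_credentials(PROD_XML, DEV_XML).items():
        print(f"{key}: {value}")
```

Run against the two files, it reports `loginFormName` replaced by `loginFormId` and the added `<loginCookie>` element, with `authMethod`, `loginUrl` and `loginRedirect` unchanged.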

I turned on debug logging for httpclient in log4j.properties in the Nutch
1.13 dev environment, and this is what I find in hadoop.log:

--------------------------------------------------------------------
2017-09-15 14:39:37,632 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(120)) - FormAuth: set cookie policy
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(126)) - rspCode: 200
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(127)) -
Sending 'POST' request to URL : https://flamma.helsinki.fi/portal/home/login?_nfpb=true&_pageLabel=P1460018791401275675293
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(129)) - Post parameters : [name=tz, value=, name=, value=Kirjaudu, name=password, value=REDACTED, name=username, value=REDACTED]
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(130)) - Response Code : 200
2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : User-Agent: HY_crawler/Nutch-1.13 (Crawler for University of Helsinki)

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Connection: keep-alive

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Content-Type: application/x-www-form-urlencoded

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Cookie:

2017-09-15 14:39:37,794 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Accept-Encoding: x-gzip, gzip, deflate

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Host: flamma.helsinki.fi

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Cookie: $Version=0; JSESSIONID=vPaFVexlwJ4m1Iq94kSw_jSTkmPnkysAT9jA6UHBy3CfVShafxjG!-573369189!NONE; $Path=/

2017-09-15 14:39:37,795 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(132)) - Response headers : Content-Length: 63

2017-09-15 14:39:37,796 DEBUG httpclient.HttpFormAuthentication (HttpFormAuthentication.java:sendPost(136)) - login post result:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="fi"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><link rel="alternate" hreflang="fi-fi" href="/portal/home/login?hl=fi"/><link rel="alternate" hreflang="sv-fi" href="/portal/home/login?hl=sv"/><link rel="alternate" hreflang="en-fi" href="/portal/home/login?hl=en"/><link rel="alternate" hreflang="x-default" href="https://flamma.helsinki.fi"/><meta name="AUTHOR" content="Helsingin yliopisto"/>

-------------------------------SNIP---------------------------------

<div class="wrapper">
    <form method="post" name="flamma-login-frm" class="loginform" action="/portal/home/login?_nfpb=true&amp;_windowLabel=T2180228791401275715061&amp;_pageLabel=P1460018791401275675293">
        <div>
            <label for="T2180228791401275715061username">Käyttäjätunnus:</label>
            <input class="rounded" id="T2180228791401275715061username" type="text" size=15 name="username" >
        </div>
        <div>
            <label for="T2180228791401275715061password">Salasana:</label>
            <input class="rounded" id="T2180228791401275715061password" type="password" size=15 name="password" autocomplete="off">
        </div>
        <div>
            <input type="hidden" name="tz" value="" id="T2180228791401275715061tz" />
            <input type="submit" value="Kirjaudu">
        </div>
    </form>
</div>

-------------------------------SNIP---------------------------------
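For comparison with the "Post parameters" line in the log, here is a minimal Python sketch (standard library `html.parser` only; it is not Nutch's own code, which is Java) that collects the `<input>` name/value pairs and the `action` attribute of the form named `flamma-login-frm` from an abbreviated copy of the HTML above (labels and wrapper divs dropped). The names `LoginFormParser` and `FORM_HTML` are hypothetical. Resolving `action` against the login page URL gives the address a browser would submit the form to, which can be set beside the URL in the "Sending 'POST' request" log line.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Abbreviated copy of the login form returned by the server (labels omitted).
FORM_HTML = """
<form method="post" name="flamma-login-frm" class="loginform"
      action="/portal/home/login?_nfpb=true&amp;_windowLabel=T2180228791401275715061&amp;_pageLabel=P1460018791401275675293">
  <input class="rounded" id="T2180228791401275715061username" type="text" size=15 name="username" >
  <input class="rounded" id="T2180228791401275715061password" type="password" size=15 name="password" autocomplete="off">
  <input type="hidden" name="tz" value="" id="T2180228791401275715061tz" />
  <input type="submit" value="Kirjaudu">
</form>
"""


class LoginFormParser(HTMLParser):
    """Collect the action URL and the input fields of one <form> selected by name."""

    def __init__(self, form_name: str):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = []  # (name, value) pairs in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and a.get("name") == self.form_name:
            self.in_form = True
            self.action = a.get("action")  # entities such as &amp; are already unescaped
        elif tag == "input" and self.in_form:
            # Unnamed inputs (the submit button) are kept with an empty name,
            # matching the "name=, value=Kirjaudu" entry in the log.
            self.fields.append((a.get("name") or "", a.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


if __name__ == "__main__":
    parser = LoginFormParser("flamma-login-frm")
    parser.feed(FORM_HTML)
    parser.close()
    login_page = "https://flamma.helsinki.fi/portal/home/login?_nfpb=true&_pageLabel=P1460018791401275675293"
    print("form action:", urljoin(login_page, parser.action))
    print("fields:", parser.fields)
```

The collected fields are the same four that appear in the log (`username`, `password`, `tz` and the unnamed submit button), only in a different order.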


Nutch appears to attempt authentication through this form before every
fetch request, because this exchange is repeated many times in
hadoop.log. Each time, the server responds with the same login form
instead of redirecting to the main page of the intranet portal.
There is not even an error message about an incorrect username or
password, as we would expect on such a failure.
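The symptom described above (the login POST returns the login form again, with no error message) can be checked mechanically on each "login post result". A minimal Python sketch, assuming that the presence of a `<form name="flamma-login-frm">` in the response is a sufficient sign that the login did not succeed; the names `looks_like_login_page` and `_FormFinder` are hypothetical and not part of Nutch.

```python
from html.parser import HTMLParser


class _FormFinder(HTMLParser):
    """Record whether a <form> with the given name attribute occurs in the HTML."""

    def __init__(self, form_name: str):
        super().__init__()
        self.form_name = form_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "form" and dict(attrs).get("name") == self.form_name:
            self.found = True


def looks_like_login_page(html: str, form_name: str = "flamma-login-frm") -> bool:
    """Return True if the response body still contains the named login form."""
    finder = _FormFinder(form_name)
    finder.feed(html)
    finder.close()
    return finder.found


if __name__ == "__main__":
    # A response like the one in hadoop.log: the login form is served again.
    print(looks_like_login_page('<form method="post" name="flamma-login-frm" class="loginform" action="/x"></form>'))
    # A hypothetical page without the login form.
    print(looks_like_login_page("<html><body><p>Portal front page</p></body></html>"))
```

Parsing the HTML rather than searching for a substring avoids false positives from, for example, the form name appearing in a script or comment.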

Any advice on why form authentication through the same form on the same
intranet fails on Nutch 1.13, with a configuration file very similar to
the one that works in production?

Thanks.

-- 
Ronja Koistinen
University of Helsinki
