Hello Tizy,

As far as I know, the current development version of Nutch supports Basic,
Digest and NTLM authentication. [1] Nutch cannot do POST-based
authentication that depends on cookies. BTW, there is a document that is
supposed to cover this feature, but as far as I can see no code has been
developed for it yet. [2]

[1] https://wiki.apache.org/nutch/HttpAuthenticationSchemes
[2] https://wiki.apache.org/nutch/HttpPostAuthentication
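
On the question of identifying the login form when it has no id or name:
one common heuristic is to pick the first form that contains a password
input and read its field names from there. Below is a rough sketch of that
idea in plain Java. It is not Nutch code and is not part of the
HttpPostAuthentication proposal; the class name LoginFormFinder and the
method findLoginFields are made up for illustration. It uses regular
expressions, which is fragile on real-world HTML, so a proper HTML parser
would be a safer choice in a real plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch class: finds the login form by looking
// for a password field instead of relying on a form id or name.
public class LoginFormFinder {
    private static final Pattern FORM = Pattern.compile(
            "<form\\b[^>]*>(.*?)</form>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern INPUT = Pattern.compile(
            "<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern ATTR = Pattern.compile(
            "([a-zA-Z-]+)\\s*=\\s*[\"']([^\"']*)[\"']");

    // Collects the quoted attributes of a single tag, keyed by lower-case name.
    private static Map<String, String> attrs(String tag) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            result.put(m.group(1).toLowerCase(), m.group(2));
        }
        return result;
    }

    // Returns field name -> default value for the first form that contains
    // an <input type="password">, or an empty map if there is no such form.
    public static Map<String, String> findLoginFields(String html) {
        Matcher form = FORM.matcher(html);
        while (form.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            boolean hasPassword = false;
            Matcher input = INPUT.matcher(form.group(1));
            while (input.find()) {
                Map<String, String> a = attrs(input.group());
                if ("password".equalsIgnoreCase(a.get("type"))) {
                    hasPassword = true;
                }
                String name = a.get("name");
                if (name != null) {
                    // Keep default values so hidden fields can be posted back.
                    fields.put(name, a.getOrDefault("value", ""));
                }
            }
            if (hasPassword) {
                return fields;
            }
        }
        return new LinkedHashMap<>();
    }

    public static void main(String[] args) {
        String html = "<form action='/search'><input type='text' name='q'></form>"
                + "<form action='/login' method='post'>"
                + "<input type='hidden' name='csrf' value='abc'>"
                + "<input type='text' name='user'>"
                + "<input type='password' name='pass'>"
                + "</form>";
        System.out.println(findLoginFields(html));
    }
}
```

The field names found this way (including hidden ones) are what would need
to go into the POST data, with the username and password values filled in
from the configuration file.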

Halil

2014-12-16 7:16 GMT+02:00 Tizy Ninan <[email protected]>:
>
> Hi,
>
> I am trying to develop a custom crawler in Java, using Nutch v1.9, to crawl
> websites that require form-based authentication. I am following the
> HttpPostAuthentication document for Nutch to implement it.
>
> The login parameters required for authentication, such as the HTML form id
> and the login POST data (username, password), are specified as key-value
> pairs in a configuration file. What is required to identify the HTML login
> form (the id or name of the form)? How can the form parameters be
> identified if the form has neither an id nor a name?
>
> I have also posted the question to the developer mailing list, but did not
> receive any reply. I have been stuck on this for a while. Could somebody
> provide a solution for specifying the HTML form parameters of the websites
> to be crawled, so that form-based authentication can be performed?
>
> Thanks and Regards,
> Tizy
>
