Hi Tizy,

There is some discussion about this in NUTCH-827 [1]. IMHO we need some help with it; this feature would be useful once it is implemented.
Talat

[1] https://issues.apache.org/jira/browse/NUTCH-827

2014-12-16 10:44 GMT+02:00 Tizy Ninan <[email protected]>:
> Hi,
>
> Thanks for the reply.
> Is there any alternative way to do this authentication? Does the fetcher
> job of Nutch accept cookies when fetching web sites from the same
> domain? Could you suggest any workaround to do form-based authentication
> using Nutch?
>
> Thanks,
> Tizy
>
> On Tue, Dec 16, 2014 at 1:08 PM, Halil Ibrahim Simsek <[email protected]>
> wrote:
>>
>> Hello Tizy,
>>
>> As far as I know, the development version of Nutch can currently do Basic,
>> Digest and NTLM authentication [1]. Nutch cannot do POST-based
>> authentication that depends on cookies. BTW, there is a document that
>> describes this feature, but as far as I can see no code has been
>> developed for it yet [2].
>>
>> [1] https://wiki.apache.org/nutch/HttpAuthenticationSchemes
>> [2] https://wiki.apache.org/nutch/HttpPostAuthentication
>>
>> Halil
>>
>> 2014-12-16 7:16 GMT+02:00 Tizy Ninan <[email protected]>:
>> >
>> > Hi,
>> >
>> > I am trying to develop a custom crawler in Java, using Nutch v1.9, to
>> > crawl websites that require form-based authentication. I followed the
>> > HttpPostAuthentication feature of Nutch to implement it.
>> >
>> > The login parameters required for authentication, such as the HTML form
>> > id and the login post data (username, password), are specified as
>> > key-value pairs in a configuration file. What is required to identify
>> > the HTML login form (the id or name of the form)? How can the form
>> > parameters be identified if the form has no id or name?
>> >
>> > I have also posted the question to the developer mailing list, but did
>> > not receive any reply. I have been stuck on this for a while. Could
>> > somebody suggest how to specify the HTML form parameters of the websites
>> > to be crawled so that form-based authentication works?
>> >
>> > Thanks and Regards,
>> > Tizy
>> >
>
>
> --
> Thanks and Regards,
> Tizy

--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
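On the question of how to identify a login form that has no id or name: one common heuristic (an assumption here, not something Nutch or the HttpPostAuthentication page is confirmed to do) is to take the first form on the login page that contains a password input, then record its action, method and field names. Below is a minimal sketch of that heuristic using only the JDK's built-in Swing HTML parser (javax.swing.text.html, in the java.desktop module). The class LoginFormFinder and its fields are hypothetical names for illustration; a production crawler would probably want a more forgiving HTML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical helper: locates a login form without relying on its id or name. */
public class LoginFormFinder {

    /** What a crawler would need in order to submit the form (hypothetical structure). */
    public static final class LoginForm {
        public String action;                                            // form action attribute (null = same URL)
        public String method = "get";                                    // HTML default when method is absent
        public final Map<String, String> fields = new LinkedHashMap<>(); // input name -> initial value ("" if none)
        boolean hasPassword;                                             // true if the form has <input type=password>
    }

    /** Returns the first form that contains a password input, or null if there is none. */
    public static LoginForm find(String html) {
        List<LoginForm> forms = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private LoginForm current; // form being parsed, null outside <form>

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.FORM) {
                    current = new LoginForm();
                    Object action = attrs.getAttribute(HTML.Attribute.ACTION);
                    current.action = action == null ? null : action.toString();
                    Object method = attrs.getAttribute(HTML.Attribute.METHOD);
                    if (method != null) {
                        current.method = method.toString().toLowerCase(Locale.ROOT);
                    }
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> has no end tag, so the parser reports it as a "simple" tag.
                if (tag != HTML.Tag.INPUT || current == null) {
                    return;
                }
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                Object name = attrs.getAttribute(HTML.Attribute.NAME);
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                if (type != null && "password".equalsIgnoreCase(type.toString())) {
                    current.hasPassword = true;
                }
                if (name != null) {
                    current.fields.put(name.toString(), value == null ? "" : value.toString());
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.FORM && current != null) {
                    forms.add(current);
                    current = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (LoginForm form : forms) {
            if (form.hasPassword) {
                return form;
            }
        }
        return null;
    }
}
```

The resulting action URL and field names (with the username and password fields filled in) are the pieces that would go into the login post data. Hidden fields such as CSRF tokens often change per session, so they would likely need to be re-read from the live login page before each login rather than hard-coded in a configuration file.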

