Hi Tizy,

There is some discussion of this in NUTCH-827 [1]. IMHO we need some help
there. If we implement this feature, it will be useful.
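
Until then, one possible workaround is to do the login POST yourself, outside the fetcher, and keep the session cookies. Below is only a minimal sketch in plain Java, not Nutch code and not what NUTCH-827 proposes. The class name FormLoginSketch, its methods, and the field values are hypothetical, and the login URL has to be supplied by the caller. It takes the form fields as key-value pairs (as you already keep them in your configuration file), encodes them as a form body, and lets java.net.CookieManager store whatever cookies the site sets.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch.
class FormLoginSketch {

    // Encode key-value pairs as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    // POST the form to loginUrl and return the HTTP status code.
    // Cookies set by the server are kept in the JVM-wide CookieManager,
    // so later HttpURLConnection requests to the same site reuse them.
    static int login(String loginUrl, Map<String, String> fields) throws IOException {
        if (CookieHandler.getDefault() == null) {
            CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
        }
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] payload = encodeForm(fields).getBytes("UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(payload);
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Assumed field names; use the input names of the real login form.
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("username", "alice");
        fields.put("password", "s3cret");
        // Only prints the encoded body; call login(...) with a real URL to post it.
        System.out.println(encodeForm(fields));
    }
}
```

The field names must match the name attributes of the form's input elements, which is also why the form can be handled even when it has no id or name of its own: what the server sees is only the action URL and the posted fields.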

Talat

[1] https://issues.apache.org/jira/browse/NUTCH-827

2014-12-16 10:44 GMT+02:00 Tizy Ninan <[email protected]>:
> Hi,
>
> Thanks for the reply.
> Is there any alternative way to do this authentication? Does the fetcher
> job of Nutch accept cookies when fetching web sites from the same
> domain? Could you suggest any workaround for doing form-based
> authentication with Nutch?
>
> Thanks,
> Tizy
>
> On Tue, Dec 16, 2014 at 1:08 PM, Halil Ibrahim Simsek <[email protected]>
> wrote:
>>
>> Hello Tizy,
>>
>> As far as I know, the current development version of Nutch can do Basic,
>> Digest and NTLM based authentication [1]. Nutch cannot do POST-based
>> authentication that depends on cookies. BTW, there is a document that is
>> supposed to describe this feature, but as far as I can see no code has been
>> developed yet [2].
>>
>> [1] https://wiki.apache.org/nutch/HttpAuthenticationSchemes
>> [2] https://wiki.apache.org/nutch/HttpPostAuthentication
>>
>> Halil
>>
>> 2014-12-16 7:16 GMT+02:00 Tizy Ninan <[email protected]>:
>> >
>> > Hi,
>> >
>> > I am trying to develop a custom crawler in Java, using Nutch v1.9, to
>> > crawl websites that require form-based authentication. I am following
>> > the HttpPostAuthentication feature of Nutch to implement it.
>> >
>> > The login parameters required for authentication, such as the HTML form
>> > id and the login POST data (username, password), are specified as
>> > key-value pairs in a configuration file. What is required to identify the
>> > HTML login form (the id or name of the form)? How can the form parameters
>> > be identified if the id or name of the form is not specified?
>> >
>> > I have also posted the question to the developer mailing list, but did
>> > not receive any reply. I have been stuck on this for a while. Could
>> > somebody suggest how to specify the HTML form parameters of the websites
>> > to be crawled, so that form-based authentication can be performed?
>> >
>> > Thanks and Regards,
>> > Tizy
>> >
>>
>
>
> --
> Thanks and Regards,
> Tizy



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
