> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
>A
new Jira for this problem? Thanks!
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
>
d in HTTP.java
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protoco
Default(new CookieManager());
// And the cookies policy could be changed here...
String pageContent = httpGetPageContent(authConfigurer.getLoginUrl());
List params = getLoginFormParams(pageContent);
sendPost(authConfigurer.getLoginUrl(), par
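The flow in this snippet (install a cookie manager, fetch the login page, collect the form fields, POST them back) can be sketched with the JDK alone. This is only a minimal sketch under stated assumptions, not the code from the patch: the class FormLoginSketch, its helper names, and the field names and values are hypothetical, and it uses java.net.HttpURLConnection for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the code from the NUTCH-827 patch.
final class FormLoginSketch {

    // URL-encode one name or value for an application/x-www-form-urlencoded body.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Join the fields as name=value pairs separated by '&', keeping map order.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(f.getKey())).append('=').append(enc(f.getValue()));
        }
        return body.toString();
    }

    // POST the form to loginUrl and return the HTTP status code.
    static int sendPost(String loginUrl, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(loginUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Cookies set by the login response are kept for later requests in this JVM.
        CookieHandler.setDefault(new CookieManager());
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "alice");     // hypothetical field names and values
        fields.put("password", "s3cret pw");
        System.out.println(encodeForm(fields));
        if (args.length == 1) {              // only touch the network when a URL is given
            System.out.println("status: " + sendPost(args[0], fields));
        }
    }
}
```

In a real login the hidden fields scraped from the login page would be merged into the same map before the POST, so that server-generated tokens are sent back unchanged.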
[
https://issues.apache.org/jira/browse/NUTCH-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1940:
Assignee: Talat UYARER
> Port HTTP POST Authentication to
Nice work Talat.
> Port HTTP POST Authentication to 2.X
>
>
> Key: NUTCH-1940
> URL: https://issues.apache.org/jira/browse/NUTCH-1940
> Project: Nutch
> Issue Type: New Feature
>
server uses
a compressed HTTP connection. The original patch cannot read the content, so I added a
method named getResponseBody.
> Port HTTP POST Authentication to 2.X
>
>
> Key: NUTCH-1940
> URL: https://issues.apache.
I don't think this is what I was running into. I cannot replicate this
error using Nutch trunk. The only thing that stood out to me about the xml
config file was the lack of "" on the first line. But,
I'm not sure that would actually make a difference.
You can see https://github.com/tpalsulich/nut
On Tue, Apr 7, 2015 at 8:11 PM, Tizy Ninan wrote:
> NutchCrawler-1.0-SNAPSHOT.jar!
Maybe your configuration format is correct now and you were previously missing the
auth-configuration tag. But I see you are still using 1.0-SNAPSHOT; you can
try the latest trunk version of Nutch at https://github.com/apac
, Los Angeles, CA 90089 USA
++
-Original Message-
From: Tizy Ninan
Reply-To: "dev@nutch.apache.org"
Date: Tuesday, April 7, 2015 at 5:11 AM
To: "u...@nutch.apache.org"
Cc: "dev@nutch.apache.org
Hi,
I am still not able to crawl websites requiring authentication.
The version of Nutch used is 1.10.
While crawling I am getting the following warnings and still not able to
identify what is going wrong.
Please find the httpclient-auth.xml file in the following link.
https://gist.github.com/ti
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Labels: authentication memex (was: authentication)
> HTTP POST Authenticat
Hi Folks,
I'm having trouble getting HTTP POST authentication to work on mrs.org. I
have a valid username and password, but when I try to run parsechecker on a
page that requires authentication (http://mrs.org/myMRS/), I get a 500
error and the following response:
Validation of viewstate MAC f
Tizy, in order to help debug your error, you'll need to provide additional
information. Check out this link for what's generally needed when trying to
debug over chat/email: http://www.mikeash.com/getting_answers
The error seems to say that httpclient.Http doesn't like the auth conf file
you provi
Edit: The first link should be https://www.mikeash.com/getting_answers.html
Thank you,
Mo
On Wed, Mar 18, 2015 at 8:16 PM, Mohammed Omer
wrote:
> Tizy, in order to help debug your error, you'll need to provide additional
> information. Check out this link for what's generally needed when tryin
Hi Lewis,
Thank you for the reply.
I tried by providing the parameters specified in the httpclient-auth.xml
template file. But while crawling I am getting the following warnings.
WARN httpclient.Http: Bad auth conf file: root element found
in httpclient-auth.xml - must be
WARN httpclient.Http:
Hi Tizy,
this should help:
https://wiki.apache.org/nutch/HttpPostAuthentication
http://svn.apache.org/repos/asf/nutch/trunk/conf/httpclient-auth.xml.template
For more details you could also check
https://issues.apache.org/jira/browse/NUTCH-827
https://issues.apache.org/jira/browse/NUTCH-1943
Che
Hi,
Is there a detailed step-by-step explanation of how to set up
HTTP POST authentication in Nutch 1.10?
Thanks and Regards,
Tizy
t. Thanks!
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
>
case there are local changes).
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
>
sn't included in the commit.
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
>
ttps://builds.apache.org/job/Nutch-trunk/2976/])
NUTCH-827 HTTP POST Authentication (lewismc:
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1659701)
*
/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/HttpFormAuthConfigurer.java
*
/nutch/trunk/sr
tted @revision 1659701
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protoco
involved. All credited in CHANGES
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Component
will commit this patch and log an issue to accommodate and address your final
suggestion (and an excellent one it is too!).
Thanks Seb.
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/ji
obal and ignores {{}}. So you have to
restrict your crawl to the form authentication pages only. Ideally, also form
authentication should be bound to a scope (one host, one URL prefix, etc.) same
as HTTP authentication.
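The idea of binding form authentication to a scope could look roughly like the check below. This is a sketch of the suggestion only, not of anything in Nutch: the class FormAuthScopeSketch and its fields are hypothetical, and a scope is assumed to be one host plus one URL path prefix.

```java
import java.net.URI;

// Hypothetical illustration of a "one host, one URL prefix" scope check.
final class FormAuthScopeSketch {
    private final String host;
    private final String pathPrefix;

    FormAuthScopeSketch(String host, String pathPrefix) {
        this.host = host;
        this.pathPrefix = pathPrefix;
    }

    // True when the URL is on the configured host and under the configured prefix.
    boolean covers(URI url) {
        String urlHost = url.getHost();
        String urlPath = url.getPath();
        if (urlPath == null || urlPath.isEmpty()) {
            urlPath = "/"; // treat "http://host" like "http://host/"
        }
        return urlHost != null
            && urlHost.equalsIgnoreCase(host)
            && urlPath.startsWith(pathPrefix);
    }

    public static void main(String[] args) {
        FormAuthScopeSketch scope = new FormAuthScopeSketch("example.com", "/members/");
        System.out.println(scope.covers(URI.create("http://example.com/members/list")));
        System.out.println(scope.covers(URI.create("http://example.org/members/list")));
    }
}
```

With a check like this, login would only be attempted for URLs the scope covers, and the rest of the crawl would be fetched without form authentication.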
> HTTP POST Authentication
>
>
get in to 1.10
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protoco
Lewis John McGibbney created NUTCH-1940:
---
Summary: Port HTTP POST Authentication to 2.X
Key: NUTCH-1940
URL: https://issues.apache.org/jira/browse/NUTCH-1940
Project: Nutch
Issue Type
[
https://issues.apache.org/jira/browse/NUTCH-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1940:
Issue Type: New Feature (was: Bug)
> Port HTTP POST Authentication to
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Fix Version/s: (was: 2.4)
> HTTP POST Authenticat
LOGGER.debug("'name' attribute for form element is also null.");
throw new IllegalArgumentException("No form exists: "
+ authConfigurer.getLoginFormId());
}
}
{code}
The rest seem to be OK to me and I am able to use this patch to f
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-827 stopped by Lewis John McGibbney.
--
> HTTP POST Authenticat
bq. log level TRACE should provide sufficient information about what goes wrong when
logging in
+1
bq. config file to be committed should be conf/httpclient-auth.xml.template
instead of conf/httpclient-auth.xml
+1, patch coming up
Thanks for review
> HTTP POST Authentication
> ---
de/docs/Web/HTML/Element/form#Attributes]]).
I'll continue this trial to provide a fix/work-around.
- log level TRACE should provide sufficient information about what goes wrong when
logging in
- config file to be committed should be {{conf/httpclient-auth.xml.template}}
instead of {{conf/httpclient-auth
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Fix Version/s: (was: 1.11)
1.10
> HTTP POST Authenticat
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-827 started by Lewis John McGibbney.
--
> HTTP POST Authenticat
to enable access to various large Databases
requiring HTTP POST authentication. I would also like to mention that setting
the redirect boolean flag to true is almost always required.
Would really appreciate if folks could try this out and comment.
> HTTP POST Authenticat
as I require form-based authentication for a current
research task.
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
>
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Fix Version/s: 2.4
> HTTP POST Authenticat
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney reassigned NUTCH-827:
--
Assignee: Lewis John McGibbney
> HTTP POST Authenticat
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-827:
Component/s: (was: fetcher)
protocol
> HTTP POST Authenticat
l: not protocol-http.
If you are interested, you may
read: http://lifelongprogrammer.blogspot.com/2014/02/part1-using-apache-http-client-to-do-http-post-form-authentication.html
> HTTP POST Authentication
>
>
> Key: NUTCH-827
>
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yuanyun.cn updated NUTCH-827:
-
Attachment: http-client-form-authtication.patch
> HTTP POST Authenticat
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebastian Nagel updated NUTCH-827:
--
Fix Version/s: 1.8
> HTTP POST Authenticat
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-827:
---
Fix Version/s: 2.2
> HTTP POST Authenticat
Thanks for the help!
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: fe
"url: " + url +
+"; status code: " + code +
+"; cookies received: " +
Http.getClient().getState().getCookies().length);
{code}
If you turn on TRACE logging, you should see messages like that.
Nutch and stored as
intended?
Thanks,
Max
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: Ne
hink that we ended up solving that simply by
removing ..
{code}
+method.setFollowRedirects(followRedirects);
{code}
As redirects are not supported for POST-requests.
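If the login POST is not redirected automatically, the caller can follow the redirect by hand with a plain GET. The sketch below shows only the decision step, under the assumption that the status code and Location header of the POST response are already available; the class PostRedirectSketch and its method are hypothetical and not part of the patch.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical illustration; decides where to send a follow-up GET after a login POST.
final class PostRedirectSketch {

    // Returns the absolute URL to GET next, or empty if the response is not a
    // redirect that is conventionally followed with GET.
    static Optional<URI> redirectTarget(URI requestUri, int status, String location) {
        // 307 and 308 ask for the same method to be repeated, so they are left out here.
        boolean followWithGet = status == 301 || status == 302 || status == 303;
        if (!followWithGet || location == null || location.isEmpty()) {
            return Optional.empty();
        }
        // Location may be relative; resolve it against the URL that was posted to.
        return Optional.of(requestUri.resolve(location));
    }

    public static void main(String[] args) {
        URI login = URI.create("http://example.com/account/login");
        System.out.println(redirectTarget(login, 302, "/account/home"));
        System.out.println(redirectTarget(login, 200, null));
    }
}
```

Any cookies set by the POST response would need to be sent along with that follow-up GET for the session to carry over.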
> HTTP POST Authentication
>
>
> Key: NUTCH-827
>
this...
Thanks for your time!
Max
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
>
d authentication failed; cookies will not be
present for this request but an attempt to retrieve them will be made for the
next one.", e);
To see where the Exception is coming from. All it does after that LOG.error()
is release the connection. So it shouldn't be throwing an Exception.
> HTTP POST Authentication
>
>
> Key:
made for the next one.
2012-10-01 13:11:24,682 ERROR httpclient.Http - Unable to retrieve login page;
code = 200
The second line with response code 200 is what I don't understand. I'd
appreciate any tips you could give in this regard.
Thanks,
Max
> HT
ave to go that way then.
Best regards,
Max
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New
ookie,
and then returns some specific piece of data only when that cookie is set?
Good luck!
Jasper
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
>
n! I applied the patch and compiled Nutch just
fine, but can't confirm that it is working. Can you point to a website where
this patch successfully passed the form auth? I need to verify that it is working
for me, but can't at the moment.
Thanks in advance,
Max
> HTTP PO
jar which is used for each subsequent request.
This isn't exactly a fool-proof solution (what if other requests generate
expired cookies? what if the login fails? etc.), but for the project for which
I wrote the patch, it suited our needs. Hope it helps!
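The "log in once, reuse the cookie jar" idea can be illustrated with the JDK's own java.net.CookieManager. This is a sketch of the general technique, not of the attached patch: the class CookieJarSketch, its method names, and the cookie shown are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a shared cookie jar filled at login time.
final class CookieJarSketch {
    private final CookieManager jar = new CookieManager();

    // Record the Set-Cookie headers of a response received from the given URL.
    void storeFromResponse(URI uri, List<String> setCookieHeaders) {
        try {
            jar.put(uri, Collections.singletonMap("Set-Cookie", setCookieHeaders));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Cookie strings that should accompany a request to the given URL.
    List<String> cookiesFor(URI uri) {
        try {
            Map<String, List<String>> headers =
                jar.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = headers.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        CookieJarSketch sketch = new CookieJarSketch();
        // Pretend the login response set a session cookie.
        sketch.storeFromResponse(URI.create("http://example.com/login"),
            Arrays.asList("SESSION=abc123; Path=/"));
        // Later requests to the same host pick the cookie up from the jar.
        System.out.println(sketch.cookiesFor(URI.create("http://example.com/private/page")));
    }
}
```

The caveats raised above still apply: nothing here notices a failed login or a session that has expired on the server side, so a crawler would need its own check for that.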
> HTTP POST
se answer Ian's question? I have a
similar problem figuring out how exactly I can provide the username and
password.
Thank you!
> HTTP POST Authentication
>
>
> Key: NUTCH-827
> URL: https://issues.apa
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-827:
Fix Version/s: (was: 1.5)
1.6
20120304-push-1.6
> HTTP P
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-827:
Fix Version/s: 1.5
> HTTP POST Authenticat
and password need to go? I
presently have these in runtime/local/conf/httpclient-auth.xml, but this
doesn't seem to work. Also, what url needs to go in the nutch-site.xml file?
> HTTP POST Authentication
>
>
>
[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jasper van Veghel updated NUTCH-827:
Attachment: nutch-http-cookies.patch
> HTTP POST Authenticat
HTTP POST Authentication
Key: NUTCH-827
URL: https://issues.apache.org/jira/browse/NUTCH-827
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 1.1, 2.0
Reporter
64 matches