Hi Lewis,
Thank you for the reply.
I tried providing the parameters specified in the httpclient-auth.xml
template file, but while crawling I am getting the following warnings:
WARN httpclient.Http: Bad auth conf file: root element found
in httpclient-auth.xml - must be
WARN httpclient.Http:
Thank you very much Nirav, it helped.
On Wed, Mar 11, 2015 at 7:20 PM, Nirav Thaker wrote:
> You will need to put '_maxdepth_' metadata in the seed file, like the following:
>
> http://domain1.com/abc _maxdepth_=2 some.other.metadata=xys
>
>
> http://domain2.com/xyz _maxdepth_=99 some.other.metadata=abc
Hi Arthur,
On Thu, Mar 12, 2015 at 12:20 AM, wrote:
>
> I downloaded http://svn.apache.org/repos/asf/nutch/branches/2.x/
> and re-ran the compilation, but still got the error
>
> Question: Are the following dependencies correctly set in my ivy.xml?
>
> conf="*->default" />
> rev="2.2.3
Hi Tizy,
On Thu, Mar 12, 2015 at 12:20 AM, wrote:
>
> Is there any detailed step by step explanation on how to implement
> HTTPPostAuthentication on Nutch 1.10?
>
>
https://github.com/apache/nutch/blob/trunk/conf/httpclient-auth.xml.template#L61-L105
https://wiki.apache.org/nutch/HttpPostAuthen
Hello Jorge,
This is an interesting but very complicated issue. First of all, do not rely on
HTTP headers; they are incorrect at any scale beyond the very small. This is
true for Last-Modified, due to dynamic CMSs, but also for many other headers. You can
even expect website descriptions in headers s
You will need to put '_maxdepth_' metadata in the seed file, like the following:
http://domain1.com/abc _maxdepth_=2 some.other.metadata=xys
http://domain2.com/xyz _maxdepth_=99 some.other.metadata=abc
HTH
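The seed-file layout above can be sketched in a few lines of Python. This is only an illustration, not part of Nutch: the helper names seed_line and parse_seed_line are hypothetical, and it assumes the injector expects the URL and each key=value pair to be separated by tabs (the separators in the lines above may have been collapsed to spaces by the mail archive).

```python
# Minimal sketch: build and re-read seed lines that carry per-URL metadata.
# Assumption: fields are tab-separated (URL first, then key=value pairs).
# seed_line and parse_seed_line are hypothetical helpers, not Nutch APIs.

def seed_line(url, metadata):
    """Join a URL and its metadata pairs into one tab-separated seed line."""
    pairs = [f"{key}={value}" for key, value in metadata.items()]
    return "\t".join([url] + pairs)


def parse_seed_line(line):
    """Split a seed line back into (url, metadata dict); values stay strings."""
    url, *pairs = line.rstrip("\n").split("\t")
    metadata = dict(pair.split("=", 1) for pair in pairs)
    return url, metadata


seeds = [
    seed_line("http://domain1.com/abc",
              {"_maxdepth_": 2, "some.other.metadata": "xys"}),
    seed_line("http://domain2.com/xyz",
              {"_maxdepth_": 99, "some.other.metadata": "abc"}),
]

if __name__ == "__main__":
    # Print the lines that would go into the seed file, one URL per line.
    for line in seeds:
        print(line)
```

Writing the printed lines to the seed file gives each URL its own '_maxdepth_' value, as in the two example lines above.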
On 03/11/2015 01:10 PM, Svyatoslav Lavryk wrote:
Hello,
We use Nutch 1.9 with Hadoop 1.2.1 for
Hello Jigal - every distribution of Nutch configuration should in my opinion
disable OPIC-scoring. In fact, I think we should remove it from
nutch-default.xml altogether.
Markus
-Original message-
> From:Jigal van Hemert | alterNET internet BV
> Sent: Wednesday 11th March 2015 9:40
> To
Hi Jonathan,
Apologies for my delayed response. Thank you for the
pointer; the crawl worked as expected. I needed to tweak the regex filtering.
Thank you once again,
Sidharth
On Wed, Mar 11, 2015 at 4:46 AM, Jonathan Cooper-Ellis <
jcooperel...@cloudera.com> wrote:
> Hi Siddharth,
Hi,
Is there any detailed step by step explanation on how to implement
HTTPPostAuthentication on Nutch 1.10?
Thanks and Regards,
Tizy