Hi,
By default, as I mentioned, Nutch does obey robots.txt. There is
a whitelist property that can be set in nutch-default to selectively
disable it for certain sites (again for valid security research use
cases).
Cheers,
Chris
++
C
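For readers who want to see the behaviour described above in isolation, here is a minimal sketch in Python. It is not Nutch code. It uses the standard library's robots.txt parser to show a crawler that obeys robots.txt by default and skips the check only for hosts that are explicitly whitelisted. The function name `may_fetch`, the `whitelist` argument, and the sample rules are assumptions made for illustration.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def may_fetch(url, robots_txt, agent="Tarantula", whitelist=frozenset()):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `whitelist` is a hypothetical set of hostnames for which the robots.txt
    check is skipped. It is empty by default, so the polite behaviour is
    what you get unless a host is listed explicitly.
    """
    host = urlparse(url).hostname or ""
    if host in whitelist:
        # Explicit opt-in: robots.txt rules are ignored for this host only.
        return True
    parser = RobotFileParser()
    # Parse the rules from text so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Sample rules, assumed for illustration only.
ROBOTS = """User-agent: *
Disallow: /private/
"""
```

With an empty whitelist, `may_fetch("http://example.com/private/x", ROBOTS)` is refused while a URL outside `/private/` is allowed. Passing `whitelist={"example.com"}` bypasses the rules for that host alone.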
I don't recall messing with anything to do with robots.txt; I want us to
be as polite as possible.
On May 25, 2016 12:22 AM, "Mattmann, Chris A (3980)" <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi,
>
> For security research, there is an option to white-list robots.txt.
> It’s not enabled by defau
Hi,
For security research, there is an option to white-list robots.txt.
It’s not enabled by default and must be directly enabled.
The solution is - there isn’t one. People used to just hack
Nutch and get the same effect by commenting out the line of code
that performed the check.
Those peo
Hi,
I've just seen on a website which tracks bots that "Tarantula", our
Nutch 1.11-based crawler, is being classified as not obeying robots.txt.
What's the solution?
Thanks, Markus! Much appreciated.
On Tue, May 24, 2016 at 5:48 AM, Markus Jelsma wrote:
> Hello, Nutch does not have any support for this kind of thing. But it
> should be possible to test on screen width and such basic things with
> the new parse-htmlunit plugin. Link density looks less ob
Hello - I don't think so. But in case you are using Solr, you could use
solrmapping.xml on Nutch's side or of course a simple copyField in Solr's schema.
Markus
-Original message-
> From:Jigal van Hemert | alterNET internet BV
> Sent: Friday 20th May 2016 9:34
> To: user
> Subject: hea
Welcome too, Karanjeet. Thanks for the good work on the HtmlUnit plugin.
Cheers,
Markus
-Original message-
> From:Karanjeet Singh
> Sent: Monday 23rd May 2016 19:52
> To: d...@nutch.apache.org; user@nutch.apache.org
> Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Karanjeet Singh
>
Welcome Thamme Gowda!
Cheers,
Markus
-Original message-
> From:Thamme Gowda
> Sent: Monday 23rd May 2016 0:56
> To: d...@nutch.apache.org; user@nutch.apache.org
> Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.
>
> Hi Sebastian,
> thanks for the invitation an
Hello, Nutch does not have any support for this kind of thing. But it should be
possible to test on screen width and such basic things with the new
parse-htmlunit plugin. Link density looks less obvious but font size and
presence of Flash are easier.
Markus
-Original message-
> F