Re: Robots.txt

2016-05-24 Thread Mattmann, Chris A (3980)
Hi, By default, as I mentioned, Nutch does obey robots.txt. There is a whitelist property that can be set in nutch-default to selectively disable it for certain sites (again for valid security research use cases). Cheers, Chris ++ C
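As a rough illustration of the behaviour described above (obey robots.txt by default, skip the check only for explicitly whitelisted hosts), here is a minimal Python sketch. It is not Nutch's code: the function name `is_fetch_allowed`, the `ROBOTS_WHITELIST` set, and the example hosts are all hypothetical, and the robots.txt rules are passed in as lines so nothing is fetched over the network.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: hosts for which the robots.txt check is skipped.
# Empty by default would mirror "not enabled by default"; one host is listed
# here only so the example shows both code paths.
ROBOTS_WHITELIST = {"research.example.org"}


def is_fetch_allowed(url, agent, robots_lines, whitelist=ROBOTS_WHITELIST):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    Hosts in `whitelist` bypass the robots.txt check entirely (assumption:
    this is only a sketch of the idea, not how Nutch implements it).
    """
    host = urlparse(url).hostname
    if host in whitelist:
        return True
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules supplied by the caller
    return parser.can_fetch(agent, url)


# Example rules: everything under /private/ is off limits to all agents.
robots = ["User-agent: *", "Disallow: /private/"]

print(is_fetch_allowed("http://example.com/public/a", "Tarantula", robots))
print(is_fetch_allowed("http://example.com/private/a", "Tarantula", robots))
print(is_fetch_allowed("http://research.example.org/private/a", "Tarantula", robots))
```

A check like this can also be run by hand against a site's real robots.txt to see how a given agent name should be treated.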

Re: Robots.txt

2016-05-24 Thread BlackIce
I don't recall messing with anything to do with robots.txt; I want us to be as polite as possible. On May 25, 2016 12:22 AM, "Mattmann, Chris A (3980)" < chris.a.mattm...@jpl.nasa.gov> wrote: > Hi, > > For security research, there is an option to white-list robots.txt. > It’s not enabled by defau

Re: Robots.txt

2016-05-24 Thread Mattmann, Chris A (3980)
Hi, For security research, there is an option to white-list robots.txt. It’s not enabled by default and must be directly enabled. The solution is - there isn’t one. People used to just hack Nutch and do the same thing by commenting out a line of code which accomplished the same check. Those peo

Robots.txt

2016-05-24 Thread BlackIce
Hi, I've just seen on a website that tracks bots that "Tarantula", our Nutch 1.11-based crawler, is being classified as not obeying robots.txt. What's the solution?

Re: Scoring mobile-friendliness

2016-05-24 Thread Fengtan
Thanks Markus! Much appreciated. On Tue, May 24, 2016 at 5:48 AM, Markus Jelsma wrote: > Hello, Nutch does not have any support for this kind of thing. But it > should be possible to test on screen width and such basic things with > the new parse-htmlunit plugin. Link density looks less ob

RE: headings plug-in target field

2016-05-24 Thread Markus Jelsma
Hello - I don't think so. But in case you are using Solr, you could use solrmapping.xml on Nutch's side or of course a simple copyField in Solr's schema. Markus -Original message- > From:Jigal van Hemert | alterNET internet BV > Sent: Friday 20th May 2016 9:34 > To: user > Subject: hea

RE: [ANNOUNCE] New Nutch committer and PMC - Karanjeet Singh

2016-05-24 Thread Markus Jelsma
Welcome too, Karanjeet. Thanks for the good work on the HtmlUnit plugin. Cheers, Markus -Original message- > From:Karanjeet Singh > Sent: Monday 23rd May 2016 19:52 > To: d...@nutch.apache.org; user@nutch.apache.org > Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Karanjeet Singh >

RE: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N.

2016-05-24 Thread Markus Jelsma
Welcome Thamme Gowda! Cheers, Markus -Original message- > From:Thamme Gowda > Sent: Monday 23rd May 2016 0:56 > To: d...@nutch.apache.org; user@nutch.apache.org > Subject: Re: [ANNOUNCE] New Nutch committer and PMC - Thamme Gowda N. > > Hi Sebastian, >  thanks for the invitation an

RE: Scoring mobile-friendliness

2016-05-24 Thread Markus Jelsma
Hello, Nutch does not have any support for this kind of thing. But it should be possible to test on screen width and such basic things with the new parse-htmlunit plugin. Link density looks less obvious, but font size and presence of Flash are easier. Markus -Original message- > F