Hi Lewis,
I believe that you can find the robots.txt of the site here:
http://www.kinoundco.de/robots.txt
I think he correctly followed the instructions at http://lucene.apache.org/nutch/bot.html
(by the way, this outdated URL is still in HttpBase.java).
My guess is that the guys at pixray.com have
[
https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151828#comment-13151828
]
Hudson commented on NUTCH-1196:
---
Integrated in Nutch-nutchgora #70 (See
[https://builds.apa
Hi Maximilian,
What we're missing is the robots.txt itself, i.e. how are you trying to ban
Nutch? I've been in touch with the guys at Traffic Server about your issue
to see if they have suggestions that don't involve totally banning all Nutch
instances from contacting your webserver.
To all devs, the othe
Thanks for the FYI guys.
I've got this on my open source radar, along with
reviewing the Airavata release (incubating), and
the MRUnit release (incubating) for this week.
I'll git er' done. Also, since the release updates for rc #2
were largely aesthetic (aka packaging and naming
of the outp
> Chris,
>
> Any idea of when you'll be able to push a new RC for 1.4?
> Note: I think some stuff marked as 1.5 has been committed - we might need
> to check the CHANGES
Definitely, I've committed several items. When I did my first, trunk was
already prepared for 1.5.
Here's the list of change
Chris,
Any idea of when you'll be able to push a new RC for 1.4?
Note: I think some stuff marked as 1.5 has been committed - we might need
to check the CHANGES
Thanks
Julien
On 9 November 2011 10:21, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:
> Hi Julien,
>
> Thanks. OK,
[
https://issues.apache.org/jira/browse/NUTCH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney reopened NUTCH-1081:
-
Reopening this issue as per our concerns.
For the record, the Jenkins build area ha
[
https://issues.apache.org/jira/browse/NUTCH-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1196.
---
Resolution: Fixed
Committed.
> Update job should impose an upper limit on the number
All requests seem to come from a German company called http://www.pixray.com,
whose version of the Nutch crawler obviously ignores the robots.txt.
We have informed them and will ban their IP range if they don't stop scanning us
with invalid requests.
Sincerely,
Maximilian Laurenz
S&L Medi
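Banning an IP range is normally done in the firewall or web server configuration; the membership test behind such a rule can be sketched with Python's `ipaddress` module. The blocked network below is a hypothetical documentation prefix (RFC 5737), not pixray.com's actual range, which is not given here.

```python
import ipaddress

# Hypothetical blocked range (RFC 5737 documentation prefix); the thread
# does not say which range the crawler actually uses.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IPv4 and IPv6 versions differ.
    return any(addr in net for net in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.42"))  # True
print(is_blocked("198.51.100.7"))  # False
```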
[
https://issues.apache.org/jira/browse/NUTCH-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferdy Galema closed NUTCH-1148.
---
Resolution: Cannot Reproduce
Wow, this really boggles my mind: I tried to do a final check with and wi
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-9.patch
> Fetcher to parse and follow Nth degree outlinks
> --
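The idea in the issue title, a fetcher that parses fetched pages and follows their outlinks up to N hops, amounts to a depth-limited breadth-first traversal. A minimal sketch over an in-memory link graph follows; `follow_outlinks` and the `outlinks` dict are hypothetical stand-ins, not the Fetcher code or the attached patch.

```python
from collections import deque


def follow_outlinks(seeds, outlinks, max_depth):
    """Return {url: depth} for every URL reachable within max_depth hops.

    outlinks is a hypothetical in-memory stand-in for "fetch and parse a page":
    it maps each URL to the list of URLs that page links to.
    """
    depth_of = {url: 0 for url in seeds}
    queue = deque(depth_of)  # deduplicated seeds
    while queue:
        url = queue.popleft()
        depth = depth_of[url]
        if depth == max_depth:
            continue  # do not expand beyond the Nth degree
        for link in outlinks.get(url, []):
            if link not in depth_of:  # skip URLs already seen
                depth_of[link] = depth + 1
                queue.append(link)
    return depth_of


graph = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
print(follow_outlinks(["a"], graph, 2))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Breadth-first order guarantees each URL is recorded at its shortest hop distance, so the depth limit is applied consistently.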
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: (was: NUTCH-1185-1.5-9.patch)
> Fetcher to parse and follow Nth degree outlin
[
https://issues.apache.org/jira/browse/NUTCH-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated NUTCH-1184:
-
Attachment: NUTCH-1185-1.5-9.patch
New patch [9] fixes an NPE in filtering. It's now re
Hi Lewis,
Please note that although most of the formatting has been reverted, the
indentation style is still not the usual one (you converted spaces to tabs). When
using the default Eclipse XML editor, you can easily fix this by
setting the preference "Indent using spaces" in XML --> XML Files -->
E
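Leading tabs can also be converted back mechanically. A minimal sketch; `tabs_to_spaces` is a hypothetical helper, and the two-space width is an assumption, since the thread does not state the usual indent width.

```python
def tabs_to_spaces(text: str, width: int = 2) -> str:
    """Replace the run of tabs at the very start of each line with spaces.

    width (spaces per tab) is an assumption; tabs after other content are left alone.
    """
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip("\t")
        n_tabs = len(line) - len(stripped)
        out.append(" " * (width * n_tabs) + stripped)
    return "".join(out)


print(tabs_to_spaces("<a>\n\t<b/>\n</a>\n"), end="")
```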