Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
...@ucd.ie http://irserver.ucd.ie/dspace/ Message: 1 Date: Thu, 11 Feb 2010 12:30:04 + From: Michael White Subject: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors To: "dspace-tech@lists.sourceforge.net" Message-ID: <7c43cb6f3460394f9b5236c0f68d

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Joseph Greene
itutional Repository Project Manager 325 James Joyce Library University College Dublin Belfield, Dublin 4 353 (0)1 716 7398 joseph.gre...@ucd.ie http://irserver.ucd.ie/dspace/ Message: 1 Date: Thu, 11 Feb 2010 12:30:04 + From: Michael White Subject: [Dspace-tech] Bad robot! Googlebot and Inter

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
pace-tech@lists.sourceforge.net Subject: Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors
On 11 February 2010 14:37, Tom De Mulder <td...@cam.ac.uk> wrote:
You should add "/dspace" to the start of those disallowed patterns, because your DSpace URLs start with "

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server

2010-02-11 Thread Michael White
Thanks Dorothea,

> You found my favorite oldie bug! I'm guessing that item 1893/214 has been
> withdrawn or deleted.

I must admit, I didn't think to check, but having checked it now, I see that it is actually a Collection homepage (as are the others that I checked from a random sample) - not su

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Graham Triggs
On 11 February 2010 14:37, Tom De Mulder wrote:
> You should add "/dspace" to the start of those disallowed patterns,
> because your DSpace URLs start with "/dspace" after the hostname.
>
> You should also ensure that the robots.txt is available at the root of the server... ie. https://dspace.st
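
The prefix advice quoted above can be checked locally before waiting for the crawler to come back. What follows is only a minimal sketch using Python's standard `urllib.robotparser`; it is not something posted in the thread. The `/browse-author` and `/items-by-author` rules and the failing URL come from the messages in this thread, while the `/browse-title` rule, the `Googlebot` agent string and the helper name `is_blocked` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# URL reported in the error log quoted in this thread.
FAILING_URL = "https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214"

# Rules as originally posted (no "/dspace" prefix).
# The browse-title line is an assumption; the posted list is truncated.
ROBOTS_WITHOUT_PREFIX = """\
User-agent: *
Disallow: /browse-author
Disallow: /items-by-author
Disallow: /browse-title
"""

# The same rules with the "/dspace" prefix suggested in the thread.
ROBOTS_WITH_PREFIX = """\
User-agent: *
Disallow: /dspace/browse-author
Disallow: /dspace/items-by-author
Disallow: /dspace/browse-title
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`.

    Hypothetical helper: parses the rules from a string instead of
    fetching them from the server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    print("without prefix:", is_blocked(ROBOTS_WITHOUT_PREFIX, FAILING_URL))
    print("with prefix:   ", is_blocked(ROBOTS_WITH_PREFIX, FAILING_URL))
```

Disallow rules are matched as prefixes of the URL path, so a rule without the "/dspace" prefix never matches a path that begins with "/dspace/". The sketch only tests the rule text; it does not confirm that the file is actually served from the root of the server.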

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Tom De Mulder
On Thu, 11 Feb 2010, Michael White wrote:
>:session_id=9E40BFD899A2AA5C23E81404AF5B97A5:internal_error:-- URL Was:
>https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214
[snip]
>
> User-agent: *
>
> Disallow: /browse-author
> Disallow: /items-by-author
> D

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Dorothea Salo
You found my favorite oldie bug! I'm guessing that item 1893/214 has been withdrawn or deleted. 1.4.1 throws a fit when a crawler tries to browse a page that should begin with a withdrawn or deleted item. I've forgotten the fix (other than "upgrade to 1.4.2, in which the bug was squashed"), but it

[Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
Hi, Our DSpace (v1.4.1) has recently started logging a lot of Internal Server Errors that appear to be being caused by a Googlebot. They appear to be happening like clockwork every 14 minutes and come in blocks (sometimes lasting several hours). They are all associated with the IP Address 66.2