Re: [Dspace-tech] Google Crawler

2011-09-12 Thread Sean Carte
Thanks for all the responses. I do appreciate the value of having Google index our sites, but my concern is that it seems to be doing it repeatedly. This particular repository has only 551 items; to generate the traffic for which GoogleBot seems responsible, it would have to be repeatedly ...
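[One quick way to check whether GoogleBot really is re-fetching the same 551 items is to count its requests per URL in the web server's access log. A minimal sketch, assuming an Apache combined-format log at /var/log/apache2/access.log (the path and log format are assumptions, not from the original post):

    # Count GoogleBot requests per URL in an Apache combined-format log.
    # Repeated high counts for the same handle suggest re-crawling
    # rather than a one-time index pass.
    from collections import Counter

    counts = Counter()
    with open("/var/log/apache2/access.log") as log:  # hypothetical path
        for line in log:
            if "Googlebot" not in line:
                continue
            parts = line.split('"')
            if len(parts) < 2:
                continue
            request = parts[1].split()  # e.g. ['GET', '/handle/123/456', 'HTTP/1.1']
            if len(request) >= 2:
                counts[request[1]] += 1

    for url, n in counts.most_common(20):
        print(n, url)

If the same handles dominate the top of the list day after day, the bot is revisiting rather than discovering new content.]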

Re: [Dspace-tech] Google Crawler

2011-09-12 Thread Mark H. Wood
On Mon, Sep 12, 2011 at 08:52:16AM +0200, Sean Carte wrote: > I do appreciate the value of having Google index our sites, but my concern is that it seems to be doing it repeatedly. This particular repository has only 551 items; to generate the traffic for which GoogleBot seems responsible, it ...

Re: [Dspace-tech] Google Crawler

2011-09-12 Thread Akeredolu Joshua
Help! I tried connecting to my DSpace from another computer on our intranet, but the other computers can't see it, nor my Tomcat. I tried using the server IP address (192.162.0.1:8080/dspace), but it could not load. What am I to do? Thanks. On 9/12/11, Mark H. Wood mw...@iupui.edu wrote: > On Mon, Sep ...
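[The usual suspects when Tomcat is reachable locally but not from other machines are an HTTP Connector bound to localhost, or a host firewall blocking port 8080. A sketch of the relevant Connector element in Tomcat's conf/server.xml; the attribute values here are illustrative:

    <!-- conf/server.xml: if address="127.0.0.1" is set, only local
         connections are accepted; remove the attribute (or bind to
         0.0.0.0) so other machines on the intranet can connect. -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

Also confirm that port 8080 is open in the server's firewall and that the IP address typed on the client machine matches the server's actual address.]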

[Dspace-tech] Google Crawler

2011-09-09 Thread Sean Carte
Two weeks ago I disabled the Google crawler completely by adding 'Disallow: /' to my robots.txt file. This has resulted in a huge decrease in the volume of traffic, as shown by the attached graph. Previously I had my robots.txt file configured to Disallow everything else, including /browse, as I do ...
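[For reference, the difference between the two configurations described above, a full block versus a selective one, looks roughly like this; the paths other than /browse are illustrative, since DSpace URL layouts vary by UI and version:

    # Full block -- stops all compliant crawlers entirely:
    User-agent: *
    Disallow: /

    # Selective block -- keeps item pages indexable but skips
    # the expensive dynamic views:
    User-agent: *
    Disallow: /browse
    Disallow: /search
]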

Re: [Dspace-tech] Google Crawler

2011-09-09 Thread Peter Dietz
Hi Sean, GoogleBot and the rest of the bots do account for a large amount of traffic. I would estimate that about 75% of our traffic is serving bots. But I would also estimate that a good number of users come in through Google search results. Blocking Google altogether will probably have a ...
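[A middle ground between a full block and a free-for-all is to keep item pages crawlable while discouraging aggressive fetching. A sketch; note that Googlebot ignores the Crawl-delay directive, so its crawl rate has to be set in Google Webmaster Tools instead, and the delay line below only helps with other bots:

    User-agent: *
    Crawl-delay: 10      # seconds between requests; ignored by Googlebot
    Disallow: /browse
]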

Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-23 Thread Vinit
Dear Panyarak, Check your sitemaps file. Find the non-existent pages and delete those URLs. Also, in the robots.txt file you have to give the location of the sitemaps.xml file. Regards, Vinit Kumar, Senior Research Fellow, Documentation Research and Training Centre, Bangalore; MLISc (BHU), Varanasi ...
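[The sitemap location goes into robots.txt via a Sitemap: line, which takes an absolute URL. For example, with placeholder host and path, since the actual URL depends on how your DSpace instance serves its generated sitemaps:

    User-agent: *
    Disallow: /browse
    Sitemap: http://repository.example.org/sitemap
]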

Re: [Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-23 Thread Panyarak Ngamsritragul
Thanks Vinit for the information. I checked the sitemap files under DSPACE/sitemaps and found that the handles the Google crawler keeps on accessing do not exist in any of the files there. Or are there sitemap files elsewhere? How do I include the sitemap files in robots.txt? Sorry if this is a ...
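[If the files under DSPACE/sitemaps are stale, regenerating them should drop the dead handles. In DSpace this is done from the command-line launcher, typically on a cron schedule; the exact invocation varies by version, and this is the 1.6-era form:

    [dspace]/bin/dspace generate-sitemaps

After regenerating, resubmitting the sitemap in Google Webmaster Tools should eventually stop the crawler requesting the removed handles.]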

[Dspace-tech] Google crawler repeatedly requests non-existent handles...

2010-09-20 Thread Panyarak Ngamsritragul
Hi, There are two points here: 1. In our repository, we have configured it to allow crawlers to browse our site by putting up a robots.txt with only one line: 'User-agent: *'. I have checked with Webmaster Tools and it reports that the crawler access was successful. Anyway, I am not quite sure that should ...
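[On the one-line robots.txt: strictly, a record in the robots exclusion standard needs at least one Disallow line, so the minimal "allow everything" file is usually written with an empty Disallow, which blocks nothing:

    User-agent: *
    Disallow:
]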