Thanks for all the responses.
I do appreciate the value of having Google index our sites, but my
concern is that it seems to be doing it repeatedly. This particular
repository has only 551 items; to generate the traffic for which
GoogleBot seems responsible, it would have to be repeatedly
On Mon, Sep 12, 2011 at 08:52:16AM +0200, Sean Carte wrote:
> I do appreciate the value of having Google index our sites, but my
> concern is that it seems to be doing it repeatedly. This particular
> repository has only 551 items; to generate the traffic for which
> GoogleBot seems responsible, it
Help!
I tried connecting to my DSpace from another computer on the intranet,
but the other computers can't see it, nor my Tomcat. I tried using the
server IP address (192.162.0.1:8080/dspace), but it could not load.
What am I to do?
Thanks
On 9/12/11, Mark H. Wood mw...@iupui.edu wrote:
Two weeks ago I disabled the Google crawler completely by adding
'Disallow: /' to my robots.txt file. This has resulted in a huge
decrease in the volume of traffic, as shown by the attached graph.
Previously I had my robots.txt file configured to Disallow everything
else, including /browse, as I do
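For comparison, a selective configuration along these lines might look like the following sketch. Only /browse is confirmed by the message above; /search is an assumption based on a typical DSpace layout:

```
User-agent: *
Disallow: /browse
Disallow: /search
```

By contrast, 'Disallow: /' with no other rules blocks every page for all compliant crawlers.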
Hi Sean,
GoogleBot and the rest of the bots do account for a large amount of traffic.
I would estimate that about 75% of our traffic is serving bots. But I would
also estimate that a good number of users come in through Google search
results.
Blocking Google altogether will probably have a
Dear Panyarak,
Check your sitemaps file. Find the non-existent pages and delete those
URLs. Also, in the robots.txt file you have to give the location of the
sitemaps.xml file.
Regards
Vinit Kumar
Senior Research Fellow
Documentation Research and Training Centre
Bangalore
MLISc (BHU)
Varanasi,
Thanks, Vinit, for the information.
I checked the sitemap files under DSPACE/sitemaps and found that the
handles the Google crawler keeps accessing do not exist in any of the
files there. Or are there sitemap files elsewhere?
How do I include the sitemap files in robots.txt? Sorry if this is a
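For reference, the sitemap location goes into robots.txt as a Sitemap: line with an absolute URL. A sketch, where the host my.repository.edu is a placeholder and the exact sitemap path for your DSpace version is an assumption:

```
User-agent: *
Disallow: /browse

Sitemap: http://my.repository.edu/dspace/sitemap
```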
Hi,
There are 2 points here:
1. In our repository, we have configured it to allow crawlers to browse
our site by putting up a robots.txt with only one line:
User-agent: *
I have checked with Webmaster Tools and it reports that the crawler
access was successful. Anyway, I am not quite sure that should
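Besides Webmaster Tools, whether a given path is blocked by a set of robots.txt rules can be checked locally with Python's standard-library robot parser. A minimal sketch; the /browse and /handle paths are just illustrative:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Feed the rules directly instead of fetching a live robots.txt,
# so the check is self-contained.
rp.parse([
    "User-agent: *",
    "Disallow: /browse",
])

print(rp.can_fetch("Googlebot", "/browse/title"))    # False: blocked
print(rp.can_fetch("Googlebot", "/handle/123/456"))  # True: crawlable
```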