I'm running Apache on a Debian 9 virtual private server with one IP address.

    root@localhost:~# apache2ctl -v
    Server version: Apache/2.4.25 (Debian)
    Server built:   2018-03-31T08:47:16

I have about 6 virtual hosts on it. One is https://www.g8wrb.org/, which has a directory 'data' containing valve data sheets. For example, there's a file https://www.g8wrb.org/data/Eimac/4CX10000D.pdf, and if Googlebot looks for it there, it will find it.

The problem is that Googlebot is looking for the same files on another domain, https://www.kirkbymicrowave.co.uk/. For example, the last line of the logs below shows Googlebot requesting /data/Eimac/4CX10000D.pdf on the https://www.kirkbymicrowave.co.uk/ domain, even though that file has never been on that website. It seems as though Google is mixing the two sites up in some way and hunting on one domain for files that should be (and are) on another domain hosted on the same server.

Needless to say, when I look with Google Analytics, I see a ton of 404 errors, because Google can't find the files it is looking for on https://www.kirkbymicrowave.co.uk/. That is hardly surprising, as they were never there.

Can anyone explain what might be happening?

I have posted the four VirtualHosts related to the https://www.kirkbymicrowave.co.uk/ domain below. There are four to cover the four combinations: the domain with and without www, each on the non-secure port 80 and the secure port 443.
    access-kirkbymicrowave.co.uk.log.6:66.249.66.66 - - [16/Jun/2018:06:11:01 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/3CX10000H3.pdf HTTP/1.1" 404 575 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    access-kirkbymicrowave.co.uk.log.6:66.249.66.68 - - [16/Jun/2018:06:14:45 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/AB5.pdf HTTP/1.1" 404 568 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    access-kirkbymicrowave.co.uk.log.6:66.249.66.70 - - [16/Jun/2018:06:22:27 +0000] "GET /complete-list.php/thanks/data/HP/data/Machlett_Laboratories/data/Eimac/4CX5000R.pdf HTTP/1.1" 404 573 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.64 - - [28/Jun/2018:22:32:18 +0000] "GET /data/Eimac/4-125A.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    access-kirkbymicrowave.co.uk-SSL.log.4:66.249.64.67 - - [28/Jun/2018:22:45:01 +0000] "GET /data/Eimac/4CX10000D.pdf HTTP/1.1" 404 6325 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    <VirtualHost *:443>
        # The ServerName directive sets the request scheme, hostname and port that
        # the server uses to identify itself. This is used when creating
        # redirection URLs. In the context of virtual hosts, the ServerName
        # specifies what hostname must appear in the request's Host: header to
        # match this virtual host. For the default virtual host (this file) this
        # value is not decisive as it is used as a last resort host regardless.
        # However, you must set it for any further virtual host explicitly.
        ServerName www.kirkbymicrowave.co.uk
        ServerAdmin [email protected]
        DocumentRoot /var/www/html/kirkbymicrowave.co.uk

        SetOutputFilter DEFLATE
        SetEnvIfNoCase Request_URI "\.(?:gif|jpe?g|png)$" no-gzip

        # Available loglevels: trace8, ..., trace1, debug, info, notice, warn,
        # error, crit, alert, emerg.
        # It is also possible to configure the loglevel for particular
        # modules, e.g.
        #LogLevel info ssl:warn
        ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-SSL.log
        CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-SSL.log combined

        SSLEngine on
        SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
        SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
        SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle

        # For most configuration files from conf-available/, which are
        # enabled or disabled at a global level, it is possible to
        # include a line for only one particular virtual host. For example the
        # following line enables the CGI configuration for this host only
        # after it has been globally disabled with "a2disconf".
        #Include conf-available/serve-cgi-bin.conf

        ErrorDocument 404 /error-pages/404.html
        ErrorDocument 410 /error-pages/410.html
        ErrorDocument 500 /error-pages/500.html
        ErrorDocument 503 /error-pages/503.html
    </VirtualHost>

    <VirtualHost *:80>
        # Redirect www.kirkbymicrowave.co.uk on port 80 to the https site.
        ServerName www.kirkbymicrowave.co.uk
        ServerAdmin [email protected]
        ErrorLog ${APACHE_LOG_DIR}/error-www.kirkbymicrowave.co.uk-port-80.log
        CustomLog ${APACHE_LOG_DIR}/access-www.kirkbymicrowave.co.uk-port-80.log combined
        Redirect "/" "https://www.kirkbymicrowave.co.uk/"
    </VirtualHost>

    <VirtualHost *:80>
        # Redirect kirkbymicrowave.co.uk on port 80 to the https site.
        ServerName kirkbymicrowave.co.uk
        ServerAdmin [email protected]
        ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-80.log
        CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-80.log combined
        Redirect "/" "https://www.kirkbymicrowave.co.uk/"
    </VirtualHost>

    <VirtualHost *:443>
        # Redirect kirkbymicrowave.co.uk on port 443 to the www. site.
        ServerName kirkbymicrowave.co.uk
        SSLEngine on
        SSLCertificateKeyFile /etc/ssl/private/www_kirkbymicrowave_co_uk.key
        SSLCertificateFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.crt
        SSLCertificateChainFile /etc/ssl/ssl.crt/www_kirkbymicrowave_co_uk.ca-bundle
        ServerAdmin [email protected]
        ErrorLog ${APACHE_LOG_DIR}/error-kirkbymicrowave.co.uk-port-443.log
        CustomLog ${APACHE_LOG_DIR}/access-kirkbymicrowave.co.uk-port-443.log combined
        Redirect "/" "https://www.kirkbymicrowave.co.uk/"
    </VirtualHost>
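
In case it helps anyone reproduce the numbers, here is a rough sketch of how the Googlebot 404s could be tallied per requested path. It assumes raw lines in Apache's "combined" log format (the format named in the CustomLog directives for this domain), without the `filename:` prefix that grep adds to the excerpts I pasted. The function name `googlebot_404s` and the regular expression are my own illustration, not part of any Apache tool, and matching on the user-agent string alone does not prove a request really came from Google.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request line" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per path for requests whose user agent mentions Googlebot.

    `lines` is any iterable of raw combined-format log lines (for example an
    open, uncompressed log file). Lines that do not match the format are skipped.
    """
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line
        if match["status"] == "404" and "Googlebot" in match["agent"]:
            counts[match["path"]] += 1
    return counts
```

Passing an open log file to `googlebot_404s` and calling `.most_common()` on the result lists the most frequently requested missing paths first, which makes it easier to see whether the requests cluster under `/data/` or under `/complete-list.php/`.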
