The robots.txt parsing does not recognize the "Sitemap" directive, which was likely not part of the robots spec when this connector was written.
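For illustration, a tolerant parser can simply collect Sitemap lines instead of flagging them as unknown. This is a hypothetical sketch, not the connector's actual code (the function name and return shape are my own invention):

```python
# Hypothetical sketch of tolerant robots.txt parsing; NOT the
# ManifoldCF Web connector's implementation.
robots_txt = """\
User-Agent: *
Disallow:
Sitemap: https://www.example.de/sitemap/de-sitemap.xml.gz
Sitemap: https://www.example.de/sitemap/en-sitemap.xml.gz
"""

def parse_robots(text):
    """Split robots.txt into (field, value) rules plus a list of sitemap URLs."""
    rules, sitemaps = [], []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line or ':' not in line:
            continue
        field, _, value = line.partition(':')  # only the first colon splits,
        field = field.strip().lower()          # so URLs keep their "https:"
        value = value.strip()
        if field == 'sitemap':
            # Sitemap is site-wide per the spec, not tied to a User-Agent group
            sitemaps.append(value)
        else:
            rules.append((field, value))
    return rules, sitemaps

rules, sitemaps = parse_robots(robots_txt)
```

With the robots.txt from the question below, `sitemaps` would hold the two `.xml.gz` URLs while the `User-Agent` and `Disallow` lines land in `rules`, rather than producing an "Unknown robots.txt line" error.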
Karl

On Wed, Jul 7, 2021 at 3:31 AM h0444xk8 <h0444...@posteo.de> wrote:
> Hi,
>
> I have a general question. Does the Web connector support sitemap files
> referenced by the robots.txt? In my use case the robots.txt is stored in
> the root of the website and references two compressed sitemaps.
>
> Example of robots.txt
> ------------------------
> User-Agent: *
> Disallow:
> Sitemap: https://www.example.de/sitemap/de-sitemap.xml.gz
> Sitemap: https://www.example.de/sitemap/en-sitemap.xml.gz
>
> When the crawl starts, "Simple History" shows an error log entry as
> follows:
>
> Unknown robots.txt line: 'Sitemap:
> https://www.example.de/sitemap/en-sitemap.xml.gz'
>
> Is there a general problem with sitemaps at all, or with sitemaps
> referenced in robots.txt, or with compressed sitemaps?
>
> Best regards
>
> Sebastian