Hello,

We have a sitemap.xml pointing to further sitemaps. The XML seems fine, but 
Nutch thinks those two sitemap URLs are actually a single URL consisting of 
both concatenated.

Here is https://www.saxion.nl/sitemap.xml

<?xml version="1.0" encoding="UTF-8"?>
<ns2:sitemapindex xmlns:ns2="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.saxion.nl/opleidingen-sitemap.xml</loc>
<loc>https://www.saxion.nl/content-sitemap.xml</loc>
</sitemap>
</ns2:sitemapindex>

This seems fine, but Nutch attempts, and obviously fails, to load:

2018-05-25 16:27:50,515 ERROR [Thread-30] 
org.apache.nutch.util.SitemapProcessor: Error while fetching the sitemap. 
Status code: 14 for 
https://www.saxion.nl/opleidingen-sitemap.xmlhttps://www.saxion.nl/content-sitemap.xml
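
As a way to inspect the index independently of Nutch, here is a minimal Python sketch that lists the <loc> values found inside each <sitemap> entry. The helper names (local_name, locs_per_sitemap) are made up for illustration. The last step joins the values of one entry only to show that this reproduces the URL from the log; whether Nutch or its sitemap parser actually collects text this way is an assumption, not something confirmed here. For reference, the sitemaps.org protocol describes one <loc> per <sitemap> entry.

```python
import xml.etree.ElementTree as ET

# The sitemap index exactly as quoted above.
SITEMAP_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<ns2:sitemapindex xmlns:ns2="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://www.saxion.nl/opleidingen-sitemap.xml</loc>
<loc>https://www.saxion.nl/content-sitemap.xml</loc>
</sitemap>
</ns2:sitemapindex>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def locs_per_sitemap(xml_text):
    """Return one list of <loc> values for every <sitemap> entry."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    groups = []
    for entry in root:
        if local_name(entry.tag) != "sitemap":
            continue
        groups.append([
            child.text.strip()
            for child in entry
            if local_name(child.tag) == "loc" and child.text
        ])
    return groups


groups = locs_per_sitemap(SITEMAP_INDEX)
for locs in groups:
    # Number of <loc> children in this entry, then their joined text.
    print(len(locs), "".join(locs))
```

For the index above this finds a single <sitemap> entry holding two <loc> children, and joining their text gives the same string that appears in the error message.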

What is going on here? Why does Nutch, or CC's sitemap util, behave like this?

Thanks,
Markus
