I had a quick look at Jira. I think there is already a ticket that
covers the requirement of using a sitemap.xml file that is referenced by
robots.txt:
https://issues.apache.org/jira/browse/CONNECTORS-1657
I'll update this ticket with information from the sitemap protocol page:
https://www.sitemaps.org
If you wish to add a feature request, please create a CONNECTORS ticket
that describes the functionality you think the connector should have.
Karl
On Wed, Jul 7, 2021 at 9:29 AM h0444xk8 wrote:
> Hi,
>
> yes, that seems to be the reason. In:
>
>
> https://github.com/apache/manifoldcf/blob/0307
Hi,
yes, that seems to be the reason. In:
https://github.com/apache/manifoldcf/blob/030703a7f2bbfbb5a8dcde529b29ead830a7f60c/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/Robots.java
there is the following code sequence:
else if (lowercaseLine.startsWith("
The robots parsing does not recognize the "Sitemap:" line, which was likely
not in the spec for robots when this connector was written.
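
For illustration only, here is a minimal Java sketch of what recognizing
that line could look like. It is not the connector's code: the class name
RobotsSitemapSketch and the method extractSitemaps are hypothetical, and it
assumes only the "Sitemap: <url>" line form described on sitemaps.org,
matched case-insensitively in the same style as the lowercaseLine checks
mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF.
public class RobotsSitemapSketch {

  private static final String SITEMAP_PREFIX = "sitemap:";

  /** Returns the URLs named by "Sitemap:" lines in a robots.txt body. */
  public static List<String> extractSitemaps(String robotsTxt) {
    List<String> sitemaps = new ArrayList<>();
    // Accept \n, \r\n and bare \r line endings.
    for (String rawLine : robotsTxt.split("\\r?\\n|\\r")) {
      // Assumption: everything after '#' is a comment and can be dropped.
      int hash = rawLine.indexOf('#');
      String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
      String lowercaseLine = line.toLowerCase(Locale.ROOT);
      if (lowercaseLine.startsWith(SITEMAP_PREFIX)) {
        // Take the URL from the original line so its case is preserved.
        String url = line.substring(SITEMAP_PREFIX.length()).trim();
        if (!url.isEmpty()) {
          sitemaps.add(url);
        }
      }
    }
    return sitemaps;
  }

  public static void main(String[] args) {
    String robots = "User-agent: *\n"
        + "Disallow: /private/\n"
        + "Sitemap: https://www.example.com/sitemap.xml\n";
    System.out.println(extractSitemaps(robots));
  }
}
```

The sketch only collects the sitemap URLs; fetching and parsing the
sitemap.xml files themselves would be a separate step.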
Karl
On Wed, Jul 7, 2021 at 3:31 AM h0444xk8 wrote:
> Hi,
>
> I have a general question. Is the Web connector supporting sitemap files
> referenced by the rob