[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt
[ https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219079#comment-17219079 ]

Julien Massiera commented on CONNECTORS-1657:
-

Yes, it is a warning in the log, but it shows up as an ERROR in the simple history. We should at least change the return code of the activity, don't you agree?

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt
[ https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219054#comment-17219054 ]

Karl Wright commented on CONNECTORS-1657:
-

This is a warning only:

{code}
Logging.connectors.warn("Web: Unknown robots.txt line from '"+hostName+"': '"+problemLine+"'");
{code}

No problems are caused when such a robots.txt line is encountered.
[jira] [Created] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt
Julien Massiera created CONNECTORS-1657:
---

Summary: Web connector - Handle sitemap instruction in robot.txt
Key: CONNECTORS-1657
URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
Project: ManifoldCF
Issue Type: Improvement
Components: Web connector
Affects Versions: ManifoldCF 2.17
Reporter: Julien Massiera

Currently the web connector does not recognize the Sitemap instruction when a robots.txt file points to a sitemap. For example, for the site https://www.persee.fr, the following error appears in the simple history:

Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
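To illustrate what "handling" the instruction could mean, here is a minimal sketch of scanning robots.txt content for `Sitemap:` lines instead of reporting them as unknown. It is not the Web connector's actual parser: the class `RobotsSitemapSketch` and the method `extractSitemaps` are hypothetical names. It assumes, as the sitemaps.org protocol describes, that the `Sitemap:` field is independent of any `User-agent` group, so it is collected wherever it appears. Field names are matched case-insensitively and `#` comments are stripped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: collects "Sitemap:" directives while scanning robots.txt.
 * This is NOT the ManifoldCF Web connector's actual robots.txt parser.
 */
public final class RobotsSitemapSketch {

  private RobotsSitemapSketch() {}

  /** Returns the sitemap URLs declared in the given robots.txt content, in order. */
  public static List<String> extractSitemaps(String robotsTxt) {
    List<String> sitemaps = new ArrayList<>();
    for (String rawLine : robotsTxt.split("\r?\n")) {
      // Strip comments ("#" to end of line) and surrounding whitespace.
      int hash = rawLine.indexOf('#');
      String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
      // The first colon separates the field name from its value, so the
      // colon inside "https://" stays part of the value.
      int colon = line.indexOf(':');
      if (colon < 0) {
        continue;
      }
      String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).trim();
      // Sitemap lines apply regardless of the current User-agent group.
      if (field.equals("sitemap") && !value.isEmpty()) {
        sitemaps.add(value);
      }
    }
    return sitemaps;
  }

  public static void main(String[] args) {
    String robots = "User-agent: *\n"
        + "Disallow: /private/\n"
        + "Sitemap: https://www.persee.fr/sitemap.xml\n";
    // Prints: [https://www.persee.fr/sitemap.xml]
    System.out.println(extractSitemaps(robots));
  }
}
```

With a step like this, a line such as the persee.fr one above would be recognized (and could, for example, feed sitemap URLs to the crawl) rather than falling through to the "Unknown robots.txt line" warning.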