[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Julien Massiera (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219079#comment-17219079
 ] 

Julien Massiera commented on CONNECTORS-1657:
-

Yes, it is a warning in the log, but it shows up as an ERROR in the simple 
history. We should at least change the return code of the activity, don't you agree?

> Web connector - Handle sitemap instruction in robot.txt
> ---
>
>                 Key: CONNECTORS-1657
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.17
>            Reporter: Julien Massiera
>            Priority: Major
>
> Currently the web connector does not understand when the robots.txt file 
> points to a sitemap. As an example, for the site 
> https://www.persee.fr, in the simple history one 
> can find the following error:
> Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219054#comment-17219054
 ] 

Karl Wright commented on CONNECTORS-1657:
-

This is a warning only:

{code}
Logging.connectors.warn("Web: Unknown robots.txt line from '"+hostName+"': '"+problemLine+"'");
{code}

No problems are caused when an unrecognized robots.txt line is encountered.







[jira] [Created] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Julien Massiera (Jira)
Julien Massiera created CONNECTORS-1657:
---

             Summary: Web connector - Handle sitemap instruction in robot.txt
                 Key: CONNECTORS-1657
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
             Project: ManifoldCF
          Issue Type: Improvement
          Components: Web connector
    Affects Versions: ManifoldCF 2.17
            Reporter: Julien Massiera


Currently the web connector does not understand when the robots.txt file points 
to a sitemap. As an example, for the site 
https://www.persee.fr, in the simple history one 
can find the following error:

Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
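
To illustrate the request, here is a minimal sketch of how a robots.txt reader could recognize a Sitemap line instead of reporting it as unknown. It is not the ManifoldCF web connector code: the class name RobotsSitemapSketch and the method extractSitemaps are hypothetical. The case-insensitive match on the field name and the stripping of '#' comments are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of ManifoldCF: collects the URLs declared
// on "Sitemap:" lines of a robots.txt body.
public class RobotsSitemapSketch {

    public static List<String> extractSitemaps(String robotsTxt) {
        List<String> sitemaps = new ArrayList<>();
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Assumption: everything after '#' is a comment and is dropped.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();

            // A directive has the form "field: value"; split on the first colon only,
            // so the colon inside "https://" stays in the value.
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();

            if (field.equals("sitemap") && !value.isEmpty()) {
                sitemaps.add(value);
            }
        }
        return sitemaps;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\n"
            + "Disallow: /private/\n"
            + "Sitemap: https://www.persee.fr/sitemap.xml\n";
        // Prints the single sitemap URL found in the sample above.
        System.out.println(extractSitemaps(robots));
    }
}
```

With something along these lines, the connector could at least skip the line silently, or go further and queue the sitemap URL for crawling.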

 


