[jira] [Commented] (CONNECTORS-1680) WebConnector: Support the Document Base URL element
[ https://issues.apache.org/jira/browse/CONNECTORS-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461592#comment-17461592 ]

Markus Schuch commented on CONNECTORS-1680:
---

Fixed with r1896101.

> WebConnector: Support the Document Base URL element
> ---
>
> Key: CONNECTORS-1680
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1680
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Reporter: Markus Schuch
> Assignee: Karl Wright
> Priority: Major
> Fix For: ManifoldCF 2.21
>
> HTML allows specifying the base URL to use for all relative URLs in a
> document:
> {code:java}
> ...
> <base href="https://example.org/"/>
> ...
> {code}
> [https://developer.mozilla.org/de/docs/Web/HTML/Element/base]
> The Web Connector should respect this element when handling relative links.

--
This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (CONNECTORS-1680) WebConnector: Support the Document Base URL element
[ https://issues.apache.org/jira/browse/CONNECTORS-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461575#comment-17461575 ]

Karl Wright commented on CONNECTORS-1680:
---

The baseDocumentIdentifier code is new. It adds the ability to set the base URL for all relative references, a change you requested and I implemented. All relative document URLs should be resolved against baseDocumentIdentifier, which is initialized to documentIdentifier at the start.
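The behaviour described in this comment can be illustrated with a minimal sketch. It is not the ManifoldCF implementation: the class name BaseUrlSketch and the methods noteBaseHref and resolveLink are hypothetical, and only the field names baseDocumentIdentifier and documentIdentifier come from the comment. The sketch assumes that a base href, which may itself be relative, is resolved against the document's own URL, and it uses the standard java.net.URI.resolve call.

```java
import java.net.URI;

// Hypothetical illustration only; not the actual Web connector code.
final class BaseUrlSketch {
    // Starts out as the document's own URL, as described in the comment.
    private URI baseDocumentIdentifier;

    BaseUrlSketch(String documentIdentifier) {
        this.baseDocumentIdentifier = URI.create(documentIdentifier);
    }

    // Assumed hook: called when a <base href="..."> element is encountered.
    // The href is resolved against the current base, so a relative href works too.
    void noteBaseHref(String href) {
        baseDocumentIdentifier = baseDocumentIdentifier.resolve(href);
    }

    // Resolves a link found in the document against the current base URL.
    String resolveLink(String rawLink) {
        return baseDocumentIdentifier.resolve(rawLink).toString();
    }

    public static void main(String[] args) {
        BaseUrlSketch handler = new BaseUrlSketch("https://crawl.example.com/docs/page.html");
        // Before any base element is seen, links resolve against the document URL.
        System.out.println(handler.resolveLink("img/logo.png"));
        handler.noteBaseHref("https://example.org/");
        // After the base element, the same relative link resolves against the new base.
        System.out.println(handler.resolveLink("img/logo.png"));
    }
}
```

With these made-up URLs, the first call yields https://crawl.example.com/docs/img/logo.png and the second yields https://example.org/img/logo.png. Absolute links are unaffected by the base in either case.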
[jira] [Commented] (CONNECTORS-1680) WebConnector: Support the Document Base URL element
[ https://issues.apache.org/jira/browse/CONNECTORS-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461544#comment-17461544 ]

Markus Schuch commented on CONNECTORS-1680:
---

[~kwri...@metacarta.com] while fixing the Travis CI build I found that {{org.apache.manifoldcf.crawler.connectors.webcrawler.tests.SessionLoginHSQLDBIT}} fails after this change:
{code:java}
[junit] - ---
[junit] Testcase: sessionCrawl(org.apache.manifoldcf.crawler.connectors.webcrawler.tests.SessionLoginHSQLDBIT): Caused an ERROR
[junit] Wrong number of documents processed - expected 101, saw 1
[junit] org.apache.manifoldcf.core.interfaces.ManifoldCFException: Wrong number of documents processed - expected 101, saw 1
[junit] at org.apache.manifoldcf.crawler.connectors.webcrawler.tests.SessionTester.executeTest(SessionTester.java:166)
[junit] at org.apache.manifoldcf.crawler.connectors.webcrawler.tests.SessionLoginHSQLDBIT.sessionCrawl(SessionLoginHSQLDBIT.java:59)
[junit]
[junit]
{code}
[jira] [Commented] (CONNECTORS-1680) WebConnector: Support the Document Base URL element
[ https://issues.apache.org/jira/browse/CONNECTORS-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450108#comment-17450108 ]

Markus Schuch commented on CONNECTORS-1680:
---

I tried crawling a website that uses a Base URL element, and the discovered links look correct now.

I also looked over the code changes. One minor issue: the assignment of {{baseDocumentIdentifier}} in the constructor of {{ProcessActivityLinkHandler}} looks useless (it is assigned to itself) --> https://github.com/apache/manifoldcf/commit/8b88d6d44aba54ee276619340ad36a9cc441932d#diff-9fff09bc306de2115d24403a5fe1c3c9fe831d30a54448291f1f90b4d5c6b2a7R3889