[att] francelabs [dot] com
)
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>
[point] Ulmer [att] francelabs [dot] com
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>
it was attached, for some reason.
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>
?
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
> Issue Type: Bug
> Co
/vnd.wap.xhtml+xm
application/x-asp
application/xhtml+xml
So, since it handles HTML and XHTML, the processed files have to be valid XML anyway.
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
>
offline experimentation.
> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>
[ https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright reassigned CONNECTORS-1656:
---
Assignee: Karl Wright
> HTML extractor produces invalid
Julien Massiera created CONNECTORS-1656:
---
Summary: HTML extractor produces invalid XML
Key: CONNECTORS-1656
URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
Project: ManifoldCF
I've added the component as requested. As for the advice, I suggest you
create a ticket and we can discuss there.
Karl
On Tue, Oct 20, 2020 at 6:24 AM wrote:
> Hi,
>
>
>
> I noticed a problem with the HTML extractor connector. It produces valid
> HTML doc (when the 'Strip HTML' option is
Hi,
I noticed a problem with the HTML extractor connector. It produces a valid
HTML document (when the 'Strip HTML' option is disabled) but invalid XML (some
tags, such as img, have no closing tag), which is problematic in some cases.
For example, when Tika is used behind, it processes the document
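
The difference between valid HTML and well-formed XML described in this report can be illustrated with a small sketch. It does not use ManifoldCF or Tika. It only feeds two hypothetical fragments to Python's standard XML parser, and the helper name `is_well_formed_xml` and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Acceptable as HTML: img is a void element and needs no closing tag.
html_style = '<div><img src="logo.png" alt="logo"></div>'

# The same content written XHTML-style, with a self-closing img tag.
xhtml_style = '<div><img src="logo.png" alt="logo"/></div>'

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xhtml_style))
```

An XML parser rejects the first fragment because the img element is never closed, which is the kind of failure a strict XML consumer placed after the extractor could hit.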