standard XML file will not be impacted
> by our HTML-specific method.
>
> What do you think?
>
> Julien
>
> -----Original Message-----
> From: Karl Wright
> Sent: Friday, September 6, 2019 16:54
> To: dev
> Subject: Re: TagParseState behavior with Web connector
Subject: Re: TagParseState behavior with Web connector
*IF* you wanted to allow broken XML to still be parsed correctly, the first
thing you must do is come up with a list of exceptions to standard XML parsing
that you would want to support. Presuming that you have a browser that you
think is doing
> Regards,
> Julien
>
> -----Original Message-----
> From: Karl Wright
> Sent: Thursday, September 5, 2019 18:30
> To: dev
> Subject: Re: TagParseState behavior with Web connector
>
> The parser requires that the document being parsed be valid XML. Data
> within non-CDATA
?
Regards,
Julien
-----Original Message-----
From: Karl Wright
Sent: Thursday, September 5, 2019 18:30
To: dev
Subject: Re: TagParseState behavior with Web connector
The parser requires that the document being parsed be valid XML. Data
within non-CDATA sections is *required* to use entity references to include
< or > characters. See:
https://stackoverflow.com/questions/330725/use-of-greater-than-symbol-in-xml
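
To make the point above concrete, here is a minimal sketch of what a strict XML parser does with unescaped versus escaped text. It uses the JDK's standard DocumentBuilderFactory rather than the ManifoldCF fuzzyml classes, and the class name EntityEscapeSketch and the helpers escapeText and isWellFormed are hypothetical names introduced only for this illustration. The helper also escapes "&", since a raw ampersand is rejected in the same way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

public class EntityEscapeSketch {

    // Hypothetical helper: replaces special characters with entity references.
    // "&" must be replaced first so the entities added afterwards are not re-escaped.
    public static String escapeText(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    // Hypothetical helper: true if the JDK's default parser accepts the input
    // as well-formed XML, false if it reports a parse error.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser found a well-formedness error (for example a raw "<" in text).
            return false;
        } catch (Exception e) {
            // I/O or parser configuration problems are unrelated to the input's validity.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "if (a < b && b > c)";
        // Unescaped text inside an element.
        System.out.println(isWellFormed("<p>" + raw + "</p>"));
        // The same text with entity references.
        System.out.println(isWellFormed("<p>" + escapeText(raw) + "</p>"));
        System.out.println(escapeText(raw));
    }
}
```

A page that embeds the raw form, as many HTML pages do, is therefore not valid XML from the point of view of a strict parser, which is why such input needs either escaping at the source or an explicit list of tolerated exceptions in the parser.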
Karl
On Thu, Sep 5, 2019 at 12:10 PM Julien
Hi Karl,
I discovered a problematic behavior with the
org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState class when
crawling web pages. This behavior is a problem in particular for
form-based authentication, as explained later in this email.
The