Re: TagParseState behavior with Web connector

2019-09-09 Thread Karl Wright
standard XML file will not be impacted > by our HTML-specific method. > > What do you think? > > Julien > > -----Original Message----- > From: Karl Wright > Sent: Friday, September 6, 2019 16:54 > To: dev > Subject: Re: TagParseState behavior with Web connec

RE: TagParseState behavior with Web connector

2019-09-09 Thread julien.massiera
Subject: Re: TagParseState behavior with Web connector *IF* you wanted to allow broken XML to be still correctly parsed, the first thing you must do is come up with a list of exceptions to standard XML parsing that you would want to support. Presuming that you have a browser that you think is doing

Re: TagParseState behavior with Web connector

2019-09-06 Thread Karl Wright
egards, > Julien > > -----Original Message----- > From: Karl Wright > Sent: Thursday, September 5, 2019 18:30 > To: dev > Subject: Re: TagParseState behavior with Web connector > > The parser requires that the document being parsed be valid XML. Data > within non-CDATA

RE: TagParseState behavior with Web connector

2019-09-06 Thread julien.massiera
? Regards, Julien -----Original Message----- From: Karl Wright Sent: Thursday, September 5, 2019 18:30 To: dev Subject: Re: TagParseState behavior with Web connector The parser requires that the document being parsed be valid XML. Data within non-CDATA sections is *required* to use entity

Re: TagParseState behavior with Web connector

2019-09-05 Thread Karl Wright
The parser requires that the document being parsed be valid XML. Data within non-CDATA sections is *required* to use entity references to include < or > characters. See: https://stackoverflow.com/questions/330725/use-of-greater-than-symbol-in-xml Karl On Thu, Sep 5, 2019 at 12:10 PM Julien
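
The escaping rule described above can be illustrated with a minimal sketch. It uses only the Python standard library and is not ManifoldCF code: the helper name `wrap_text` and the sample string are assumptions made for illustration. It shows that a strict XML parser rejects character data containing a raw `<` or `&`, and that the same data parses once entity references are used. (Strictly, XML requires escaping `<` and `&` in character data; `>` is customarily escaped as well.)

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_text(tag, text):
    """Hypothetical helper: wrap text in an element, escaping &, < and >."""
    return "<{0}>{1}</{0}>".format(tag, escape(text))


# Sample inline-script-like content, as might appear in a web page.
raw = "if (a < b && b > c)"

# Embedding the raw text directly is not well-formed XML.
try:
    ET.fromstring("<script>" + raw + "</script>")
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# With entity references, the same content parses and round-trips.
escaped_doc = wrap_text("script", raw)
parsed_text = ET.fromstring(escaped_doc).text
```

Web pages often embed such characters unescaped inside script blocks, which is the kind of input a strict XML parser will not accept.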

TagParseState behavior with Web connector

2019-09-05 Thread Julien Massiera
Hi Karl, I discovered a problematic behavior with the org.apache.manifoldcf.connectorcommon.fuzzyml.TagParseState class when crawling web pages. This behavior poses a problem, in particular for form-based authentication, as explained further in my email. The