On Fri, Nov 22, 2019 at 2:25 PM Richard Gaskin via use-livecode <use-livecode@lists.runrev.com> wrote:

> Trevor DeVore wrote:
>
> > While looking at solutions for converting HTML into XHTML that can be
> > parsed by revXML I decided to test HTMLTidy which has an option to
> > output the input as XHTML. While I could bundle up the tidy command
> > line tool and include it with my app, I prefer to wrap things up in
> > LCB if possible.
>
> Is conversion to XHTML the way to go?
>
> I've tried using the XML external to parse even RSS files -- ostensibly
> pure XML -- only to find it choke on some of them. I've gone back to
> hand-crafted pull-parsers.

There are definitely other ways to approach the problem I'm trying to solve. In fact, in other areas of my app I extract parts of HTML without relying on revXML.

In this particular case I already have some LC code that parses HTML placed on the clipboard and converts it into a data structure used by the application. This was originally implemented using the revXML callback feature (no tree is created in memory), and that API has worked well for the conversions I need to make.

HTML may be placed on the clipboard by web browsers when copying text and images, or by our good friend Microsoft Word. Microsoft Word places some very "interesting" HTML on the clipboard that needs to be massaged quite a bit before running it through revXML. There is a speed hit when running the regex patterns on the Word HTML that strip out some markup and do things such as add quotes around attributes.

Given the code that I already have in place, I would prefer to leverage HTMLTidy rather than fix every potential "gotcha" or spend time trying to optimize the code. I'm betting that HTMLTidy can do it better and faster given how mature it is.

--
Trevor DeVore
ScreenSteps
www.screensteps.com