Stefan Behnel writes:

> Dmitry Dzhus wrote:
>> My aim is to apply XSLT to some HTML document (which may be broken
>> just a little). 
>> 
>> I'm using standard Python libxml2/libxslt bindings.
>> 
>> My code is:
>> 
>>    mf_extract = libxslt.parseStylesheetFile("mf-extract.xsl")
>>    
>>    doc = libxml2.readHtmlFile(url, None, libxml2.HTML_PARSE_RECOVER)
>>    
>>    mf_extract.applyStylesheet(doc, None)
>> 
>> Applying XSLT results as if there were no content in `doc` tree at
>> all. Using `readFile` instead of `readHtmlFile` works fine as
>> expected.
>> 
>> I tried to `print doc` after using both `readHtmlFile` and `readFile`
>> and noticed that, given the input document is well-formed, the output
>> differs only in XML declaration at the very beginning.
>> 
>> As I understand (and `document.type` indicates), using `readFile` and
>> `readHtmlFile` results in different kinds of documents --
>> `document_xml` and `document_html` -- while applying XSLT is only
>> possible with `document_xml` one. Is there any way to convert
>> `document_html` to `document_xml`?

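Try the recover methods (e.g. `libxml2.recoverFile` in place of `readHtmlFile`). They parse the input as XML but still try to build a tree when it is not well-formed, so the result is an XML document rather than an HTML one.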
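The recover methods stay inside libxml2. As a separate illustration of the "convert HTML to XML" idea, here is a sketch that uses only the Python 3 standard library (`html.parser` and `xml.etree.ElementTree`), not the libxml2/libxslt bindings. It rebuilds loosely formed HTML as a well-formed XML string, which could then presumably be handed to an XML parser such as `libxml2.parseDoc` so that XSLT sees an XML document. `TreeBuilderParser`, `html_to_xml` and the `html-fragment` wrapper element are hypothetical names. HTML's implicit-close rules are not implemented, so a second `<p>` nests inside the first instead of closing it.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements never get an end tag, so they are not kept on the open stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


def _attrs(attrs):
    # Valueless HTML attributes (e.g. <input disabled>) need a value in XML.
    return {k: (v if v is not None else k) for k, v in attrs}


class TreeBuilderParser(HTMLParser):
    """Hypothetical helper: builds an ElementTree from loosely formed HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("html-fragment")  # hypothetical synthetic wrapper
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag, _attrs(attrs))
        if tag not in VOID:
            self.stack.append(elem)

    def handle_startendtag(self, tag, attrs):
        # <br/>-style self-closing tags never stay open.
        ET.SubElement(self.stack[-1], tag, _attrs(attrs))

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(markup):
    """Return a well-formed XML string; elements left unclosed are closed implicitly."""
    parser = TreeBuilderParser()
    parser.feed(markup)
    parser.close()
    root = parser.root
    # Use a single top-level element (e.g. <html>) as the document element.
    if len(root) == 1 and not (root.text or "").strip() and not (root[0].tail or "").strip():
        top = root[0]
        top.tail = None
    else:
        top = root
    return ET.tostring(top, encoding="unicode")


if __name__ == "__main__":
    print(html_to_xml("<div><span>unclosed"))  # <div><span>unclosed</span></div>
```

A hand-written tree builder keeps the sketch dependency-free. For real pages, libxml2's own HTML parser or its recover mode, as suggested above, handles far more of HTML's quirks.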


-- 
Nic Ferrier
http://www.tapsellferrier.co.uk   
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml