Re: [xml] Parser error: html entities not defined
Thank you very much, Daniel

On 9/10/07, Daniel Veillard [EMAIL PROTECTED] wrote:
> On Thu, Sep 06, 2007 at 07:22:21PM -0300, Bruno Dilly wrote:
> > Indeed, the RSS is not well-formed.
> > Is it possible to load an external DTD not included in the RSS?
>
> Yes, separately, to validate a document. See the documentation.
> What you can't or should NOT try to do is to process something which is
> not well-formed, to make it work when it's not XML. If an RSS feed is
> broken, DROP IT, then people will fix it! If you don't, I think you do a
> disservice to the users, and you have no guarantee from me that what you
> did to make it work with libxml2 will continue to work in the future.
>
> > For example, can I load
> > http://my.netscape.com/publish/formats/rss-0.91.dtd before parsing the
> > file? And is it possible to load it from a local file? How could I do it?
>
> What do you want to do? You can use a separate DTD to validate an
> already parsed, well-formed XML file. That's possible in the API. What you
> can't do is modify the parsing to fake a non-existent DTD.
> If you want to have the DTD local, see the catalog support; there is a
> page describing it, and it's a standard.
>
> http://xmlsoft.org/catalog.html
>
> Daniel
>
> --
> Red Hat Virtualization group http://redhat.com/virtualization/
> Daniel Veillard      | virtualization library http://libvirt.org/
> [EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
> http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Parser error: html entities not defined
On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
> Hi people,
>
> I'm trying to parse RSS with HTML entities, but I'm getting the following
> errors when it tries to parse the RSS file:
>
> Entity 'ntilde' not defined;
> Entity 'iacute' not defined;
> and others.

Are the HTML entities defined in the RSS DTD? If yes, then you need
to ask to load the DTD. If not, then using them there is an error.

> I tried to find the solution in the libxml2 documentation and in the list
> archives, but I didn't find the best way to solve the problem.
> My code is something like:
>
>     LIBXML_TEST_VERSION
>     xmlDoc *doc = NULL;
>     xmlNode *root_element = NULL;
>     doc = xmlReadFile(fileName.c_str(), NULL, 0);
>     root_element = xmlDocGetRootElement(doc);
>
> If you could help me, I would appreciate it.

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Re: [xml] Parser error: html entities not defined
On Tue, 2007-09-04 at 07:01 -0400, Daniel Veillard wrote:
> On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
> > Hi people,
> >
> > I'm trying to parse RSS with HTML entities, but I'm getting the following
> > errors when it tries to parse the RSS file:
> >
> > Entity 'ntilde' not defined;
> > Entity 'iacute' not defined;
> [...]
> Are the HTML entities defined in the RSS DTD? If yes, then you need
> to ask to load the DTD. If not, then using them there is an error.

It's worse than that :-)

RSS requires HTML markup to be escaped in descriptions, so you have
to write things like &amp;ntilde; and the same for elements,
&lt;i&gt;...&lt;/i&gt; to get <i>...</i> into an RSS feed.

A lot of RSS feeds are invalid.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
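
For anyone generating a feed rather than parsing one, here is a minimal
sketch of the escaping described above, in plain C with no libxml2
dependency. The helper name escape_markup is hypothetical (it is not a
libxml2 function), and it only handles &, < and >, which should be enough
to turn HTML markup and HTML entity references into text that an XML parser
accepts inside an RSS description.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns a newly allocated copy of `in` with '&', '<' and '>' replaced by
 * "&amp;", "&lt;" and "&gt;". The caller must free() the result.
 * Returns NULL if memory allocation fails.
 */
char *escape_markup(const char *in)
{
    size_t len = strlen(in);
    /* Worst case: every input byte expands to "&amp;" (5 bytes). */
    char *out = malloc(len * 5 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *in != '\0'; in++) {
        switch (*in) {
        case '&':
            memcpy(p, "&amp;", 5);
            p += 5;
            break;
        case '<':
            memcpy(p, "&lt;", 4);
            p += 4;
            break;
        case '>':
            memcpy(p, "&gt;", 4);
            p += 4;
            break;
        default:
            /* Any other byte is copied through unchanged. */
            *p++ = *in;
            break;
        }
    }
    *p = '\0';
    return out;
}
```

Given the input <i>Espa&ntilde;a</i>, this returns
&lt;i&gt;Espa&amp;ntilde;a&lt;/i&gt;. An XML parser reading that back
yields the original HTML string as the element's text, so it is the
application consuming the feed, not the XML parser, that has to interpret
the HTML and its entities.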