Re: [xml] Parser error: html entities not defined

2007-09-13 Thread Bruno Dilly
Thank you very much, Daniel

On 9/10/07, Daniel Veillard [EMAIL PROTECTED] wrote:
 On Thu, Sep 06, 2007 at 07:22:21PM -0300, Bruno Dilly wrote:
  Indeed, the RSS is not well-formed. Is it possible to load an external
  DTD that is not included in the RSS?

   Yes, separately, to validate a document. See the documentation.
 What you can't, or should NOT try to, do is process something that
 is not well-formed and make it work even though it's not XML.
   If an RSS feed is broken, DROP IT, then people will fix it!
 If you don't, I think you do a disservice to the users, and you have
 no guarantee from me that what you did to make it work with libxml2
 will continue to work in the future.

  For example, can I load
  http://my.netscape.com/publish/formats/rss-0.91.dtd before parsing the
  file? And is it possible to load it from a local file? How could I do it?

   What do you want to do? You can use a separate DTD to validate an
 already parsed, well-formed XML file. That's possible in the API. What you
 can't do is modify the parsing to fake a non-existent DTD.
 If you want to have the DTD locally, see the catalog support; there is a
 page describing it, and it's a standard.
http://xmlsoft.org/catalog.html

 Daniel

 --
 Red Hat Virtualization group http://redhat.com/virtualization/
 Daniel Veillard  | virtualization library  http://libvirt.org/
 [EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
 http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml


Re: [xml] Parser error: html entities not defined

2007-09-04 Thread Daniel Veillard
On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
 Hi people,
 
 I'm trying to parse RSS with html entities, but I'm having the
 following errors when it tries to parse the rss file:
 Entity 'ntilde' not defined;
 Entity 'iacute' not defined;
 
 and others.

  Are the HTML entities defined in the RSS DTD? If yes, then you
need to ask the parser to load the DTD. If no, then using them there is an error.

 I tried to find the solution in the libxml2 documentation and in the
 list archives, but I didn't find the best way to solve the problem.
 
 My code is something like:
 
 LIBXML_TEST_VERSION
 xmlDoc *doc = NULL;
 xmlNode *root_element = NULL;
 
 doc = xmlReadFile(fileName.c_str(), NULL, 0 );
 root_element = xmlDocGetRootElement(doc);
 
 If you could help me, I would appreciate it.


Re: [xml] Parser error: html entities not defined

2007-09-04 Thread Liam R E Quin
On Tue, 2007-09-04 at 07:01 -0400, Daniel Veillard wrote:
 On Tue, Sep 04, 2007 at 06:39:01AM -0300, Bruno Dilly wrote:
  Hi people,
  
  I'm trying to parse RSS with html entities, but I'm having the
  following errors when it tries to parse the rss file:
  Entity 'ntilde' not defined;
  Entity 'iacute' not defined;
[...]
   Are the HTML entities defined in the RSS DTD? If yes, then you
 need to ask the parser to load the DTD. If no, then using them there is an error.

It's worse than that :-)

RSS requires HTML markup to be escaped in descriptions, so you have
to write things like
&amp;ntilde;
and the same for elements, &lt;i&gt;...&lt;/i&gt; to get <i>...</i> into
an RSS feed.
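
For a feed producer, the escaping described above could be sketched like
this. The function name escapeForRss is hypothetical, and the sketch only
handles the three characters needed for character data ('&', '<' and '>').

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: escape HTML markup so it can be placed inside an
// RSS description as plain character data.
static std::string escapeForRss(const std::string& html) {
    std::string out;
    out.reserve(html.size());
    for (std::string::size_type i = 0; i < html.size(); ++i) {
        switch (html[i]) {
            case '&': out += "&amp;"; break;  // must come out as &amp;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += html[i]; break;
        }
    }
    return out;
}

// Small self-check: an HTML entity and an <i> element are both escaped.
static void escapeForRssExample() {
    assert(escapeForRss("&ntilde;") == "&amp;ntilde;");
    assert(escapeForRss("<i>x</i>") == "&lt;i&gt;x&lt;/i&gt;");
}
```

The XML parser then sees only the predefined entities amp, lt and gt, so
no DTD is needed to read the feed.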

A lot of RSS feeds are invalid.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
