Re: [xml] Changes which might be required "Not converting names to lower-case in HTML parsing"

Daniel Veillard Mon, 10 Oct 2005 07:31:09 -0700

On Mon, Oct 10, 2005 at 06:13:19PM +0530, GPN wrote:
> Based on your inputs above, I am assuming that you are referring to
> the options:
> - enum xmlParserOption defined in include/libxml/parser.h
> - enum htmlParserOption defined in include/libxml/HTMLparser.h


  the second one, yes

> htmlCtxtUseOptions() does the following -
> a) Normalize the HTML options to XML options
>    Probably to reflect the options in the core parsing engine
> b) Sets some members of the context structure
>    Probably for ease of condition checking.
> 
> "HTML_PARSE_RETAINCASE" could be added as the additional option,
> but need not reflect as a core XML parsing option.
> Do we need to add a member in the context structure (something
> like retainCase)?

  no, the remaining options should be kept in ctxt->options

> Do these checks have to be made conditional? For e.g.
>   if (options & HTML_PARSE_RETAINCASE) {
>     if (!xmlStrcasecmp()) {
>       /* Code segment */
>     }
>   } else {
>     if (xmlStrEqual()) {
>       /* Code segment */
>     }
>   }

  yes, test if (ctxt->options & HTML_PARSE_RETAINCASE) ... but the code
segment should not be duplicated of course; the conditional should be unified.
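
A minimal sketch of what a unified conditional could look like, written in plain C so it stands alone. Everything prefixed `sketch_` or `SKETCH_` is hypothetical: the flag value is arbitrary, and the two comparisons merely stand in for xmlStrEqual() and !xmlStrcasecmp(). The option picks the comparison in one helper, so the code segment that depends on the result is written once.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the proposed HTML_PARSE_RETAINCASE bit.
 * The value is arbitrary and not taken from libxml2. */
#define SKETCH_RETAINCASE (1 << 0)

/* ASCII case-insensitive equality, standing in for !xmlStrcasecmp(). */
int sketch_equal_nocase(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Single comparison helper: the option decides how names are compared. */
int sketch_name_equal(int options, const char *a, const char *b) {
    if (options & SKETCH_RETAINCASE)
        return sketch_equal_nocase(a, b);
    return strcmp(a, b) == 0;
}

/* The "code segment" exists once; returns 1 if name is the body tag. */
int sketch_is_body(int options, const char *name) {
    return sketch_name_equal(options, name, "body");
}
```

With the flag set, sketch_is_body() accepts "BODY"; without it, only the exact lowercase spelling matches, which corresponds to names having been lowercased at parse time.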

> >>>- In htmlParseName(), the condition which checks if the
> >>> current character is upper-case, and which transforms
> >>> it needs to be removed. Name can be stored as it is.
> >>
> >>
> >>  no. That would have to be conditionalized depending on a special
> >>parsing flag option. There are also a number of tables indexed by
> >>the lowercase name and those will need to be preserved
> >>
> I hope the inclusion of the new option satisfies this comment.
> But, I am concerned about which tables might need to be taken care
> of, so that the engine is not broken.

  you will have to also generate the lower case version of the name
and use it for lookup in those tables.
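
A rough sketch of that idea in standalone C, under stated assumptions: `sketch_table` is a hypothetical miniature of a table keyed by lowercase element names, and the helper names are invented for illustration, not libxml2 API. The name as written in the document is left untouched; only a temporary lowercase copy is used as the lookup key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a table keyed by lowercase element names. */
static const char *const sketch_table[] = { "body", "head", "html", "p" };

/* Write an ASCII-lowercased copy of name into buf.
 * Returns 0 on success, -1 if the copy does not fit in size bytes. */
int sketch_lower_copy(const char *name, char *buf, size_t size) {
    size_t i;

    assert(name != NULL && buf != NULL);
    for (i = 0; name[i] != '\0'; i++) {
        if (i + 1 >= size)
            return -1;          /* no room for this char plus the NUL */
        buf[i] = (char)tolower((unsigned char)name[i]);
    }
    if (i >= size)
        return -1;              /* covers size == 0 with an empty name */
    buf[i] = '\0';
    return 0;
}

/* Look up a name in its original case; returns the table index or -1.
 * The caller's string is never modified. */
int sketch_lookup(const char *name) {
    char lower[32];
    size_t i;

    if (sketch_lower_copy(name, lower, sizeof(lower)) != 0)
        return -1;
    for (i = 0; i < sizeof(sketch_table) / sizeof(sketch_table[0]); i++) {
        if (strcmp(lower, sketch_table[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

The lowercase copy is made once per name, which keeps the table lookups themselves as cheap exact comparisons.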

> >>
> >>>- In other parts of the code (only in HTMLparser.c), the
> >>> comparisons using xmlStrEqual() for names, need to be
> >>> replaced by xmlStrcaseEqual().
> >>
> >>
> >>  I.e. it makes a lot of costly calls instead of one costly call and
> >>a number of cheap ones. I disagree with this approach.
> >>
> I hope this is also answered above. xmlStrcaseEqual() will not be
> used.
> 
> I did make these changes, and tested once. I found that some tags
> during the parse are missing out. For example and in particular,
> the "body" tag seems to be missed out. Probably, this is because
> I haven't taken care of the tables which you have mentioned above.

 Well, I can't tell ..

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
