Re: Form Correction in NekoHTML parser

Andy Clark 8 Mar 2004 18:31:09 -0000

Takumi Fujiwara wrote:

When the NekoHTML parser corrects an ill-formatted <form>,
does it build some data structure to record which
form elements are part of the form?


NekoHTML operates in a streaming manner so it has no
memory of what happened previously in the document.
Therefore, once it has parsed something and sent that
information through the pipeline, it's too late to fix
up that element or content later when it has seen more
of the document.

Could someone please tell me how the NekoHTML parser
handles a situation like this? I.e., if I parse this HTML
using the NekoHTML parser, how can I know that I only need
to submit the values of the first 2 form elements during
form submission?


However, there are always things that you can do. For
example, in this particular case, you could write a
filter that ignores form element children (e.g. <input>)
if they appear outside of a <form> element, i.e. before
the <form> tag or after the </form> tag. Then you can
just insert that filter before the tag-balancer in the
parsing pipeline.

For example: (This code will NOT compile -- you have to
finish the code first... Plus, I am writing this from
memory so there may be errors.)

public class IgnoreFormChildren
  extends DefaultFilter {

  boolean inForm;

  // NOTE: It's safest to override *both* startDocument
  //       methods in order to work with *all* versions
  //       of Xerces2.
  public void startDocument(...) throws XNIException {
    inForm = false;
  }

  public void startElement(...) throws XNIException {
    // NOTE: getElement takes the raw element name, so pass
    //       the rawname field of the QName argument.
    HTMLElements.Element elem = HTMLElements.getElement(qname.rawname);
    if (elem.code == HTMLElements.FORM) {
      inForm = true;
    }

    // Ignore elements whose expected parent is <form>
    // (e.g. <input>) when they appear outside of a form.
    boolean ignore = false;
    if (!inForm) {
      if (elem.parents != null &&
          elem.parents[0].code == HTMLElements.FORM) {
        ignore = true;
      }
    }

    if (!ignore) {
      super.startElement(...);
    }
  }

  public void endElement(...) throws XNIException {
    HTMLElements.Element elem = HTMLElements.getElement(qname.rawname);
    if (elem.code == HTMLElements.FORM) {
      inForm = false;
    }
    super.endElement(...);
  }
}
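
As an aside, the inForm bookkeeping can be tried out without
NekoHTML at all. The following is only a rough model under stated
assumptions: the event strings ("<form>", "</form>", ...) stand in
for the XNI callbacks, and the class name, the filter method and the
hard-coded list of form controls are all made up for illustration
(NekoHTML consults its own element table instead). Unlike the filter
class, this model also drops end tags of ignored controls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Library-free model of the filter's state machine.
// Nothing in this class is NekoHTML API.
class FormChildFilterSketch {

    // Hypothetical list of elements treated as form controls.
    static final List<String> FORM_CONTROLS =
        Arrays.asList("input", "select", "textarea", "button");

    // Takes events like "<form>" or "</form>" in document order and
    // returns the events that would be passed down the pipeline.
    static List<String> filter(List<String> events) {
        List<String> out = new ArrayList<>();
        boolean inForm = false;
        for (String event : events) {
            boolean isEnd = event.startsWith("</");
            // Strip "<" or "</" in front and ">" at the end.
            String name = event.substring(isEnd ? 2 : 1, event.length() - 1);
            if (name.equals("form")) {
                // A start tag opens the form, an end tag closes it.
                inForm = !isEnd;
            }
            // Drop form controls seen while no form is open.
            if (!inForm && FORM_CONTROLS.contains(name)) {
                continue;
            }
            out.add(event);
        }
        return out;
    }
}
```

Feeding it two <input> events inside a form followed by a third one
after </form> keeps the first two and drops the third; everything
that is not a form control passes through untouched.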

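Then install the filter ahead of the tag balancer. The
built-in tag balancing is turned off so that the balancer
can be added to the filter chain explicitly, after the new
filter: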

DOMParser parser = new DOMParser();

XMLDocumentFilter[] filters = {
  new IgnoreFormChildren(),
  new HTMLTagBalancer(),
};

parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
parser.setProperty("http://cyberneko.org/html/properties/filters", filters);

parser.parse("index.html");
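
Once the document is parsed, the controls that belong to the form
should simply be the ones left underneath the form element in the
DOM, which you can get from parser.getDocument(). Here is a rough
sketch of collecting them using only the standard org.w3c.dom API.
The class and method names are made up, and the parseXml helper
is just a stand-in that builds a Document from well-formed markup
so the sketch runs without NekoHTML. Element names are compared
case-insensitively because the case of names in the DOM depends on
how the parser is configured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of NekoHTML or Xerces.
class FormControlsSketch {

    // Returns the "name" attributes of all <input> descendants
    // of the first <form> element found in the document.
    static List<String> inputNames(Document doc) {
        List<String> names = new ArrayList<>();
        Element form = null;
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength() && form == null; i++) {
            Element e = (Element) all.item(i);
            if (e.getTagName().equalsIgnoreCase("form")) {
                form = e;
            }
        }
        if (form == null) {
            return names;
        }
        NodeList inside = form.getElementsByTagName("*");
        for (int i = 0; i < inside.getLength(); i++) {
            Element e = (Element) inside.item(i);
            if (e.getTagName().equalsIgnoreCase("input")) {
                names.add(e.getAttribute("name"));
            }
        }
        return names;
    }

    // Stand-in for the real parse: builds a DOM from well-formed
    // markup with the JDK's own XML parser.
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the filter in place, an <input> that followed the </form> tag
never reaches the DOM, so it cannot show up in this list.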

Hope this helps...

By the way, NekoHTML is not an Apache project so questions
regarding NekoHTML should not be posted here. You can send
your questions and comments directly to me.

yoroshiku...

--
Andy Clark * [EMAIL PROTECTED]
