Takumi Fujiwara wrote:
When the NekoHTML parser corrects an ill-formed <form>,
does it build some data structure to record which
form elements are part of the form?

NekoHTML operates in a streaming manner, so it keeps no memory of what happened earlier in the document. Once it has parsed something and sent that information down the pipeline, it's too late to fix up that element or content later, after it has seen more of the document.
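NekoHTML itself won't build that record for you, but whatever sits downstream of the parser sees every event in order and can keep its own state. As a rough, library-free illustration (this is not NekoHTML API: FormMembership, Event and Kind are hypothetical names, and the event stream is simplified to start/end tags carrying a name attribute), a single pass is enough to note which controls fell inside which form:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, library-free sketch: walk a stream of tag events once
 * and record which form controls belong to which <form>. None of these
 * types are part of NekoHTML or Xerces.
 */
final class FormMembership {
  enum Kind { START, END }

  static final class Event {
    final Kind kind;
    final String tag;   // lower-case tag name, e.g. "input"
    final String name;  // value of the name attribute, or null
    Event(Kind kind, String tag, String name) {
      this.kind = kind;
      this.tag = tag;
      this.name = name;
    }
  }

  /** Control names per form, in document order. */
  final List<List<String>> forms = new ArrayList<>();
  /** Controls seen while no form was open. */
  final List<String> orphans = new ArrayList<>();

  void process(List<Event> events) {
    List<String> current = null;  // controls of the open form, if any
    for (Event e : events) {
      if (e.tag.equals("form")) {
        if (e.kind == Kind.START) {
          current = new ArrayList<>();
          forms.add(current);
        } else {
          current = null;  // </form>: later controls are orphans
        }
      } else if (e.kind == Kind.START && isControl(e.tag)) {
        (current != null ? current : orphans).add(e.name);
      }
    }
  }

  // Assumed list of submittable controls, hard-coded for the sketch.
  static boolean isControl(String tag) {
    return tag.equals("input") || tag.equals("select")
        || tag.equals("textarea") || tag.equals("button");
  }
}
```

For events equivalent to <form><input name=a><input name=b></form><input name=c>, this leaves forms as [[a, b]] and orphans as [c], so only a and b would be submitted. In a real pipeline the same bookkeeping would live in a filter's startElement and endElement methods.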

Could someone please tell me how the NekoHTML parser
handles a situation like this? i.e. if I parse this HTML
using the NekoHTML parser, how can I know I only need to
submit the values of the first 2 form elements during form
submission?

However, there are always things that you can do. For example, in this particular case, you could write a filter that ignores form element children (e.g. <input>) if they appear outside of a <form>...</form> element. Then you can just insert that filter before the tag balancer in the parsing pipeline.

For example (I am writing this from memory, so there may
be errors; check the method signatures against the Xerces
and NekoHTML versions you use):

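import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.NamespaceContext;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XMLLocator;
import org.apache.xerces.xni.XNIException;
import org.cyberneko.html.HTMLElements;
import org.cyberneko.html.filters.DefaultFilter;

// Drops form controls (e.g. <input>) that appear
// outside of a <form>...</form> element.
public class IgnoreFormChildren
  extends DefaultFilter {

  // True between <form> and </form>.
  boolean inForm;

  // NOTE: It's safest to override *both* startDocument
  //       methods in order to work with *all* versions
  //       of Xerces2.
  public void startDocument(XMLLocator locator, String encoding,
      NamespaceContext nscontext, Augmentations augs)
      throws XNIException {
    inForm = false;
    super.startDocument(locator, encoding, nscontext, augs);
  }

  public void startDocument(XMLLocator locator, String encoding,
      Augmentations augs) throws XNIException {
    inForm = false;
    super.startDocument(locator, encoding, augs);
  }

  public void startElement(QName element, XMLAttributes attrs,
      Augmentations augs) throws XNIException {
    HTMLElements.Element elem = HTMLElements.getElement(element.rawname);
    if (elem.code == HTMLElements.FORM) {
      inForm = true;
    }
    if (!ignore(elem)) {
      super.startElement(element, attrs, augs);
    }
  }

  public void emptyElement(QName element, XMLAttributes attrs,
      Augmentations augs) throws XNIException {
    if (!ignore(HTMLElements.getElement(element.rawname))) {
      super.emptyElement(element, attrs, augs);
    }
  }

  public void endElement(QName element, Augmentations augs)
      throws XNIException {
    HTMLElements.Element elem = HTMLElements.getElement(element.rawname);
    if (elem.code == HTMLElements.FORM) {
      inForm = false;
    }
    if (!ignore(elem)) {
      super.endElement(element, augs);
    }
  }

  // Ignore an element whose expected parent is FORM when no
  // form is open. (Field name 'parent' as in NekoHTML's
  // HTMLElements.Element; check it against your version.)
  private boolean ignore(HTMLElements.Element elem) {
    return !inForm && elem.parent != null && elem.parent.length > 0
        && elem.parent[0].code == HTMLElements.FORM;
  }
}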

Then...

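import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.cyberneko.html.HTMLTagBalancer;
import org.cyberneko.html.parsers.DOMParser;

// (The class name is just for illustration.)
public class ParseWithFilter {
  public static void main(String[] args) throws Exception {
    DOMParser parser = new DOMParser();

    // IgnoreFormChildren runs first, then the tag balancer.
    XMLDocumentFilter[] filters = {
      new IgnoreFormChildren(),
      new HTMLTagBalancer(),
    };

    // Turn off the built-in balancer; the one in the filter
    // chain does the balancing after our filter.
    parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
    parser.setProperty("http://cyberneko.org/html/properties/filters", filters);
    parser.parse("index.html");
  }
}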
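If you want to check the filtering rule on its own before wiring up a parser, here is a library-free model of the same one-pass idea. It is only a sketch: FormChildFilterModel is a hypothetical name, tags are plain strings such as "form", "/form" and "input" (an encoding made up for this example), and the set of form controls is hard-coded, whereas the NekoHTML filter asks HTMLElements for each element's expected parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical model of the IgnoreFormChildren idea: pass events
 * through in one streaming pass, dropping form controls that appear
 * while no <form> is open. "/x" marks the end tag of "x".
 */
final class FormChildFilterModel {
  // Assumed "children of FORM"; hard-coded only for this sketch.
  static final Set<String> FORM_CHILDREN = new HashSet<>(
      Arrays.asList("input", "select", "textarea", "button"));

  static List<String> filter(List<String> events) {
    List<String> out = new ArrayList<>();
    boolean inForm = false;  // the only state the filter keeps
    for (String ev : events) {
      boolean isEnd = ev.startsWith("/");
      String tag = isEnd ? ev.substring(1) : ev;
      if (tag.equals("form")) {
        inForm = !isEnd;
      }
      boolean ignore = !inForm && FORM_CHILDREN.contains(tag);
      if (!ignore) {
        out.add(ev);  // forward the event unchanged
      }
    }
    return out;
  }
}
```

Feeding it html, body, form, input, input, /form, input, /body, /html drops only the third input, the one that appears after /form; everything else passes through in order.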

Hope this helps...

By the way, NekoHTML is not an Apache project, so questions
about NekoHTML should not be posted here. You can send
your questions and comments directly to me.

yoroshiku (best regards)...

--
Andy Clark * [EMAIL PROTECTED]