So OK, I found a solution (surely not the optimal one), and I will share my experience with you.

HtmlHandler is not "extension-friendly" because:
1 - all its fields are private and would need to be protected
2 - the same goes for resolve()
3 - calling super.startElement() is not easy because of the body/title/discard level counting.

HtmlParser is easier to extend, but the only reason I extend that class is to replace the "hardcoded" new HtmlHandler in the expression

parser.setContentHandler(new XHTMLDowngradeHandler(new HtmlHandler(this, handler, metadata)));

with new MyHtmlHandler(...).

A configuration option for choosing which handler class gets instantiated would probably be worthwhile.
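A configuration hook of that kind could look roughly like this: instead of a hardcoded new, the parser asks a factory, looked up by class in a context object, and falls back to the default handler when none is registered. Everything here is hypothetical: HandlerFactory, Context, DefaultHtmlHandler, MyHtmlHandler and chooseHandler are illustrative stand-ins written against plain JDK SAX types, not Tika's API. Context only mimics the class-keyed set/get idea behind Tika's ParseContext.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerFactoryDemo {

    /** Hypothetical hook: builds the handler that receives the parser's SAX events. */
    public interface HandlerFactory {
        ContentHandler create(ContentHandler downstream);
    }

    /** Hypothetical class-keyed context, loosely modelled on the idea behind ParseContext. */
    public static class Context {
        private final Map<Class<?>, Object> entries = new HashMap<>();

        public <T> void set(Class<T> key, T value) {
            entries.put(key, value);
        }

        public <T> T get(Class<T> key, T defaultValue) {
            Object value = entries.get(key);
            return value != null ? key.cast(value) : defaultValue;
        }
    }

    /** Stand-in for the default handler (HtmlHandler in this thread). */
    public static class DefaultHtmlHandler extends DefaultHandler {
        protected final ContentHandler downstream;

        public DefaultHtmlHandler(ContentHandler downstream) {
            this.downstream = downstream;
        }
    }

    /** Stand-in for the user's subclass (MyHtmlHandler in this thread). */
    public static class MyHtmlHandler extends DefaultHtmlHandler {
        public MyHtmlHandler(ContentHandler downstream) {
            super(downstream);
        }
    }

    /** Replaces the hardcoded "new HtmlHandler(...)": use the registered factory, else the default. */
    public static ContentHandler chooseHandler(ContentHandler downstream, Context context) {
        HandlerFactory factory = context.get(HandlerFactory.class, DefaultHtmlHandler::new);
        return factory.create(downstream);
    }

    public static void main(String[] args) {
        ContentHandler sink = new DefaultHandler();

        Context plain = new Context();
        // prints DefaultHtmlHandler
        System.out.println(chooseHandler(sink, plain).getClass().getSimpleName());

        Context custom = new Context();
        custom.set(HandlerFactory.class, MyHtmlHandler::new);
        // prints MyHtmlHandler
        System.out.println(chooseHandler(sink, custom).getClass().getSimpleName());
    }
}
```

The point of the factory indirection is that the parser itself never needs subclassing: a caller who wants different behaviour registers a factory, and everyone else gets the default.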

Can you tell me whether I am on the wrong track, and whether a way to override/extend the parser's features is on your roadmap?

My two cents...
Have a good day
++

Florent André wrote:
Hi all,

I am working on HTML parsing via the generic AutoDetectParser class.

I need to keep some "specific" attributes (id and class) on <table>
elements in order to detect which tables are meaningful for my app.

So, as far as I understand it for now, I have to:
- subclass HtmlHandler as MyHtmlHandler

- in MyHtmlHandler, override public void startElement(...) with something
like this:

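@Override
public void startElement(String uri, String local, String name, Attributes atts)
    throws SAXException {
  // assumes bodyLevel, discardLevel and xhtml are accessible (protected) in HtmlHandler
  if (bodyLevel == 0 && discardLevel == 0) {
    if ("TABLE".equals(name)) {
      // keep only the id and class attributes of <table>
      AttributesImpl attributes = new AttributesImpl();
      String id = atts.getValue("id");
      String cssClass = atts.getValue("class"); // "class" is a reserved word in Java
      if (id != null) {
        attributes.addAttribute("", "id", "id", "CDATA", id);
      }
      if (cssClass != null) {
        attributes.addAttribute("", "class", "class", "CDATA", cssClass);
      }
      xhtml.startElement("http://www.w3.org/1999/xhtml", "table", "table", attributes);
    } else {
      // any element other than <table>
      super.startElement(uri, local, name, atts);
    }
  } else {
    // any other bodyLevel / discardLevel combination
    super.startElement(uri, local, name, atts);
  }
}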


- and finally, pass MyHtmlHandler to the parse() method via the ParseContext.
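The attribute-copying part of that startElement() override can be pulled into a small helper that depends only on the standard SAX classes, which makes it easy to check on its own. A sketch, assuming only id and class need to survive on <table>; TableAttributeFilter and keepTableAttributes are illustrative names, and the sample attribute values are made up.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class TableAttributeFilter {

    // Attributes of <table> to keep (assumption: only id and class matter)
    private static final String[] KEPT = {"id", "class"};

    /** Copies only the kept attributes, looked up by qualified name, into a new AttributesImpl. */
    public static AttributesImpl keepTableAttributes(Attributes atts) {
        AttributesImpl kept = new AttributesImpl();
        for (String name : KEPT) {
            String value = atts.getValue(name);
            if (value != null) {
                kept.addAttribute("", name, name, "CDATA", value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        AttributesImpl in = new AttributesImpl();
        in.addAttribute("", "id", "id", "CDATA", "prices");
        in.addAttribute("", "class", "class", "CDATA", "data");
        in.addAttribute("", "border", "border", "CDATA", "1");

        AttributesImpl out = keepTableAttributes(in);
        // prints id=prices, then class=data; border is dropped
        for (int i = 0; i < out.getLength(); i++) {
            System.out.println(out.getQName(i) + "=" + out.getValue(i));
        }
    }
}
```

With such a helper, the TABLE branch of the override only has to pass keepTableAttributes(atts) to xhtml.startElement(...).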
*****

* Is this the right way to do such a thing?
* How can I use the ParseContext to pass MyHtmlHandler? I could not find any
example of it...


Any comments will be much appreciated,

Have a good day
