Tim Davidson wrote:
Thanks,

 The real problem is that we have over 200 references in our code to 
(Element)nodeList.getNode(i), and at various points we assume a node is an 
Element and cast accordingly. This code is wrong and should have been done properly from 
the beginning (I didn't write it), but going through the entire code and replacing 
"(Element)nodeList.getNode(i)" with
"if(nodeList.getNode(i) instanceof Element) // or 
nodeList.getNode(i).getType() == Node.ELEMENT_NODE
 {
   (Element)nodeList.getNode(i);
 }
 else
 {
  continue;
 }"
is not really a feasible option (although I am considering doing this by using find 
& replace and then hoping for the best!!).

 What we *ideally* want to do is tell the parser we don't care about the 
whitespace and newlines between elements, so

 "<element><child /></element>"

produces exactly the same Document as

"<element> <child />

 </element>"
 I don't think this is an unreasonable demand - since many applications don't 
want to be bothered with how the document is formatted?

It's not just whitespace. Also comments, processing instructions.

You have 200 bugs in your code caused by misunderstanding XML. My suggestion would be to fix them.

Bob Foster
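
To make the point concrete, here is a minimal sketch using the standard JAXP DocumentBuilder. The class name WhitespaceDemo and the method rootChildCount are made up for illustration and do not come from the thread. It counts the direct children of the root element for the two documents quoted above, plus a variant containing a comment and a processing instruction.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original code base.
final class WhitespaceDemo {

    // Parses the XML string with default (non-validating) settings and
    // returns the number of direct child nodes of the root element.
    static int rootChildCount(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Only the <child/> element.
        System.out.println(rootChildCount("<element><child /></element>"));        // 1
        // Text node, <child/>, text node.
        System.out.println(rootChildCount("<element> <child />\n\n </element>"));  // 3
        // Comment, processing instruction, <child/>.
        System.out.println(
            rootChildCount("<element><!-- note --><?pi data?><child /></element>")); // 3
    }
}
```

JAXP does have DocumentBuilderFactory.setIgnoringElementContentWhitespace(true), but as far as its Javadoc describes, it depends on the content model and requires the parser to be in validating mode, so it is unlikely to help with documents that have no DTD.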

 Failing that, we could strip out all the Text Nodes after the Document has 
been created. I wrote a method to do this, but it didn't work (if you 
call the method twice it finds Text Nodes each time), and I see in the archives 
that someone else had the same problem. Can it really not be done?
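
The stripping method itself is not shown, so the cause of the "finds Text Nodes each time" symptom can only be guessed at. One common cause is removing nodes while stepping through a live NodeList by index, which skips the node that slides into the freed slot. The sketch below assumes that is the issue and avoids it by walking siblings and saving the next sibling before any removal. WhitespaceStripper, stripWhitespaceText and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original code base.
final class WhitespaceStripper {

    // Recursively removes text nodes that contain only whitespace.
    // Text nodes with other content are left untouched.
    static void stripWhitespaceText(Node parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            // Capture the next sibling first: after removeChild the removed
            // node no longer has one.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                parent.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    // Convenience parser for the example, using default JAXP settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<element> <child />\n\n </element>");
        stripWhitespaceText(doc.getDocumentElement());
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 1
    }
}
```

Note that this also drops whitespace that matters in mixed content (for example a single space between two inline elements), and it does nothing about comments or processing instructions, so casts to Element can still fail afterwards.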

 I see that another option would be to write a schema that says we don't care 
about mixed content; however, I tried this and didn't have much luck with it. 
The other problem is that we don't know what the root element will be when we load 
the file.

 What I've had to do for now is load the XML file into a StringBuffer and strip 
out the spaces manually, i.e.

" private static StringBuffer X_removeTextNodes(StringBuffer p_stringBuffer)
   {
      // Find the first '>' and the first '<' that follows it.
      int start = p_stringBuffer.indexOf(">");
      int end = p_stringBuffer.indexOf("<", start + 1);

      while((start != -1) && (end != -1))
      {
         // Delete everything between the two tags (all text, not only whitespace).
         if(start < (end - 1))
         {
            p_stringBuffer = p_stringBuffer.delete(start + 1, end);
         }

         // Move on to the next '>' and the '<' after it.
         start = p_stringBuffer.indexOf(">", start + 1);
         end = p_stringBuffer.indexOf("<", start);
      }

      return p_stringBuffer;
   }
"

but this is a NASTY solution. Does anyone have any ideas?

Thanks.

-----Original Message-----
From: Bob Foster [mailto:[EMAIL PROTECTED]
Sent: 18 December 2003 22:18
To: [EMAIL PROTECTED]
Subject: Re: How can you prevent DeferredTextImpl?


Elena Litani wrote:

...As Jeff pointed out, your code probably has an error: you expect that the
first child of the element is an Element but in fact it is a Text node, hence
you get a ClassCastException.

Instead, to traverse the DOM tree you need to write some kind of switch
statement, switching on the nodeType and casting to the appropriate node type, see
samples/dom/Counter.java.
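
As a rough illustration of the switch-on-node-type pattern described above (this is not the code from samples/dom/Counter.java, and NodeTypeSwitch, elementChildNames and parse are hypothetical names), a minimal sketch might look like this. It casts only after the node type has been checked.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class, not the Xerces sample.
final class NodeTypeSwitch {

    // Returns the tag names of the element children of parent,
    // skipping every other kind of node.
    static List<String> elementChildNames(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.ELEMENT_NODE: {
                    Element e = (Element) n; // safe: the type was just checked
                    names.add(e.getTagName());
                    break;
                }
                case Node.TEXT_NODE:
                case Node.COMMENT_NODE:
                case Node.PROCESSING_INSTRUCTION_NODE:
                    break; // deliberately ignored here
                default:
                    break; // other node types are ignored as well
            }
        }
        return names;
    }

    // Convenience parser for the example, using default JAXP settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<r> <a/><!-- c --><b/>text</r>");
        System.out.println(elementChildNames(doc.getDocumentElement())); // [a, b]
    }
}
```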


The question probably was, is there some magic switch that will make
"insignificant" whitespace go away? Nope. You also have to deal with
insignificant comments and processing instructions.

This is such a common use case, you (Tim) might find it useful to
implement two helper functions:

public Element getFirstElementChild(Element parent) {
   return findElement(parent.getFirstChild());
}

public Element getNextElementSibling(Node node) {
   return findElement(node.getNextSibling());
}

private Element findElement(Node node) {
   // Skip text, comments, processing instructions, etc.
   while (node != null && node.getNodeType() != Node.ELEMENT_NODE)
     node = node.getNextSibling();
   return (Element) node; // null if no element was found
}

// typed from memory - if they don't compile/run, fix 'em

You could add error checking for non-whitespace text if desired.
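
A possible shape for that error checking, as a sketch only: the helpers are made static and collected in a hypothetical class ElementNav, and the choice of IllegalStateException and the childNames usage method are assumptions, not something specified in the thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical utility class built around the helpers above.
final class ElementNav {

    static Element getFirstElementChild(Element parent) {
        return findElement(parent.getFirstChild());
    }

    static Element getNextElementSibling(Node node) {
        return findElement(node.getNextSibling());
    }

    // Skips non-element nodes, but rejects text nodes that contain
    // anything other than whitespace. Returns null if no element is found.
    private static Element findElement(Node node) {
        while (node != null && node.getNodeType() != Node.ELEMENT_NODE) {
            if (node.getNodeType() == Node.TEXT_NODE
                    && !node.getNodeValue().trim().isEmpty()) {
                throw new IllegalStateException(
                        "Unexpected text: " + node.getNodeValue().trim());
            }
            node = node.getNextSibling();
        }
        return (Element) node;
    }

    // Usage example: collects the tag names of all element children.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Element e = getFirstElementChild(parent); e != null;
                e = getNextElementSibling(e)) {
            names.add(e.getTagName());
        }
        return names;
    }

    // Convenience parser for the example, using default JAXP settings.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

A loop written with these helpers replaces the index-based "(Element)nodeList.getNode(i)" pattern without any unchecked cast at the call site.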

Bob Foster
http://xmlbuddy.com/


