Hi,
I'm using XMLUnit primarily for HTMLDocumentBuilder and TolerantSaxDocumentBuilder (nice tools, btw!). If I build a Document from HTML with &nbsp; in it, the String contents of the Node in question have unexpected bytes where the space should be. I ran into this while trying to split the resulting string on whitespace.

For example, with <body>test&nbsp;after</body>, the body text string I get has the following UTF-8 bytes:

bytes: 116 101 115 116 -62 -96 97 102 116 101 114

I was expecting to find a single 32 where the -62 and -96 are. Bug?

I'm using the latest version with Java 1.6.0.16.

Thanks,
Tony Rozga

Here is a test (not JUnit though :):


import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.custommonkey.xmlunit.HTMLDocumentBuilder;
import org.custommonkey.xmlunit.TolerantSaxDocumentBuilder;
import org.custommonkey.xmlunit.XMLUnit;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;


public class XmlUnitBug {

   public static void main(String[] args) {

       try {
           String html = "test&nbsp;after";
           TolerantSaxDocumentBuilder tolerantSaxDocumentBuilder = new TolerantSaxDocumentBuilder(XMLUnit.newTestParser());
           HTMLDocumentBuilder builder = new HTMLDocumentBuilder(tolerantSaxDocumentBuilder);
           Document doc = builder.parse(html);

           XPathFactory factory = XPathFactory.newInstance();
           XPath xpath = factory.newXPath();
           XPathExpression expr = xpath.compile("/html/body");
           String body = ((NodeList) expr.evaluate(doc, XPathConstants.NODESET)).item(0).getTextContent();
           System.out.println("body: " + body);
           System.out.print("bytes: ");
           byte[] bytes = body.getBytes("UTF-8");
           for (byte b : bytes) {
               System.out.print(b);
               System.out.print(" ");
           }
           System.out.println("");
       } catch (Exception ex) {
           System.out.println("whoops: " + ex);
       }
   }
}
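
For reference, -62 and -96 are the signed-byte form of 0xC2 0xA0, which is the UTF-8 encoding of U+00A0 (NO-BREAK SPACE), the character that &nbsp; names. So the bytes may simply mean the entity was resolved to a non-breaking space rather than to an ordinary space (32). Java's \s does not match U+00A0 by default, which would explain the failed split. Below is a minimal sketch that reproduces this without XMLUnit, assuming the node text really contains U+00A0. The class and method names are made up for illustration and are not part of XMLUnit.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class NbspSplit {

    // Encode the text as UTF-8 so the individual bytes can be inspected.
    public static byte[] utf8(String text) {
        return text.getBytes(Charset.forName("UTF-8"));
    }

    // Split on ordinary whitespace and on U+00A0 (NO-BREAK SPACE).
    public static String[] splitOnWhitespaceAndNbsp(String text) {
        return text.split("[\\s\\u00A0]+");
    }

    public static void main(String[] args) {
        // Same text as the body in the test above, with the entity already resolved.
        String body = "test\u00A0after";

        // U+00A0 shows up as the two bytes -62 -96.
        System.out.println("bytes: " + Arrays.toString(utf8(body)));

        // Plain \s+ leaves the string in one piece.
        System.out.println("split on \\s+: " + Arrays.toString(body.split("\\s+")));

        // Including U+00A0 in the character class splits it in two.
        System.out.println("split incl. nbsp: " + Arrays.toString(splitOnWhitespaceAndNbsp(body)));
    }
}
```

If the non-breaking spaces are not meaningful, replacing them first with body.replace('\u00A0', ' ') and then splitting on \s+ should work as well.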
