There is a COM wrapper for Xerces-C that mimics the MSXML DOM interfaces;
however, at this point there is little value in using it since MSXML has
significantly improved its standards conformance.
-
To unsubscribe, e-mail: [EMAIL
> Don't assume a BOM *wouldn't* be in a string; it could well be, and it's
> perfectly legal to do so. You should be able to deal with it either way.
I've never encountered any use of a BOM in a BSTR. Would you ever expect
one in a java.lang.String?
As it has been, the method would always fail.
Sean's patch looks good to me. Basically, it fixes loadXML which parses an
XML document passed as a string. Originally, UTF-16 was set as the encoding
which caused Xerces to look for a byte order mark that would not be present
in a string (but would in a file). Setting UTF-16LE explicitly sets
> This behaviour makes it impossible to simply replace MSXML by Xerces.
Actually, that usage is pretty unusual and most MSXML calling VB code
doesn't run into the problem. If you had dim'd strXML as String instead of
Variant it would have worked. You are right that there should be a call to
Var
My guess is the source document is encoded in ISO-8859-1, but does not have
an XML declaration so it is interpreted as UTF-8. Add a:
XML documents are not affected by the locale of the processor, they contain
their encoding information so that a document written in France doesn't
magically get
Per the DOM spec
(http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-1950641247),
Node.nodeValue is defined to return null for Document, DocumentFragment,
DocumentType, Element, Entity, EntityReference and Notation nodes. There is
not enough in your message to see if you are actually seeing some
The nullEnt problem that you mentioned earlier is due to a production
problem with the OASIS conformance tests zip file where 0 length files were
omitted from the archive. I know there were a substantial number of issues
(both production and otherwise) with the last conformance suite. You might
Calling setNodeValue on an Element object should do nothing, per
http://www.w3.org/tr/DOM-Level-2-Core/core.html#ID-F68D080 , so the
observed behavior of throwing an exception is wrong, but it isn't supposed
to magically create a child text element for you either.
If you want to add or change tex
Space is not a legal name character and an implementation is required by the
DOM spec to throw an exception if you try to create an element with an
illegal name. See
http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-2141741547
Can your application recover from the problem? If so, then you should
Andre M Descombes wrote:
>I tried the IDOMDocument.loadXML command and I am passing it a string I
>obtained reading a UTF-8 encoded XML file.
>
>I get the following error:
>
>An exception occured! Type:TranscodingException, Message:Could not
>create a converter for encoding: UTF-16
Does it occur
XMLByte does not represent a "one byte" character, it just represents one
byte. Xerces-C will infer the encoding from a byte stream (whether from a
file or a memory buffer) according the to algorithm described at
http://www.w3.org/TR/REC-xml.html#sec-guessing
Either your memory buffer needs to s
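The autodetection described in that appendix can be sketched roughly as below. This is a simplified illustration and not Xerces-C's own code: the function name guessEncoding is hypothetical, UCS-4 is left out, and the fallback to UTF-8 stands in for the step where a real parser would go on to read the encoding declaration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces-C. Guesses an encoding family from
// the first bytes of a buffer, loosely following the XML 1.0 autodetection
// appendix. Only a subset of the cases in the appendix is covered.
std::string guessEncoding(const unsigned char* p, std::size_t n)
{
    // Byte order marks (UCS-4 marks are not handled in this sketch).
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return "UTF-8";
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return "UTF-16BE";
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return "UTF-16LE";

    // No byte order mark: look at how the leading "<?" is laid out.
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x3C && p[2] == 0x00 && p[3] == 0x3F)
        return "UTF-16BE";
    if (n >= 4 && p[0] == 0x3C && p[1] == 0x00 && p[2] == 0x3F && p[3] == 0x00)
        return "UTF-16LE";
    if (n >= 4 && p[0] == 0x4C && p[1] == 0x6F && p[2] == 0xA7 && p[3] == 0x94)
        return "EBCDIC";

    // Anything else is treated as UTF-8 here; a real parser would read the
    // encoding declaration (if any) before settling on an encoding.
    return "UTF-8";
}
```

The sketch only looks at bytes, which is the point made above: nothing about the element type of the buffer tells the parser what the encoding is.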
DOM_Element is derived from DOM_Node. The only thing
that declaring it as a friend would give it is access
to the private members of DOM_Node. If the authors of
DOM_Node wanted a derived class to see those members
they would have been declared protected.
If there is a legitimate need for DOM_Elem
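The access rules behind that point can be shown with a small sketch. The Node and Element classes below are hypothetical stand-ins written for illustration; they are not the Xerces-C declarations of DOM_Node and DOM_Element.

```cpp
#include <string>

// Hypothetical classes for illustration only.
class Node {
public:
    explicit Node(const std::string& name) : name_(name), id_(42) {}
    virtual ~Node() {}
    int id() const { return id_; }
protected:
    std::string name_;   // visible to derived classes
private:
    int id_;             // visible only to Node itself and to its friends
};

class Element : public Node {
public:
    explicit Element(const std::string& name) : Node(name) {}
    // Fine: name_ is protected, so a derived class may read it directly.
    std::string tagName() const { return name_; }
    // Reading id_ directly here would not compile, because id_ is private.
    // Only a friend declaration inside Node would open it up to Element.
};
```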
If the file is accessible by HTTP it is a no-brainer, however I guess that
is not the case.
Xerces-C supports a file: protocol that should do the trick if you are
trying to access a UNC addressed resource on Windows. Be aware that the
file: protocol is only vaguely defined and platform specific,
Do the parse on its own thread and suspend the thread when you want to pause
the parse (something like WaitForSingleObject on Win32).
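One portable way to sketch the same idea, using the standard C++ library instead of Win32 calls, is a small gate object that the parsing thread checks from its handler callbacks. The PauseGate class and its method names are hypothetical and are not part of Xerces-C; where the parser thread calls waitIfPaused() is left to the application.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper: the controlling thread calls pause() and resume();
// the parsing thread calls waitIfPaused() between units of work, for
// example at the top of each handler callback.
class PauseGate {
public:
    void pause()
    {
        std::lock_guard<std::mutex> lock(m_);
        paused_ = true;
    }

    void resume()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            paused_ = false;
        }
        cv_.notify_all();   // wake the parsing thread if it is blocked
    }

    // Blocks while the gate is paused; returns at once otherwise.
    void waitIfPaused()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !paused_; });
    }

    bool isPaused() const
    {
        std::lock_guard<std::mutex> lock(m_);
        return paused_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    bool paused_ = false;
};
```

Compared with suspending the thread from outside, a gate like this only stops the parse at points the parsing thread chooses, so it never stops while holding a lock.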
The XML declaration is not a processing instruction, it only looks like one.
It really isn't appropriate for it to be in the DOM since it describes how
the document is encoded in a particular file. For example, if you read an
XML document that is encoded in ISO-8859-1, it ceases being encoded in
Very complicated question that depends a significant amount on the style and
size of document that you are processing.
You should definitely take a look at "Inside MSXML Performance" at
http://msdn.microsoft.com/xml/articles/xml02212000.asp
Andy Heninger modified this test to test the current Xe
> Ok, yes if you go through the parser and you get it out in XMLCh format,
> then you could do a trivial truncation to get 8859-1. But, you'd have to
> scan the entire outgoing contents to figure out whether you could do it. If
> there is a lot of source, if you add up that extra overhead, and the
The encoding used in the source document or the current code page is
immaterial to converting UTF-16 to ISO-8859-1. I definitely don't
want to be using any arbitrary encoding as an internal representation.
Whatever the original code point used in the source file was converted
to the correspon
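The "trivial truncation" mentioned in the quoted text can be sketched as below. This is an illustration only, not the Xerces-C transcoder: the function name utf16ToLatin1 is hypothetical, and throwing std::range_error for unrepresentable characters is an assumption about how a caller might want to be told.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-16 code units to ISO-8859-1 bytes.
// ISO-8859-1 maps one-to-one onto code points U+0000 to U+00FF, so any
// code unit above 0xFF cannot be represented and is rejected here.
std::string utf16ToLatin1(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char16_t unit : in) {
        if (unit > 0xFF) {
            throw std::range_error("character not representable in ISO-8859-1");
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
    }
    return out;
}
```

Nothing in the conversion depends on the encoding of the original file or on the current code page; only the UTF-16 code units matter.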
Double check that your DTD system id isn't using backslashes instead of
forward slashes. Backslashes are not legal in URLs, but some software will
produce them (and other software will accept them) as legal.
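If the system id is built by your own code, a small normalization step along these lines may help. The function name toForwardSlashes is hypothetical, and the sketch only swaps the slashes; it does not turn a Windows path into a fully valid URL.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: replaces every backslash in a system id with a
// forward slash. No other URL escaping or validation is attempted.
std::string toForwardSlashes(std::string systemId)
{
    std::replace(systemId.begin(), systemId.end(), '\\', '/');
    return systemId;
}
```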
My guess would be that it was actually in the transcoding of your char* to
XMLCh* before the creation of the DOMString.
If the transcoder used in the conversion was, for example, UTF-8, then I
would expect you would either get an exception or, at least, unusual
behavior. You may need to set the
> >The DOM spec states that negative values for the count value should throw
> >an INDEX_SIZE_ERR exception. In Xerces-C, the arguments are defined as
> >unsigned int which results in the negative values in the tests being
> >interpreted as very large values.
>
> If the binding defines the argum
See http://xml.apache.org/xerces-c/faq-parse.html#faq-23
If C++'s action on trying to cast a negative int to an unsigned int was an
exception, then using unsigned ints in the parameter list could be a good
thing. But since you would expect that the count would often be calculated
by some expression that would involve subtraction, having a bunged
expre
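The conversion being discussed can be seen in a short sketch. Both function names are hypothetical, and std::out_of_range is used here only as a stand-in for the DOM exception; none of this is taken from the Xerces-C sources.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical: what a parameter declared as unsigned int receives when the
// caller passes a negative int. The value wraps modulo 2^N without any error.
unsigned int asUnsigned(int n)
{
    return static_cast<unsigned int>(n);
}

// Hypothetical checked alternative: accept a signed count and reject
// negative values explicitly before using it.
unsigned int checkedCount(int n)
{
    if (n < 0) {
        throw std::out_of_range("negative count");
    }
    return static_cast<unsigned int>(n);
}
```

With the first form a count of -1 silently becomes the largest unsigned value, which is why a miscalculated expression goes unnoticed.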
I was able to port the NIST Java DOM test suite to JUnit and then on to
CppUnit for Xerces-C. The source is in the CVS of xmlconf.sourceforge.net
(http://sourceforge.net/cvs/?group_id=8114) but I haven't updated the site.
Running the tests resulted in three conformance errors. One is a le
I spent a few hours over the last few days experimenting with recasting the
NIST Java DOM test suite using JUnit (http://www.junit.org) and then porting
the JUnit code to JSUnit (http://www.jsunit.net) and to CppUnit (if
anyone wants to experiment with any of the other parallel frameworks at
Since I hadn't heard of SAC, I thought it might be reasonable that some
others on the list hadn't either. SAC is "Simple API for CSS" and was
submitted as a W3C note last summer: http://www.w3.org/TR/SAC/
I'm curious why a SAC implementation would need to be integrated into the
XML parser and ju
I guess it is time for some exploration and profiling and maybe a few
alternative approaches.
The point about UTF-8 and UTF-16 strings definitely suggests that allowing
different string implementations for internal vs external use might be
valuable. In extreme cases (like the OT stuff), then gzip
>Henry Zongaro wrote:
> Hi Curt,
>
> Co-incidentally, I started looking into Andy Heninger's prototype code
> yesterday, and I'd like to volunteer to work on completing it.
There are a couple of things that I would like to do as part of the overall
project.
1. Port the NIST Java DOM Level 1
Not quite sure what you are trying to do. I think that you have
misinterpreted how to use getNextSibling(). Typically, you do a loop like:
    for(DOMNode child = parent.getFirstChild();
        child != null;
        child = child.getNextSibling()) {...
It looks like you are doing something like: