The only way I can see to create an intern (-like) table that discards
values no longer in use is to use Weak References...
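A minimal sketch of that idea, assuming java.util.WeakHashMap is acceptable as the backing store. The class name WeakInternTable is made up for illustration; it is not a Xerces class.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical intern-like table whose entries can be collected once unused. */
final class WeakInternTable<T> {
    // The key is held weakly. The value wraps the same object in a WeakReference
    // so that the map's value does not keep its own key strongly reachable.
    private final Map<T, WeakReference<T>> table = new WeakHashMap<>();

    /** Returns the canonical instance equal to value, registering value if none exists. */
    synchronized T intern(T value) {
        WeakReference<T> ref = table.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            table.put(value, new WeakReference<>(value));
            canonical = value;
        }
        return canonical;
    }

    /** Number of entries currently in the table (may shrink after garbage collection). */
    synchronized int size() {
        return table.size();
    }
}
```

Callers must hold on to the returned instance; once nobody does, the entry becomes eligible for collection.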
__
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Ber
If the attribute is declared as an ID in the DTD, the normal operations
will find it whether or not it is named "id".
If it isn't, you get to implement your own solution...
__
XPath's view of text nodes is that all adjacent text is a single node,
whether or not it's divided up into multiple DOM nodes internally.
__
The XML Declaration must start at the very first character of the XML
document being parsed -- if you have a blank line or space in front of it,
fix that. (The one exception is that a two-byte Byte Order Mark may precede
the XML Declaration.)
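A quick way to see this for yourself, sketched with the JAXP DocumentBuilder that ships with the JDK. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlDeclPosition {
    /** Returns true if the text parses as well-formed XML, false if the parse fails. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }
}
```

Calling parses() on a document with a space or newline in front of the declaration should fail, while the same text without the leading whitespace should parse.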
__
Oracle is wrong. The XML Declaration is not a child in the DOM. Complain to
them and see if they offer a mode which handles it properly.
(The DOM had no standard API for the XML Declaration until DOM Level 3,
which is part of why some parsers tried to cheat by turning it into a
special node or
This depends on how you've configured your SAX processor. SAX can return
namespace declarations as attributes (for compatibility with SAX 1.0), or
as startPrefixMapping/endPrefixMapping events, or both.
See http://www.saxproject.org/?selected=namespaces
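As a rough illustration, the sketch below toggles the standard SAX2 namespace-prefixes feature through JAXP and records what the handler sees. The class name and the "mapping"/"attribute" labels are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class NamespaceReporting {
    /** Parses xml and records how each namespace declaration was reported. */
    static List<String> report(String xml, boolean declarationsAsAttributes) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Standard SAX2 feature: when true, xmlns attributes are also reported as attributes.
        factory.setFeature("http://xml.org/sax/features/namespace-prefixes",
                declarationsAsAttributes);
        List<String> seen = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                seen.add("mapping " + prefix + "=" + uri);
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.add("attribute " + atts.getQName(i) + "=" + atts.getValue(i));
                }
            }
        });
        return seen;
    }
}
```

With the feature off, only the prefix-mapping event should show up; with it on, the xmlns attribute should appear in the attribute list as well.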
__
A CDATA section is just another way of writing text content. Schemas
can validate the value of text against a datatype. They don't care whether
it was written as a CDATA section or something else.
__
On Wednesday, 01/05/2005 at 02:43 PST, Bob Foster <[EMAIL PROTECTED]> wrote:
> > If we're doing our own URI support... why?
> Why what?
"Why are we doing so". Sorry, I thought that was self-evident from context.
I agree I should have been clearer, especially since not all the
participants here
If we're using the Java URI support, I believe the jar: scheme is supported
by that layer. If we're doing our own URI support... why?
__
Does your document reference a DTD or Schema located on the web, perhaps?
__
On Tuesday, 12/14/2004 at 01:09 GMT, "Alistair Young"
> It seems to get
> populated as you do things on it, such as getNodeName/getNodeValue etc.
> For instance, after parsing, DeferredDocumentImpl.firstChild is sometimes
> null, or points to a structure full of nulls. After calling
> getNodeNa
>Because such a stream is not an XML document, I suppose.
More or less. They had to stop at some point; Document is where they drew
the line. There's nothing wrong with higher-level protocols when you have
to transmit multiple documents; if it happens often enough, someone can
propose a standa
Re why &#12; can't be converted back to ^L (formfeed): Unfortunately, the
XML spec says that numeric character escapes are treated the same as their
corresponding characters; if it isn't a legal XML 1.0 character, expressing
it as a numeric is not a workaround. That's not a parser issue, it's an
>This is a major design flaw in DOM. XOM does not have this problem.
:-)
This is a major performance tradeoff deliberately accepted in the DOM.
Checking every string every time would impose serious overhead on
applications, completely unnecessarily in many cases since the structure of
the
Not all characters are legal in XML, so not all numeric character
references are legal.
The APIs don't generally check for illegal characters. Nor does the
serializer. So it's possible for an application to build, and write out, a
document which is not well-formed.
Since you're looking at Cri
Hm. Rechecking the XML 1.1 spec, &#12; (which is &#xC;) is a Restricted
Character...
__
I presume you mean &#12; -- the trailing semicolon is important.
See the XML Specification's description of "numeric character references".
This is absolutely standard XML. Any parser *should* accept it unless it
appears in a place where that character is not legal (in the middle of an
element
There is limited pull support already in Xerces if you work at the XNI
level. (Actually, it's closer to "burst push".)
My solution when I needed a true event-by-event pull parser was to wrap the
SAX invocations in a coroutine-based "throttle" -- run the parser and app
in different threads and
Check with the DOM WG to be sure, but I don't _think_ this is broken.
Copying/cloning/importing attribute nodes in the DOM may lose ID-ness,
since the attribute may not be an ID in the new location. Some DOMs may be
able to check and reassert this, if they have the DTD/schema information
avail
XPath is namespace-sensitive. If you don't provide prefixes in your XPath,
bound to namespaces via a PrefixResolver, it will not match namespaced
nodes.
(XPath 2.0 is planning to add support for default namespaces in XPath
syntax, but that isn't available yet.)
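To illustrate, here is a sketch using the JAXP javax.xml.xpath API with a NamespaceContext, which plays the same role there that a PrefixResolver plays in Xalan's own API. The prefix "ex" and the URI "urn:example" are arbitrary choices for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespacedXPath {
    /** Counts the nodes selected by expression, with prefix "ex" bound to urn:example. */
    static int count(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this the DOM carries no namespace information
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "ex".equals(prefix) ? "urn:example" : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return "urn:example".equals(uri) ? "ex" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                String p = getPrefix(uri);
                return (p == null ? Collections.<String>emptyList()
                                  : Collections.singletonList(p)).iterator();
            }
        });
        NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }
}
```

Against a document whose root declares xmlns="urn:example", the unprefixed path /doc/item should select nothing, while /ex:doc/ex:item should find the items. The prefix used in the path does not have to match anything in the document; only the URI matters.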
___
CachedXPathAPI assumes that you are *NOT* going to change the source
document(s) between path searches. If you do so, it's your responsibility
to flush the cache by discarding the CachedXPathAPI object and obtaining a
new one.
(Or you could try the experimental DOM2DTM2 adapter, which attempts
We agree that we disagree.
__
Joe Kesselman, IBM Next-Generation Web Technologies: XML, XSL and more.
"The world changed profoundly and unpredictably the day Tim Berners Lee
got bitten by a radioactive spider." -- Rafe Culpin, in r.m.filk
We agree that we disagree. DOM applications
can and should be written to be able to operate successfully against a
DOM that does not contain namespace declaration attributes. Not all have
been, or have been updated to be so. Then again, not all have been (re)written
to be properly namespace-aware
I'm afraid that turns out not to be
correct, Elliotte.
The DOM does not require namespace declaration
attributes to be present. They typically are produced when a parser builds
the DOM, but the DOM is designed to operate perfectly happily without
them. In that case, it is the responsibility of t
There are some comments on DOM versus SAX at
http://www.w3.org/DOM/faq.html#SAXandDOM. It's slightly outdated, but
remains essentially true.
__
Namespace URIs must be absolute URI references; the use of relative
references for this purpose has been deprecated. See
http://www.w3.org/TR/xml-names11/#iri-use
Note that the spec says "deprecated", not "forbidden". Some processors may
still tolerate relative reference syntax as namespace na
Are you sure the string contains the '?' character, rather than the tool
you're using to display that character displaying it as '?'...? The former
would be hard to explain; the latter is a fairly common problem.
__
Alternatively, you can use an internal subset (DTD syntax inside the
document itself, which keeps things self-contained)...
or you skip the mnemonic names and just use numeric character references...
or you skip those and just use the characters themselves and pick an
encoding which supports
I believe the XNI examples included a "parser" for CSV (comma-separated
value) files, which is probably an easier-to-read starting point than
NekoHTML.
__
XML parsers aren't generally designed to extract sub-documents as such. You
can achieve this result by running a single parse on the complete document
and routing the output in appropriate directions -- eg, by having the
parser feed a SAX processor which fans out the input to other SAX
processo
The resolver is expecting a URI, and a Windows file name is not a URI.
Use
The DOMString's internal encoding is always Unicode, specifically UTF-16.
The parser should convert from the XML text's encoding to that form; the
serializer should convert to whatever output text encoding is being used.
__
The > character does not really need to be escaped, since there's no
situation in which it can be misinterpreted.
The ' character does not need to be escaped when it appears inside a string
delimited by ".
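A small check of both points, sketched with the JDK's DOM parser. The class name and the sample document are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class UnescapedChars {
    /** Returns { value of attribute "b", text content } of the root element. */
    static String[] attributeAndText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return new String[] { root.getAttribute("b"), root.getTextContent() };
    }
}
```

Feeding it the text <a b="it's">1 > 0</a> should parse without complaint and hand back both values unchanged.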
I suspect the effect of changing JVMs is that you're changing which version
of Xerces y
This should probably be added to the FAQ page of the website, y'know...
__
Most likely error: Remember that SAX does *NOT* promise that contiguous
context text will be delivered as a single call to characters(). It may
occur as several successive calls, for buffer management reasons. It's the
application's responsibility to deal with gluing it back together, if
needed
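One common way to do the gluing is to buffer in characters() and only look at the text in endElement(). This is a minimal sketch that handles elements containing only text; mixed content would need a stack of buffers. The class name is invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> texts = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        buffer.setLength(0); // start a fresh run of text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text run
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (buffer.length() > 0) {
            texts.add(buffer.toString()); // the text is only complete here
        }
        buffer.setLength(0);
    }

    /** Returns the reassembled text of each text-only element, in document order. */
    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```

The result does not depend on how the parser happens to split the text across characters() calls, which is the point.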
The XML Declaration is officially not considered a processing instruction,
and can't be treated as one.
DOM Level 3 is planning to add calls to retrieve information from the XML
Declaration (xmlEncoding, xmlStandalone, xmlVersion, inputEncoding); Xerces
probably has a beta/prototype of that fu
localName will be null if the node is a Level 1 DOM node -- ie, was
produced by a non-namespace-aware DOM builder, or by calling the
semi-deprecated createElement/createAttribute/setAttribute methods rather
than the new ...NS versions of those methods.
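A short sketch of the difference, using the DOM implementation bundled with the JDK. The class name, the element name and the namespace URI are arbitrary.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class LocalNameCheck {
    /**
     * Returns getLocalName() for an element created either with the
     * namespace-aware createElementNS or with the Level 1 createElement.
     */
    static String localNameOf(boolean namespaceAware) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element element = namespaceAware
                ? doc.createElementNS("urn:example", "p:item") // Level 2 creation
                : doc.createElement("item");                   // Level 1 creation
        return element.getLocalName();
    }
}
```

The Level 1 element should report null, the Level 2 element "item". Code that must cope with both kinds can fall back to getNodeName() when getLocalName() returns null.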
__
>When I receive the xml file, it currently specifies a UTF-8 encoding. Some
>characters are invalid for UTF-8, so are interpreted based on the locale
>settings of the operating system.
If you specify an encoding, your document *MUST* conform to that encoding, or it is not a well-formed XML fi
>What am I missing? Why is this URL approach a superior solution to setter
>methods?
What you're missing is that this isn't just a Xerces API. It's shared with
other parsers, which may support a different subset of features. Using a
string-based approach to naming the features avoids having to
> > xsi:schemaLocation="http://www.bogus.com file://C:/xml/bogus.xsd">
file:///
The internal structure would be something like a DOM tree, which tends to
take more bytes than the XML stream does... and object
serialization/deserialization is often not a heck of a lot faster than XML
parsing. Most of the proposals for "binary XML" representations have hit
that same set of r
Note that the subject is misleading: The goal isn't to "prevent
DeferredTextImpl", but to suppress text nodes which a specific application
would prefer to ignore.
If you just want to suppress whitespace-in-element-content as defined by
the DTD, one of the parser features
(http://apache.org/xml
Changing the DOM while using a NodeIterator is legal; the DOM Level 2
Traversal and Range specification is very explicit about what happens
when you do that. There are some potential pitfalls, due to the iterator's
"maintain current position in the document" behavioral model; I _strongly_
See the XML Recommendation for a list of which Unicode characters are and
aren't legal in XML, and see the documentation for the encoding you're
using for information about how what's in the file translates to Unicode
and back.
Unicode can represent almost any character, in almost any language
Yes, that's normal. Think of jarfiles as if they were directories; if you
want to search them, they must be explicitly added to the classpath. (But
that's a topic for a Java tutorial...)
__
Note that there is an outstanding suggestion that Xalan consider finally
abandoning JDK 1.1.8 support, in the hope that being allowed to use some of
the classes added in 1.2 (the new collections, for example) might yield
efficiency improvements. I don't know whether Xerces has been considering
> I want to validate that I got a Car and process the Car even though the
> Car was really a CompanyXCar (i.e., I am ignoring data from other
> namespaces)
The standard solution for that is to have everyone agree on a base schema
for Car and derive their customized CompanyXCars from that. Then
Comp
Are you aware that JDK 1.3 and 1.4 come with relatively ancient versions of
Xerces, and have you remembered to override that with the current code
before running your tests? It isn't just a matter of setting the classpath,
unfortunately.
(The approaches we recommend for Xalan should work for X
Whitespace *IS* text in XML; see the XML spec.
*IF* you are validating against a DTD (I'm not sure about schema), it is
possible to have whitespace that occurs in places where only elements were
expected marked as "whitespace in element content". (SAX,
erroneously-but-uncorrectably, calls thi
Just a general observation: Since it has parent links as well as sibling
and child, a DOM can be walked non-recursively. It might be worth
considering an iterative implementation of the serializer and seeing how
much stack that saves us and what its performance characteristics are. See
the log
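For what it's worth, a non-recursive document-order walk can be sketched as below. This only illustrates the traversal, not the Xerces serializer itself, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Node;

final class IterativeWalk {
    /** Visits root and all its descendants in document order, without recursion. */
    static List<String> nodeNames(Node root) {
        List<String> names = new ArrayList<>();
        Node node = root;
        while (node != null) {
            names.add(node.getNodeName());
            Node next = node.getFirstChild();          // go down if possible
            while (next == null && node != root) {     // else go right, climbing as needed
                next = node.getNextSibling();
                if (next == null) {
                    node = node.getParentNode();
                }
            }
            node = next;
        }
        return names;
    }
}
```

Stack use is constant regardless of tree depth; the price is the extra parent/sibling pointer chasing on the way back up.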
Please see http://www.w3.org/DOM/faq.html#ownerdoc
__
> Is there any thread-safe way to *read* a DOM from
> multiple threads simultaneously?
The only guaranteed-portable answer is to set up your own locks to ensure
only one thread tries to access the DOM at a time. The DOM WG looked at
this and decided that locking on individual DOM operations wa
On Tuesday, 10/28/2003 at 09:59 PST, "Jeff Greif"
<[EMAIL PROTECTED]> wrote:
> A completely portable, low-tech, arms-length solution would be to construct
> the document, then traverse it to construct one or both of the Maps,
> Node->Id and Id->Node, assigning the Id as you go.
Unfortunately,
The most portable nonportable solution is probably to use the (DOM Level 3
prototype) userData hook to associate your own generated identifiers with
each node.
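Roughly like this, assuming a DOM that implements the Level 3 Node.setUserData/getUserData calls (the one in current JDKs does). The key string and class name are made up for the example.

```java
import org.w3c.dom.Node;

final class NodeIds {
    // Hypothetical key; any string unlikely to collide with other users of the hook will do.
    private static final String KEY = "example.node-id";

    /** Numbers node and its descendants in document order; returns the next unused id. */
    static int assignIds(Node node, int nextId) {
        node.setUserData(KEY, Integer.valueOf(nextId), null); // no UserDataHandler needed here
        nextId++;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            nextId = assignIds(child, nextId);
        }
        return nextId;
    }

    /** Returns the id attached by assignIds, or null if the node has none. */
    static Object idOf(Node node) {
        return node.getUserData(KEY);
    }
}
```

Nodes added after the numbering pass will have no id until you assign one, so the application still has to decide when to renumber.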
__
On Monday, 10/27/2003 at 11:16 EST, Ian Lewis <[EMAIL PROTECTED]> wrote:
> I was wondering what classes currently use the DOMImplementation class in the
> current working draft of DOM3.
I'm not sure what question you're asking... DOMImplementation isn't new
with DOM Level 3.
> My thoughts are
Try putting file:/// in front of the UNC path, to make it a legal
local-directory-system URI...
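If the path is in hand as a Java string, an alternative to pasting the prefix on yourself is to let java.io.File build the URI. This is a sketch only; the exact result depends on the platform and the current directory, and the class name is invented.

```java
import java.io.File;
import java.net.URI;

final class FileUris {
    /** Converts a local file name into an absolute file: URI suitable as a system identifier. */
    static URI toFileUri(String path) {
        // File.toURI() resolves relative names and escapes characters that are illegal in URIs.
        return new File(path).toURI();
    }
}
```

Pass toFileUri(path).toString() wherever a system ID or schema location is expected.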
__
>Shouldn't the transformer see the '&' as a start of an entity reference
>when coming from a StringReader?
Yes, assuming you also have the trailing ; to close the entity reference,
and assuming the entity in question is defined in the DTD or internal
subset. (And that the DTD is being resolved,
If you want an entity reference, construct an Entity Reference node/event.
If you use a & character in a data field, that's just data (equivalent to
&amp; in the XML source).
__
>Where is this stated "officially"?
In fact, I do seem to be confused. There's an explicit statement that the
binding to the Document Type can't be altered, but there isn't a similar
statement for the root element.
__
Officially, the DOM doesn't currently allow you to replace the document
element (DOM Level 3 may relax that restriction), which makes this a bit
more complicated if you want a fully portable solution. There may also be
children of the Document other than the root element which would need to be
On Thursday, 08/07/2003 at 06:36ZE10, "Raveendranath, Rohith (LNG - AUS)"
> &#8243; &#8242; and say one more entity &#8212;
These are numeric character references, not entity references. They don't
have to be defined (and in fact can't be defined); they map directly into
the Unicode characters with those numbe
Use DOMImplementation.createDocument to create the new Document node and
root element, use the importNode operation to clone the existing
DocumentFragment or nodes from the NodeList into a form compatible with
this new document, insert the latter into the former.
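Those steps might look like the following for the DocumentFragment case. The class and method names are invented, and the new root is created with no namespace for simplicity.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;

final class FragmentToDocument {
    /** Builds a new Document whose root element contains a copy of the fragment's content. */
    static Document wrap(DocumentFragment fragment, String rootName)
            throws ParserConfigurationException {
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        // Creates the Document node and its root element in one step.
        Document target = impl.createDocument(null, rootName, null);
        // importNode leaves the source untouched and returns a copy owned by target.
        Node copy = target.importNode(fragment, true);
        // Appending a fragment inserts its children, not the fragment node itself.
        target.getDocumentElement().appendChild(copy);
        return target;
    }
}
```

For a NodeList, loop over the items and import and append each one in the same way.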
__
That's entirely correct.
The parent element was assigned a default namespace URI
t element...
According to DOM Level 2, you can not actually insert or remove the root
element of a Document; it must be created at the same time as the Document
node (see the DOMImplementation.createDocument method). This is tied into
the fact that in some DOMs, both the DTD/schema and the namespaced name o
>when xerces writes this out to a file, I invariably get:
>"€"
How are you creating the node? If you write the string "€" to any of
the common XML APIs, it is presumed that you want that value to appear in
the infoset, and to achieve that the ampersand must be escaped when this is
seria
>Aren't the telecom companies sponsoring the development of Apache software
Would that they were.
__
Reminder: Xerces, like the rest of Apache, is open source software. If you
have complaints about code quality, you are more than welcome to get
involved in helping to improve it. Or you can go with a purchased copy
rather than a free copy; generally, the largest advantage of doing so is
that yo
ument
>But the part ... is not XML format.
>I want to generate java class from it,
Xerces can't help you with that.
__
ProcessingInstruction nodes.
See
http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-1004215813
__
>Not in DOM Level 2. Edit the tree to remove the old element, insert a new
>one, and copy the children and attributes from old to new.
(Actually, _move_ the children and attrs to the new element rather than
having to copy them. Sorry; wrote the above in a hurry.)
BTW, this is also covered in
Not in DOM Level 2. Edit the tree to remove the old element, insert a new
one, and copy the children and attributes from old to new.
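A sketch of that edit using only DOM Level 2 calls; it moves the attributes and children across rather than copying them. The class and method names are invented, and the sketch assumes there are no DTD-defaulted attributes (which would reappear on the old element after removal).

```java
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class RenameElement {
    /** Replaces old with a new element of the given name, moving attributes and children. */
    static Element rename(Element old, String namespaceUri, String qualifiedName) {
        Document doc = old.getOwnerDocument();
        Element fresh = doc.createElementNS(namespaceUri, qualifiedName);

        // An Attr must be detached from its owner before another element can take it.
        NamedNodeMap attrs = old.getAttributes();
        while (attrs.getLength() > 0) {
            Attr attr = (Attr) attrs.item(0);
            old.removeAttributeNode(attr);
            fresh.setAttributeNodeNS(attr);
        }

        // appendChild removes the child from its old parent automatically.
        while (old.getFirstChild() != null) {
            fresh.appendChild(old.getFirstChild());
        }

        Node parent = old.getParentNode();
        if (parent != null) {
            parent.replaceChild(fresh, old);
        }
        return fresh;
    }
}
```

Any references the application holds to the old element still point at the now-empty, detached node, so they need to be updated by hand.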
DOM Level 3 has proposed adding a function to do this which *MAY* work in
*SOME* implementations. (Architected but optional feature; there are many
good reasons
Hmmm. Yes, I'd forgotten that clause. You're right, if it says "must" it
isn't optional.
Tim Bray's comment in the Annotated XML spec
(http://www.xml.com/axml/axml.html) is "If you need to put a greater-than sign in an
XML document, that's OK, unless the previous two characters happened to be
On Wednesday, 07/09/2003 at 12:46 AST, Maksym Kovalenko
<[EMAIL PROTECTED]> wrote:
> Why in this xml example start tag of 'error' element doesn't end with
> '>'?
>
>
> This is the error text with the !!
There is a '>' -- right before the !!.
The real question is why "This is the
In fact, the XML grammar is such that a parser *can't* get confused about
how to interpret the '>' character. &gt; is provided only for stylistic
reasons, because folks thought "" would express the intent more
clearly to a human reader than "" would. Unless you plan to
hand-edit your XML
To bind non-XML data to arbitrary nodes, see the prototype "user data"
hooks.
"Height" isn't an XML concept; that depends on how the XML markup is being
interpreted.
Either way, it sounds like you want to run a tree-walk, visit every node,
compute your additional data and attach it...
__
You probably intended this for the Xalan (XSLT processor) user list rather
than the Xerces (XML parser) list.
Short answer: This is an XSLT FAQ. To match a namespaced element -- whether
the source document uses the default namespace or not -- you must use a
properly-declared namespace prefix i
Easiest way to find out what version is actually on your classpath is to
invoke one of the methods on org.apache.xerces.impl.Version
__
>i knew that this is not a valid xml document. but still i wanted the
>xerces to parse this data
>and accept & as it is.
Xerces is an XML parser. If your document isn't well-formed XML, you
shouldn't expect Xerces -- or any other XML tool -- to process it.
You could try putting your document t
That's a question for the Xalan list, not the Xerces list.
First question to answer, after moving over there: Did you use xsl:output
to set the output mode to HTML?
__
The W3C's Semantic Web folks are investigating what metadata, if any, is
implied by a namespace and how it should be retrieved and used. Until they
report out, the official answer is that a namespace URI is only a magic
string in URI syntax. It does not imply a specific schema, and indeed may
I don't know whether Xerces has prototyped this, but DOM Level 3 introduces
the adoptNode() method. Support for it will be optional even after DOM3 is
a recommendation, but if it can be done at all that's the semi-portable way
to do it.
If not, see the DOM FAQ re importNode() and other ownerDo
You might get more answers if you posted this to the Xalan mailing list
rather than the Xerces mailing list.
There's a path-generator template, and examples of using it, in Part 2 of
my recent DeveloperWorks article on styling stylesheets for debugging
(http://www.ibm.com/developerworks/xml/li
>What I'm trying to do is: when the parsing is finished, I scan the tree
>and do node.removeChild(child) when child is a Text or Comment node. [...]
> Is there a way to do that properly during the parsing? a feature to set?
The proposed DOM Level 3 load/save API included a filtering mechanism,
>I can see that Elements have a setAttribute method. I have a node (Node n)
>and would like to set an attribute. Can I just cast the Node into Element,
>or is there a nicer way?
1) As others have said, don't touch setAttribute unless you really want to
create a Level 1 Only (non-namespace-tole
"Pull" APIs sometimes make
the consumer's life considerably easier at the cost of sometimes making
the producer's life considerably more difficult. It's easier to report
good performance numbers in your component when you're in control of the
processing loop. As a result, every step in the process
On Thursday, 05/08/2003 at 02:33 MST, Christian Nelson
<[EMAIL PROTECTED]> wrote:
> Someone else mentioned that the include-ignorable-whitespace feature only
> works for DTDs, not schemas.
> Now the question is, what's the equivalent feature or option for when one
> is using a schema in place of a D
Does your document point to a DTD, so
the parser can tell what is whitespace-in-element-content and what isn't?
Note that the schema working group decided
that schemas do _not_ set this flag in the Infoset.
__
>The resulting XSLT is then misinterpreted by Xalan (matching a space rather
>than a CR or LF)
You still haven't shown us the relevant
portion of the document. If these characters appear in an Attribute value,
that's correct parser operation per XML's rules. See http://www.w3.org/TR/REC-xml#AVNo
> When I serialize this document with XMLSerializer these
> carriage returns and line feeds appear in the output as exactly that - but
> this then causes problems using the document as an XSLT
It's hard to offer specific advice without seeing
a specific example, but:
1) Remember that XML normali
Unicode is the character set. Unicode
0x0 is the character whose numeric code is 0x0 -- yes, that's NUL.
If this is only happening intermittently,
I'd suspect there's a timing problem somewhere in the data path. The question
is where. Best suggestion I can offer is that you install a filter in th
The exception means that your document
contains a non-XML character, and hence can't be processed correctly by
an XML parser. This suggests that whatever is generating the document or
delivering it to you is broken...
__
This sounds like the standard SAX mistake of forgetting that text may be
delivered as multiple calls to characters() rather than just one, due to
parser buffering issues. It's the SAX application's responsibility to deal
with that in some appropriate way, most often by reassembling the data in
On Wednesday, 04/23/2003 at 10:27ZE12, Simon Kitching
<[EMAIL PROTECTED]> wrote:
> "NS" suffix is just to distinguish this method from the "old" DOM v1
> method "createAttribute" which creates an attribute that is in no
> namespace (or is it the default namespace; I can never remember..).
Actuall
>is there a way to un-instantiate these child nodes?
Only by editing the document tree -- remove them from
their parent and let GC recover the memory they were using.
If memory use is an issue, you may want to consider
a SAX-based solution, building an in-memory model only for the portions
of
DTDs predate namespaces, and don't consider
them to be different from any other attribute. To use namespace declarations
in a DTD-validated document, the DTD must be written to allow those attributes
at the relevant points. Yes, this is a major nuisance, and it means you
must use the specific pref