[EMAIL PROTECTED] wrote:
In a message dated 2/12/2004 10:39:42 AM GMT Standard Time,
[EMAIL PROTECTED] writes:
In SGML and XML, a document is composed of two sequential parts,
the prolog and the instance. You can see this in an HTML example:
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html>
4 <head>
5 <title>The Symbol Grounding Problem</title>
6 </head>
7
8
9
In this example, the prolog is lines 1-2; the instance begins on
line 3.
Murray,
That is incorrect. The prolog includes lines 1 - 3.
No. The document instance is the <html> element, which begins
on line 3. The only prolog content in my example was the DOCTYPE
declaration. In a subsequent email message, Vladimir brought up
the XML declaration:
Vladimir Cvetkovic (AT/LMI) wrote:
> Here is the first line of the document:
> <?xml version="1.0" encoding="UTF-8"?>
XML modifies SGML by optionally adding this to the prolog. So it
would be line 0 in my example.
The prolog includes the DOCTYPE declaration, the external
subset (called the DTD), and the internal subset (which you seldom
see but it's legal).
That too is incorrect, in my view. The prolog may optionally include a
pointer to the external subset but the external subset is not part of the prolog,
although it is part of the DTD.
I'm sorry, but I've been doing SGML and XML for well over a decade,
have been an author/editor of a number of industry DTDs (including
the modularization of XHTML and the XTM DTD), but rather than spout
my CV at you, please read the relevant passages from either ISO 8879,
Goldfarb, or any book on SGML. If you want to get picky about language,
the declaration subset contains the DTD. That can be as an external
subset (the most common, since it's an external reference to a document
called a DTD) and/or an optional internal subset. But all parts of the
declaration subset, including the external and internal subsets, are
part of the prolog.
The document instance includes the document
element (in this case <html>) and all of its descendant content.
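To make the prolog/instance split concrete, here is a minimal Python sketch using the standard library's xml.dom.minidom. It reports the identifiers found in the DOCTYPE declaration and the name of the document element. The sample document is an assumption: it is the example from this thread with the XML declaration added and the elided remainder filled in so that it parses. The describe() helper is hypothetical, and minidom is a non-validating parser, so it records the reference to the external subset rather than validating against it.

```python
from xml.dom import minidom

# Hypothetical sample: the thread's example with an XML declaration added
# and the elided remainder filled in so that the document is well-formed.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>The Symbol Grounding Problem</title>
</head>
<body></body>
</html>
"""


def describe(data):
    """Return the DOCTYPE references from the prolog and the document element name."""
    doc = minidom.parseString(data)
    doctype = doc.doctype  # None when the prolog has no DOCTYPE declaration
    return {
        "public_id": doctype.publicId if doctype else None,
        "system_id": doctype.systemId if doctype else None,
        "internal_subset": doctype.internalSubset if doctype else None,
        "document_element": doc.documentElement.tagName,
    }


if __name__ == "__main__":
    for key, value in describe(SAMPLE).items():
        print(f"{key}: {value}")
```

The public and system identifiers are the references you might reasonably store alongside the document; the DTD they point to is not copied into the parsed result.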
You generally don't want to see the prolog, and you generally don't
want to store it.
Why wouldn't you want to store the prolog? Are you referring specifically to
Xindice when you make that comment? Or making a more general comment?
Why would anyone want to look at the schema for the document they're
storing? They generally have access to a reference to it in the
DOCTYPE, which either via a direct URI reference or via a catalog
lookup points to the actual DTD. If you can see the DOCTYPE, you
(as an analyst) can locate the DTD. If you can't, then the system
can't either.
The DOCTYPE declaration provides references to the
DTD, which is instantiated as part of the process of validating the
document. You may want to store the reference(s), but you wouldn't
want to store the DTD each time you store the document, as that
would be a real waste (the DTD is often bigger than the document).
It sounds like your well-formed and valid document isn't being
considered as such by the XML processor. The error message indicates
that there is content (i.e., either elements or character data) in
the part of the document considered as the prolog. You may be missing
the last ">" on line 2 above, as that would normally be the beginning
of the internal subset. If it found "<html" (or something similar),
you might get that error.
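The missing ">" hypothesis can be tried out with a short Python sketch built on the standard library's expat bindings. Both sample documents and the well_formedness_error() helper are assumptions made for illustration, not Vladimir's actual file, and expat will word its error differently from whatever processor produced the original message. The point is only that an unterminated DOCTYPE declaration makes the parser reject the document before it ever reaches the instance.

```python
from xml.parsers import expat

# Hypothetical well-formed document modelled on the example in this thread.
GOOD = b"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>t</title></head><body></body></html>
"""

# The same document with the closing ">" of the DOCTYPE declaration removed.
BROKEN = b"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
<html><head><title>t</title></head><body></body></html>
"""


def well_formedness_error(data):
    """Return a description of the first parse error, or None if there is none."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk of input
    except expat.ExpatError as err:
        message = expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {message}"
    return None


if __name__ == "__main__":
    print("good:  ", well_formedness_error(GOOD))
    print("broken:", well_formedness_error(BROKEN))
```

Running the same check against the real file would show the line and column where the parser gives up, which is usually enough to tell whether the problem sits in the prolog.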
As I said it would be useful for us to see the prolog so we could help to
identify if there is an error in Vladimir's prolog or not.
It would be useful to see the whole document.
Murray
......................................................................
Murray Altheim http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK