[EMAIL PROTECTED] wrote:
In a message dated 2/12/2004 10:39:42 AM GMT Standard Time, [EMAIL PROTECTED] writes:


In SGML and XML, a document is composed of two sequential parts,
the prolog and the instance. You can see this in an HTML example:

1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html>
4 <head>
5 <title>The Symbol Grounding Problem</title>
6 </head>
7 8 9


In this example, the prolog is lines 1-2, the instance begins on
line 3.



Murray,

That is incorrect. The prolog includes lines 1 - 3.

No. The document instance is the <html> element, which begins on line 3. The only prolog content in my example was the DOCTYPE declaration. In a subsequent email message, Vladimir brought up the XML declaration:

Vladimir Cvetkovic (AT/LMI) wrote:
> Here is the first line of the document:
> <?xml version="1.0" encoding="UTF-8"?>

XML modifies SGML by optionally adding this to the prolog. So it
would be line 0 in my example.
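
As a rough illustration of the split described here, the sketch below parses the example with Python's xml.dom.minidom and reports the DOCTYPE (prolog) separately from the document element (instance). The <body/> and closing tags are my assumption, added only so the text is well-formed; the helper name split_prolog_and_instance is hypothetical.

```python
from xml.dom import minidom

# The example from this thread with "line 0" (the XML declaration) added.
# ASSUMPTION: <body/> and the closing tags are filled in so the text parses.
DOC = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html>\n'
    '<head>\n'
    '<title>The Symbol Grounding Problem</title>\n'
    '</head>\n'
    '<body/>\n'
    '</html>\n'
)


def split_prolog_and_instance(text):
    """Return (doctype node, document element) for an XML string."""
    doc = minidom.parseString(text)
    # doc.doctype comes from the prolog; doc.documentElement is the instance.
    return doc.doctype, doc.documentElement


doctype, root = split_prolog_and_instance(DOC)
print("prolog DOCTYPE name:", doctype.name)
print("public identifier: ", doctype.publicId)
print("system identifier: ", doctype.systemId)
print("document element:  ", root.tagName)
```

Note that the parser only records the identifiers from the DOCTYPE; it does not fetch the external DTD here.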

The prolog includes the DOCTYPE declaration, the external
subset (called the DTD), and the internal subset (which you seldom
see but it's legal).

That too is incorrect, in my view. The prolog may optionally include a pointer to the external subset but the external subset is not part of the prolog, although it is part of the DTD.

I'm sorry, but I've been doing SGML and XML for well over a decade and have been an author/editor of a number of industry DTDs (including the modularization of XHTML and the XTM DTD). Rather than spout my CV at you, I'd ask you to read the relevant passages from ISO 8879, Goldfarb, or any book on SGML. If you want to get picky about language, the declaration subset contains the DTD. It can take the form of an external subset (the most common case, since it's an external reference to a document called a DTD) and/or an optional internal subset. But all parts of the declaration subset, including the external and internal subsets, are part of the prolog.
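
To make the two kinds of subset concrete, here is a small sketch of a DOCTYPE that carries both an external reference and an internal subset, again using Python's xml.dom.minidom. The document type "note", the file name "note.dtd", and the entity are all invented for illustration.

```python
from xml.dom import minidom

# HYPOTHETICAL document: "note", "note.dtd" and the entity are made up.
# The SYSTEM identifier points at the external subset; the declarations
# between "[" and "]" form the internal subset.
DOC = (
    '<!DOCTYPE note SYSTEM "note.dtd" [\n'
    '  <!ENTITY owner "KMi">\n'
    ']>\n'
    '<note>hello</note>\n'
)

doctype = minidom.parseString(DOC).doctype

# Reference to the external subset (the file is not fetched here).
print("external subset reference:", doctype.systemId)
# Raw text of the internal subset as minidom captured it.
print("internal subset text:", doctype.internalSubset)
```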

The document instance includes the document element (in this case
<html> and all of its descendant content).

You generally don't want to see the prolog, and you generally don't
want to store it.

Why wouldn't you want to store the prolog? Are you referring specifically to Xindice when you make that comment? Or making a more general comment?

Why would anyone want to look at the schema for the document they're storing? They generally have access to a reference to it in the DOCTYPE, which either via a direct URI reference or via a catalog lookup points to the actual DTD. If you can see the DOCTYPE, you (as an analyst) can locate the DTD. If you can't, then the system can't either.

The DOCTYPE declaration provides references to the
DTD, which is instantiated as part of the process of validating the
document. You may want to store the reference(s), but you wouldn't
want to store the DTD each time you store the document, as that
would be a real waste (the DTD is often bigger than the document).
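
A minimal sketch of the lookup described above, assuming a catalog is just a mapping from public identifiers to local copies of the DTD. The local path and the function name locate_dtd are hypothetical; a real catalog implementation has more rules than this.

```python
from typing import Dict, Optional

# HYPOTHETICAL catalog: public identifier -> local copy of the DTD.
CATALOG: Dict[str, str] = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN":
        "/usr/share/dtd/xhtml1-transitional.dtd",
}


def locate_dtd(public_id: Optional[str],
               system_id: Optional[str],
               catalog: Dict[str, str] = CATALOG) -> Optional[str]:
    """Prefer a catalog entry for the public ID, else use the system ID."""
    if public_id and public_id in catalog:
        return catalog[public_id]
    # No catalog match: fall back to the direct URI reference, if any.
    return system_id


print(locate_dtd(
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
))
```

Storing only the two identifiers with each document is enough for this lookup to work later.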

It sounds like your well-formed and valid document isn't being
considered as such by the XML processor. The error message indicates
that there is content (i.e., either elements or character data) in
the part of the document considered as the prolog. You may be missing
the last ">" on line 2 above, as that would normally be the beginning
of the internal subset. If it found "<html" (or something similar),
you might get that error.
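
To show the kind of failure suggested here, the sketch below drops the closing ">" from the DOCTYPE and feeds the result to Python's expat-based parser. This is only an illustration: the wording of the error differs between processors, so it will not match the message Vladimir saw, and the shortened document body is my own.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# ASSUMPTION: a shortened, well-formed version of the example document.
GOOD = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
    '<html><head><title>t</title></head></html>\n'
)
# Same text with the final ">" of the DOCTYPE declaration removed.
BAD = GOOD.replace('.dtd">', '.dtd"', 1)


def check(text):
    """Return None if the text parses, else a short error description."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        return f"line {err.lineno}: {err}"
    return None


print("good:", check(GOOD))
print("bad: ", check(BAD))
```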

As I said it would be useful for us to see the prolog so we could help to identify if there is an error in Vladimir's prolog or not.

It would be useful to see the whole document.

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .