jbates 2002/12/03 07:26:07
Modified: src/documentation/content/xdocs/dev guide-internals.xml
src/documentation/resources/images element.png element.xcf
Log:
More compressed DOM documentation
Revision Changes Path
1.4 +87 -7
xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml
Index: guide-internals.xml
===================================================================
RCS file:
/home/cvs/xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- guide-internals.xml 3 Dec 2002 14:21:20 -0000 1.3
+++ guide-internals.xml 3 Dec 2002 15:26:07 -0000 1.4
@@ -207,7 +207,8 @@
</collection>
]]></source>
<p>The only way to modify this configuration is to change
- the Xindice source code and recompile.</p>
+ the Xindice source code
(<code>org.apache.xindice.core.SystemCollection</code>
+ class) and recompile.</p>
</section>
<section>
<title>2.4. Other Collections</title>
@@ -426,7 +427,8 @@
<code>system_SysSymbols</code> in the
<code>system/SysSymbols</code>
collection. Doing so however would create an endless
loop, as
<code>system/SysSymbols</code>'s symbol table is needed
to read itself!
- This particular symbol table is therefore hardcoded into
the Xindice
+ This particular symbol table is therefore hardcoded
+ (<code>org.apache.xindice.core.SystemCollection</code>
class) into the Xindice
source code.</p>
<p>For any other collection, you can always request the
symbol table
yourself by issuing the Xindice command-line
invocation:</p>
@@ -441,14 +443,92 @@
representation of the XML. This will contain the byte
data for the children
of the node, and these sub-sequences contain the data for
their children etc...</p>
<p>Xindice thus starts by generating the byte sequence for
the document node, which
- will set off generation for the whole XML document.</p>
+ will set off generation for the whole XML document. The
code that handles this
+ is located in the
<code>org.apache.xindice.xml.dom.DOMCompressor</code> class.</p>
<section>
- <title>4.2.1. Element nodes</title>
+ <title>4.2.1. Document node</title>
+ </section>
+ <section>
+ <title>4.2.2. Element nodes</title>
<p>An element node is encoded as shown in the diagram
below:</p>
<figure src="images/element.png" alt="Element compressed
DOM format"/>
- </section>
-
+ <p>All multibyte fields are always encoded in Big Endian
order, i.e. the
+ most significant byte is at the lowest address.
+ The meaning of the various fields is as follows:</p>
+ <ul>
+ <li>The signature (1 byte) is composed of several 1-
or 2-bit fields whose
+ meaning is:
+ <ul>
+ <li>
+ <code>Attribute Count Type</code> (bits
0-1). This is a code
+ that indicates the length of the
<code>attribute_count</code> field. The
+ code works as follows:
+ <ul>
+ <li>value <code>00</code> (binary):
the <code>attribute_count</code> field is zero, and
+ thus <em>absent</em> from the
byte array</li>
+ <li>value <code>01</code> (binary):
the <code>attribute_count</code> field is
+ 4 bytes (32 bits) long.</li>
+ <li>value <code>10</code> (binary):
the <code>attribute_count</code> field is
+ 2 bytes (16 bits) long.</li>
+ <li>value <code>11</code> (binary):
the <code>attribute_count</code> field is
+ 1 byte (8 bits) long.</li>
+ </ul>
+ </li>
+ <li>
+ <code>Record Length Type</code> (bits
2-3). This is a code
+ that indicates the length of the
<code>record_length</code> field. The
+ code works exactly as the
<code>Attribute Count Type</code> code above, except
+ it cannot be <code>00</code>, as the
<code>record_length</code> field is never zero.
+ </li>
+ <li>
+ <code>C</code> (bit 4). Tells whether
the element has any non-attribute children
+ (child elements, text or comments).
<code>0</code> means it hasn't, <code>1</code>
+ means it has.
+ </li>
+ <li>
+ <code>A</code> (bit 5). Tells whether
the element has any attributes.
+ <code>0</code> means it hasn't,
<code>1</code> means it has.
+ </li>
+ <li>
+ signature type (bits 6-7). Always set to
<code>01</code> for elements.
+ It tells Xindice that what follows is an
element signature.
+ </li>
+ </ul>
+ </li>
+ <li><code>record_length</code> (<code>x</code>:
number of
+ bytes indicated by the <code>RLT</code> (Record Length Type) code in
signature): the length, in bytes, of the byte string
+ representing this element. That includes
the signature byte, this <code>record_length</code>
+ field, the <code>symbol_id</code> field and any
child & attribute data.
+ </li>
+ <li><code>symbol_id</code> (2 bytes): This is the
16-bit identifier that is associated with the
+ element <em>name</em> of this element in the
containing collection's symbol table.
+ </li>
+ <li><code>attribute_count</code> (<code>y</code>:
number of
+ bytes indicated by the <code>ACT</code> (Attribute Count Type) code in
signature): the number of attributes
+ this element has. This allows Xindice to start
reading the child & attribute data, knowing
+ that the first <code>attribute_count</code>
nodes will represent attributes.
+ </li>
+ <li>child & attribute data: Attribute data is
written first; it is obtained by generating
+ byte arrays for all attribute nodes in this
element and concatenating them together. Immediately
+ following the attribute data come the byte
sequences obtained by compressing the comment, text
+ and child element nodes of this element. They
are processed in the order they appear in the XML
+ document. Again, the byte sequences thus
generated by child nodes are concatenated together.
+ </li>
+ </ul>
+ </section>
+ <section>
+ <title>4.2.3. Attribute nodes</title>
+ </section>
+ <section>
+ <title>4.2.4. Text nodes</title>
+ </section>
+ <section>
+ <title>4.2.5. Comment nodes</title>
+ </section>
+ <section>
+ <title>4.2.6. Processing Instruction nodes</title>
+ </section>
</section>
</section>
<section>
1.2 +10 -16
xml-xindice/src/documentation/resources/images/element.png
<<Binary file>>
1.2 +9 -13
xml-xindice/src/documentation/resources/images/element.xcf
<<Binary file>>
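
Illustration (not part of the commit): the element header layout described in the new section 4.2.2 can be sketched in Java roughly as follows. This is a minimal sketch, not the code in org.apache.xindice.xml.dom.DOMCompressor. It assumes that bit 0 is the least significant bit of the signature byte and that the fields appear in the order signature, record_length, symbol_id, attribute_count, neither of which the text above states explicitly. The class and method names (ElementHeaderSketch, fieldWidth, readBigEndian, parseHeader) are hypothetical.

```java
// Hypothetical sketch of reading an element header as described in section 4.2.2.
// Assumptions: bit 0 is the least significant bit; the field order is
// signature, record_length, symbol_id, attribute_count.
class ElementHeaderSketch {

    // Width in bytes selected by a 2-bit length code:
    // 00 = field absent, 01 = 4 bytes, 10 = 2 bytes, 11 = 1 byte.
    static int fieldWidth(int code) {
        switch (code & 0x3) {
            case 1:  return 4;
            case 2:  return 2;
            case 3:  return 1;
            default: return 0;
        }
    }

    // Reads `width` bytes starting at `offset` as an unsigned big-endian value.
    static long readBigEndian(byte[] data, int offset, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }

    // Returns { record_length, symbol_id, attribute_count,
    //           C flag, A flag, offset of child & attribute data }.
    static long[] parseHeader(byte[] data) {
        int signature = data[0] & 0xFF;
        int act = signature & 0x3;            // Attribute Count Type, bits 0-1
        int rlt = (signature >> 2) & 0x3;     // Record Length Type, bits 2-3
        int hasChildren = (signature >> 4) & 0x1;   // C, bit 4
        int hasAttributes = (signature >> 5) & 0x1; // A, bit 5
        int type = (signature >> 6) & 0x3;    // signature type, bits 6-7

        if (type != 1) {
            throw new IllegalArgumentException("not an element signature");
        }
        if (rlt == 0) {
            throw new IllegalArgumentException("record_length is never absent");
        }

        int pos = 1;
        long recordLength = readBigEndian(data, pos, fieldWidth(rlt));
        pos += fieldWidth(rlt);
        long symbolId = readBigEndian(data, pos, 2);
        pos += 2;
        long attributeCount = readBigEndian(data, pos, fieldWidth(act));
        pos += fieldWidth(act);

        return new long[] { recordLength, symbolId, attributeCount,
                            hasChildren, hasAttributes, pos };
    }

    public static void main(String[] args) {
        // Made-up sample: signature 0x6F, 1-byte record_length (6),
        // symbol_id 42, 1-byte attribute_count (1), one byte of data.
        byte[] sample = { (byte) 0x6F, 0x06, 0x00, 0x2A, 0x01, 0x00 };
        long[] h = parseHeader(sample);
        System.out.println("record_length=" + h[0]
                + " symbol_id=" + h[1]
                + " attribute_count=" + h[2]);
    }
}
```

The sample bytes in main are invented for the illustration. If the diagram in element.png numbers bits from the most significant end instead, the shifts above would need to be mirrored.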