jbates      2002/12/03 07:26:07

  Modified:    src/documentation/content/xdocs/dev guide-internals.xml
               src/documentation/resources/images element.png element.xcf
  Log:
  More compressed DOM documentation
  
  Revision  Changes    Path
  1.4       +87 -7     
xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml
  
  Index: guide-internals.xml
  ===================================================================
  RCS file: 
/home/cvs/xml-xindice/src/documentation/content/xdocs/dev/guide-internals.xml,v
  retrieving revision 1.3
  retrieving revision 1.4
  diff -u -r1.3 -r1.4
  --- guide-internals.xml       3 Dec 2002 14:21:20 -0000       1.3
  +++ guide-internals.xml       3 Dec 2002 15:26:07 -0000       1.4
  @@ -207,7 +207,8 @@
   </collection>
   ]]></source>
                   <p>The only way to modify this configuration is to change
  -                   the Xindice source code and recompile.</p>
  +                   the Xindice source code 
(<code>org.apache.xindice.core.SystemCollection</code>
  +                   class) and recompile.</p>
               </section>
               <section>
                   <title>2.4. Other Collections</title>
  @@ -426,7 +427,8 @@
                      <code>system_SysSymbols</code> in the 
<code>system/SysSymbols</code>
                      collection. Doing so however would create an endless 
loop, as
                      <code>system/SysSymbols</code>'s symbol table is needed 
to read itself!
  -                   This particular symbol table is therefore hardcoded into 
the Xindice
  +                   This particular symbol table is therefore hardcoded
  +                   (<code>org.apache.xindice.core.SystemCollection</code> 
class) into the Xindice
                      source code.</p>
                   <p>For any other collection, you can always request the 
symbol table
                      yourself by issuing the Xindice command-line 
invocation:</p>
  @@ -441,14 +443,92 @@
                      representation of the XML. This will contain the byte 
data for the children
                      of the node, and these sub-sequences contain the data for 
their children etc...</p>
                   <p>Xindice thus starts by generating the byte sequence for 
the document node, which
  -                   will set off generation for the whole XML document.</p>
  +                   will set off generation for the whole XML document. The 
code that handles this
  +                   is located in the 
<code>org.apache.xindice.xml.dom.DOMCompressor</code> class.</p>
                   <section>
  -                    <title>4.2.1. Element nodes</title>
  +                    <title>4.2.1. Document node</title>
  +                </section>
  +                <section>
  +                    <title>4.2.2. Element nodes</title>
                       <p>An element node is encoded as shown in the diagram 
below:</p>
                       <figure src="images/element.png" alt="Element compressed 
DOM format"/>
  -                </section>
  -
  +                    <p>All multibyte fields are always encoded in Big Endian 
order, i.e. the
  +                       most significant byte is at the lowest address.
  +                       The meaning of the various fields is as follows:</p>
  +                    <ul>
  +                        <li>The signature (1 byte) is composed of several 1- 
or 2-bit fields whose
  +                            meaning is:
  +                            <ul>
  +                                <li>
  +                                    <code>Attribute Count Type</code> (bits 
0-1). This is a code
  +                                    that indicates the length of the 
<code>attribute_count</code> field. The
  +                                    code works as follows:
  +                                    <ul>
  +                                        <li>value <code>00</code> (binary): 
the <code>attribute_count</code> field is zero, and
  +                                            thus <em>absent</em> from the 
byte array</li>
  +                                        <li>value <code>01</code> (binary): 
the <code>attribute_count</code> field is
  +                                            4 bytes (32 bits) long.</li>
  +                                        <li>value <code>10</code> (binary): 
the <code>attribute_count</code> field is
  +                                            2 bytes (16 bits) long.</li>
  +                                        <li>value <code>11</code> (binary): 
the <code>attribute_count</code> field is
  +                                            1 byte (8 bits) long.</li>
  +                                    </ul>
  +                                </li>
  +                                <li>
  +                                    <code>Record Length Type</code> (bits 
2-3). This is a code
  +                                    that indicates the length of the 
<code>record_length</code> field. The
  +                                    code works exactly as the 
<code>Attribute Count Type</code> code above, except
  +                                    it cannot be <code>00</code>, as the 
<code>record_length</code> field is never zero.
  +                                </li>
  +                                <li>
  +                                    <code>C</code> (bit 4). Tells whether 
the element has any non-attribute children
  +                                    (child elements, text or comments). 
<code>0</code> means it hasn't, <code>1</code>
  +                                    means it has.
  +                                </li>
  +                                <li>
  +                                    <code>A</code> (bit 5). Tells whether 
the element has any attributes.
  +                                    <code>0</code> means it hasn't, 
<code>1</code> means it has.
  +                                </li>
  +                                <li>
  +                                    signature type (bits 6-7). Always set to 
<code>01</code> for elements.
  +                                    It tells Xindice that what follows is an 
element signature.
  +                                </li>
  +                            </ul>
  +                        </li>
  +                        <li><code>record_length</code> (<code>x</code>: 
number of
  +                            bytes indicated by the <code>RLT</code> (Record Length Type) code in the 
signature): the length, in bytes, of the byte string
  +                            representing this element. This includes 
the signature byte, this <code>record_length</code>
  +                            field, the <code>symbol_id</code> field and any 
child &amp; attribute data.
  +                        </li>
  +                        <li><code>symbol_id</code> (2 bytes): This is the 
16-bit identifier that is associated with the
  +                            element <em>name</em> of this element in the 
containing collection's symbol table.
  +                        </li>
  +                        <li><code>attribute_count</code> (<code>y</code>: 
number of
  +                            bytes indicated by <code>ACT</code> code in 
signature): the number of attributes
  +                            this element has. This allows Xindice to start 
reading the child &amp; attribute data, knowing
  +                            that the first <code>attribute_count</code> 
nodes will represent attributes.
  +                        </li>
  +                        <li>child &amp; attribute data: Attribute data is 
written first; it is obtained by generating
  +                            byte arrays for all attribute nodes in this 
element and concatenating them together. Immediately
  +                            following the attribute data come the byte 
sequences obtained by compressing the comment, text
  +                            and child element nodes of this element. They 
are processed in the order they appear in the XML
  +                            document. Again, the byte sequences thus 
generated by child nodes are concatenated together.
  +                        </li>
  +                    </ul>
   
  +                </section>
  +                <section>
  +                    <title>4.2.3. Attribute nodes</title>
  +                </section>
  +                <section>
  +                    <title>4.2.4. Text nodes</title>
  +                </section>
  +                <section>
  +                    <title>4.2.5. Comment nodes</title>
  +                </section>
  +                <section>
  +                    <title>4.2.6. Processing Instruction nodes</title>
  +                </section>
               </section>
           </section>
           <section>
  
  
  
  1.2       +10 -16    
xml-xindice/src/documentation/resources/images/element.png
  
        <<Binary file>>
  
  
  1.2       +9 -13     
xml-xindice/src/documentation/resources/images/element.xcf
  
        <<Binary file>>
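
  Note on the element header described in the guide-internals.xml change
  above: the following is a minimal Java sketch of how the signature byte
  and the fixed header fields could be decoded. It is not taken from
  DOMCompressor or any other Xindice class. The class and method names
  (ElementHeaderSketch, fieldWidth, readField, parseHeader) are
  hypothetical. It assumes that "bit 0" means the least significant bit of
  the signature byte, and that the fields follow the signature in the order
  the list gives them (record_length, symbol_id, attribute_count); the
  text itself does not state either point, and element.png remains the
  reference for the actual layout.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of Xindice: decodes a compressed element header. */
final class ElementHeaderSketch {

    /** Maps a 2-bit length code to a width in bytes: 00 absent, 01 = 4, 10 = 2, 11 = 1. */
    static int fieldWidth(int code) {
        switch (code) {
            case 0: return 0;
            case 1: return 4;
            case 2: return 2;
            case 3: return 1;
            default: throw new IllegalArgumentException("code must be 0..3: " + code);
        }
    }

    /** Reads an unsigned big-endian integer of the given width (0, 1, 2 or 4 bytes). */
    static long readField(ByteBuffer buf, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (buf.get() & 0xFF);
        }
        return value;
    }

    /** Returns {recordLength, symbolId, attributeCount, hasChildren (0/1), hasAttributes (0/1)}. */
    static long[] parseHeader(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int signature = buf.get() & 0xFF;

        // Assumption: bit 0 is the least significant bit of the signature byte.
        int act = signature & 0x03;          // Attribute Count Type, bits 0-1
        int rlt = (signature >> 2) & 0x03;   // Record Length Type, bits 2-3
        int c = (signature >> 4) & 0x01;     // C flag, bit 4
        int a = (signature >> 5) & 0x01;     // A flag, bit 5
        int type = (signature >> 6) & 0x03;  // signature type, bits 6-7

        if (type != 1) {
            throw new IllegalArgumentException("not an element signature");
        }
        if (rlt == 0) {
            throw new IllegalArgumentException("record_length is never absent");
        }

        // Assumption: fields appear in the order the documentation lists them.
        long recordLength = readField(buf, fieldWidth(rlt));
        long symbolId = readField(buf, 2);
        long attributeCount = readField(buf, fieldWidth(act));
        return new long[] {recordLength, symbolId, attributeCount, c, a};
    }
}
```

  Under these assumptions, a signature byte of 0x6F decodes as an element
  with the A flag set and 1-byte record_length and attribute_count fields.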
  
  
