Thanks. That is helpful.

-----Original Message-----
From: Tom Bradford [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 06, 2002 7:01 PM
To: [email protected]
Subject: Re: indexing/xpath query question


On Wednesday, March 6, 2002, at 04:49 PM, Mark J. Stang wrote:
> Sounds like a good sample to me.  The collection could be smaller if
> your document is mostly tags.  I would guess that the internal storage
> is not just your raw document, but a parsed version, and the tags are
> probably represented by a number.  If the ratio of your data to your
> tags goes way up, then you will probably see a difference.  I don't
> know this for a fact; I don't actually code Xindice, I just play a
> coder on television.

This is pretty much the case.  Xindice doesn't store documents as a
serialized DOM.  It creates a tokenized stream and stores all element
and attribute names in a single, global collection that maps those names
to integer symbol IDs.  The symbol IDs are what actually get stored in
the collection and index files, so if the XML is very data-oriented and
has a lot of tags and attributes, replacing those names with IDs can
noticeably reduce the size of the disk image.
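
To make the idea concrete, here is a rough sketch of name-to-ID
tokenization in Python.  It is only an illustration of the general
technique, not Xindice's actual code or on-disk format: the SymbolTable
class, the tokenize function, and the token tuples are all hypothetical
names made up for this example.

```python
import xml.etree.ElementTree as ET


class SymbolTable:
    """Hypothetical symbol table: maps element/attribute names to integer IDs."""

    def __init__(self):
        self.ids = {}    # name -> integer ID
        self.names = []  # integer ID -> name

    def intern(self, name):
        # Assign the next free ID the first time a name is seen,
        # and reuse that ID on every later occurrence.
        if name not in self.ids:
            self.ids[name] = len(self.names)
            self.names.append(name)
        return self.ids[name]


def tokenize(elem, symbols, out):
    """Append a flat token stream for elem to out, with names replaced by IDs."""
    out.append(("start", symbols.intern(elem.tag)))
    for attr, value in elem.attrib.items():
        out.append(("attr", symbols.intern(attr), value))
    if elem.text and elem.text.strip():
        out.append(("text", elem.text.strip()))
    for child in elem:
        tokenize(child, symbols, out)
        # Text that follows a child element still belongs to this element.
        if child.tail and child.tail.strip():
            out.append(("text", child.tail.strip()))
    out.append(("end",))
    return out


symbols = SymbolTable()
doc = ET.fromstring(
    '<people>'
    '<person id="1"><name>Ann</name></person>'
    '<person id="2"><name>Bob</name></person>'
    '</people>'
)
stream = tokenize(doc, symbols, [])

print(symbols.names)  # each distinct name is stored once
print(stream)         # the stream itself carries only integer IDs and data
```

Each distinct name lands in the table once, however many times it
occurs in the document, so the more tag-heavy the document, the more
the stored stream shrinks relative to the raw XML.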

--
Tom Bradford - http://www.tbradford.org
Architect - XQRL (XQuery Engine) - http://www.xqrl.com
Apache Xindice (Native XML Database) - http://xml.apache.org/xindice
Project Labrador (Web Services Framework) - http://notdotnet.org
