Tom Bradford wrote:
>
> Large Documents and Document Versioning
> ------------------------------------------------------------
> Xindice needs to be capable of supporting massive documents in a
> scalable fashion and with acceptable performance. Currently, the
> document representation architecture is based on a tokenized, lazy DOM
> where the bytestream images that feed the DOM are stored and retrieved
> in a paged filing system. Every document is treated as an atomic unit.
> This has some serious limitations when it comes to massive documents.
>
> In order to support very large documents, the tokenization system needs
> to be replaced and geared more toward the simplified representation of
> document structure rather than an equal balance of structure and
> content. Also, the Filer interfaces need to support the notion of
> streaming, and even more importantly, the ability to support random
> access streaming.
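On the Filer point: to make the random access streaming idea a bit more
concrete, something along these lines is what I would imagine. This is only
a rough sketch; the interface and method names are made up for illustration
and are not the existing Xindice Filer API.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Rough sketch only -- hypothetical names, not the actual Filer API.
    public interface RandomAccessFiler {

        /** Open a sequential read stream over the record stored under key. */
        InputStream openReadStream(Object key) throws IOException;

        /**
         * Open a read stream starting at an arbitrary byte offset inside
         * the record, so a lazy DOM only pulls in the pages it actually
         * touches instead of materializing the whole document.
         */
        InputStream openReadStream(Object key, long offset, long length)
                throws IOException;

        /** Write into a record at a given offset without rewriting it all. */
        OutputStream openWriteStream(Object key, long offset) throws IOException;
    }

The important part is the (offset, length) variant: with that, the paged
filing system could serve a huge document page by page without ever handing
the whole bytestream image to the DOM layer at once.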
As I mentioned in an earlier private discussion, I am working on compressed in-memory representations of XML data, especially large, data-centric documents. One of the side effects of my current design is a split of content and structure, but it is still quite immature. I was thinking that integrating it with Xindice would be a nice thing, but at the moment I don't have the time or the skills required to do it. However, I'm happy to hear about any ideas and would contribute (parts of) my design if that's any help.

Mathias
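P.S. To give a rough idea of what I mean by splitting content and structure,
here is a toy sketch. The class and the encoding are made up purely for
illustration and are not my actual design:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Toy sketch only -- hypothetical class, not the real design.
    // The structure stream stays small (one int per event) while the
    // bulky character data sits in a separate pool that can be
    // compressed or paged independently of the tree shape.
    public class SplitDocument {
        // Structure stream, one int per event:
        //   n > 0  -> start of element with name id n-1
        //   n == 0 -> end of the current element
        //   n < 0  -> text node with content id (-n)-1
        private final List<Integer> events = new ArrayList<>();

        // Element names are interned once; repeated tags share one entry.
        private final List<String> names = new ArrayList<>();
        private final Map<String, Integer> nameIds = new HashMap<>();

        // Character data, kept apart from the structure.
        private final List<String> contents = new ArrayList<>();

        public void startElement(String name) {
            Integer id = nameIds.get(name);
            if (id == null) {
                id = names.size();
                names.add(name);
                nameIds.put(name, id);
            }
            events.add(id + 1);
        }

        public void endElement() {
            events.add(0);
        }

        public void text(String value) {
            contents.add(value);
            events.add(-contents.size());
        }
    }

Something along these lines could be fed directly from SAX events, and the
contents pool is the obvious candidate for aggressive compression.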
