Tom Bradford wrote:
>
>
> Large Documents and Document Versioning
> ------------------------------------------------------------
> Xindice needs to be capable of supporting massive documents in a
> scalable fashion and with acceptable performance.  Currently, the
> document representation architecture is based on a tokenized, lazy DOM
> where the bytestream images that feed the DOM are stored and retrieved
> in a paged filing system.  Every document is treated as an atomic unit.
> This has some serious limitations when it comes to massive documents.
>
> In order to support very large documents, the tokenization system needs
> to be replaced with one geared more toward a simplified representation
> of document structure than an equal balance of structure and
> content.  Also, the Filer interfaces need to support the notion of
> streaming, and even more importantly, the ability to support random
> access streaming.
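
Just to check that I read the Filer part right: by "random access
streaming" I understand being able to read an arbitrary byte range of a
stored document image without materializing the whole document, roughly
along these lines (the interface and method names below are made up for
illustration, not the existing Filer API):

import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch only; this could sit next to (or extend) the
// existing Filer interface.
public interface RandomAccessFiler {

    // Stream the whole stored document image, page by page.
    InputStream openStream(Object key) throws IOException;

    // Stream only [offset, offset + length) of the stored image,
    // so a lazy DOM can fault in just the pages it needs.
    InputStream openStream(Object key, long offset, long length)
            throws IOException;

    // Total size of the stored image, needed to plan range reads.
    long getSize(Object key) throws IOException;
}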

As I have mentioned in a private discussion earlier, I am working on
compressed in-memory representations of XML data, especially large,
data-centric documents. One of the side effects of my current design is
a split of content and structure, but it is quite immature so far. I was
thinking that integrating it with Xindice would be a nice thing, but
currently I don't have the time or skills required to do it. However, I'm
happy to hear about any ideas and would contribute (parts of) my design
if that's any help.
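
Very roughly, the content/structure split looks like this (a sketch
only; class and field names are placeholders, not a finished design):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompressedDocument {

    // Content pool: each distinct element name or text value is
    // stored once and referenced by its index.
    private final List<String> contentPool = new ArrayList<String>();
    private final Map<String, Integer> poolIndex =
            new HashMap<String, Integer>();

    // Structure stream: a flat sequence of events, each holding only
    // an event code and a small integer pointing into the content pool.
    private final List<int[]> structure = new ArrayList<int[]>();

    static final int START_ELEMENT = 0;
    static final int TEXT          = 1;
    static final int END_ELEMENT   = 2;

    private int intern(String value) {
        Integer idx = poolIndex.get(value);
        if (idx == null) {
            idx = Integer.valueOf(contentPool.size());
            contentPool.add(value);
            poolIndex.put(value, idx);
        }
        return idx.intValue();
    }

    public void startElement(String name) {
        structure.add(new int[] { START_ELEMENT, intern(name) });
    }

    public void text(String value) {
        structure.add(new int[] { TEXT, intern(value) });
    }

    public void endElement() {
        structure.add(new int[] { END_ELEMENT, -1 });
    }
}

Repeated names and values collapse into single pool entries, which is
where most of the compression for data-centric documents comes from.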

Mathias
