I am looking for a way to improve the performance of "bursting", i.e. splitting large 
documents into smaller ones. 

Currently I have a pipeline that takes a large document and uses an XSLT to extract 
one part of it (i.e. a chapter). The sitemap takes the chapter id from the request URI 
and passes it to the XSLT. This is quite slow for large documents. For instance, 
consider a 5Mb document with 50 chapters or sections each of 100kb. If I crawl my 
website and access each chapter, then the pipeline will read this 5Mb document 50 
times and extract each chapter. So 250Mb of data passes through the pipeline, and a 
total of 5Mb is returned (i.e. 50 x 100kb).
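
For reference, the pipeline currently looks roughly like this (the match pattern, file 
names and the "chapter-id" parameter are only illustrative):

  <map:match pattern="chapter/*">
    <map:generate src="docs/big-document.xml"/>
    <map:transform src="stylesheets/extract-chapter.xsl">
      <!-- pass the chapter id from the request URI to the XSLT -->
      <map:parameter name="chapter-id" value="{1}"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>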

I'm wondering if I can improve performance by splitting the large document up and writing 
each chapter into a separate file. This way I would need to traverse the document 
only once. For instance, my 5Mb document would be split (once) into 50 files, and each 
file would be returned individually. So 10Mb of data passes through the pipeline (i.e. 
5Mb while splitting the document, plus 50 x 100kb returned to the browser).
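
Serving the pre-split files would then be a trivial pipeline, something like this (again, 
the "split" directory name is just an example):

  <map:match pattern="chapter/*">
    <map:generate src="split/{1}.xml"/>
    <map:serialize type="xml"/>
  </map:match>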

I think the SourceWritingTransformer might be used to split the document, but I would 
also need a way to check the last-modified dates of the original file against those of 
the split files, so that the document is re-split whenever it is edited.
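
Here is a rough sketch of what I imagine the bursting step might look like, assuming I've 
understood the SourceWritingTransformer's source:write / source:source / source:fragment 
syntax correctly (the /book/chapter structure, the @id attribute, the "burst.xsl" name and 
the "write-source" transformer name are all just placeholders, and this doesn't yet deal 
with the last-modified check):

  <!-- burst.xsl: wrap each chapter in a source:write instruction -->
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:source="http://apache.org/cocoon/source/1.0">

    <xsl:template match="/">
      <burst>
        <xsl:for-each select="/book/chapter">
          <source:write create="true">
            <!-- write each chapter to its own file under split/ -->
            <source:source>split/<xsl:value-of select="@id"/>.xml</source:source>
            <source:fragment>
              <xsl:copy-of select="."/>
            </source:fragment>
          </source:write>
        </xsl:for-each>
      </burst>
    </xsl:template>

  </xsl:stylesheet>

and in the sitemap:

  <map:match pattern="split-book">
    <map:generate src="docs/big-document.xml"/>
    <map:transform src="stylesheets/burst.xsl"/>
    <!-- "write-source" must match however the SourceWritingTransformer
         is declared in this sitemap's components -->
    <map:transform type="write-source"/>
    <map:serialize type="xml"/>
  </map:match>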

Alternatively, the FragmentExtractorTransformer might do it. I can't find much 
documentation for this component though - and I don't really know how to use it.

Can anyone advise me about either of these approaches, or suggest any other ideas?

Cheers

Con


--
Conal Tuohy
Senior Programmer
(04)463-6844
(021)237-2498
[EMAIL PROTECTED]
New Zealand Electronic Text Centre
www.nzetc.org
