I am looking for a way to improve the performance of "bursting", i.e. splitting large documents into smaller ones.
Currently I have a pipeline that takes a large document and uses an XSLT transformation to extract one part of it (i.e. a chapter). The sitemap takes the chapter id from the request URI and passes it to the XSLT as a parameter. This is quite slow for large documents.

For instance, consider a 5MB document with 50 chapters or sections of 100KB each. If I crawl my website and access each chapter, the pipeline will read this 5MB document 50 times, extracting one chapter each time. So 250MB of data passes through the pipeline, and a total of 5MB is returned (i.e. 50 x 100KB).

I'm wondering if I can improve performance by splitting the large document up, writing its chapters into separate files. That way the document would need to be traversed only once. For instance, my 5MB document would be split (once) into 50 files, and each file would be returned individually. So only 10MB of data passes through the pipeline (i.e. 5MB while splitting the document, plus 50 x 100KB returned to the browser).

I think the SourceWritingTransformer might be used to split the documents, but I would also need to be able to compare the last-modified date of the original file against those of the split files, so that the document could be re-split whenever it is edited. Alternatively, the FragmentExtractorTransformer might do it, but I can't find much documentation for that component and I don't really know how to use it.

Can anyone advise me about either of these approaches, or suggest any other ideas?

Cheers

Con

--
Conal Tuohy
Senior Programmer
(04)463-6844
(021)237-2498
[EMAIL PROTECTED]
New Zealand Electronic Text Centre
www.nzetc.org
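P.S. Outside Cocoon, the burst-once / re-split-when-stale idea could be sketched in plain Python like this (a minimal sketch only — the `<chapter id="...">` element and attribute names are assumptions, not a real schema, and the actual pipeline would use a Cocoon transformer rather than a script):

```python
import glob
import os
import xml.etree.ElementTree as ET

def burst(source_path, out_dir):
    """Split source_path into one file per <chapter id="...">.

    Re-splits only when the source is newer than the existing split
    files, so the expensive parse happens once per edit of the source.
    """
    os.makedirs(out_dir, exist_ok=True)
    src_mtime = os.path.getmtime(source_path)
    existing = glob.glob(os.path.join(out_dir, "*.xml"))
    # If every previously written chapter file is at least as new as
    # the source, skip the parse entirely and serve the cached files.
    if existing and all(os.path.getmtime(f) >= src_mtime for f in existing):
        return existing
    written = []
    for chapter in ET.parse(source_path).getroot().iter("chapter"):
        out_path = os.path.join(out_dir, chapter.get("id") + ".xml")
        ET.ElementTree(chapter).write(out_path, encoding="utf-8")
        written.append(out_path)
    return written
```

In Cocoon the same last-modified comparison would presumably hang off the pipeline's caching/validity logic rather than being done by hand like this.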