Chris, Chuck and others,

many thanks for taking the time to educate me (on both "Tomcat threads" threads). I got lots of information and tips, which will be useful now or later. I'll now go sift through them again. At least now I have an idea where to start.

About the fact that my hardware sucks for Java: I know.
On the other hand, that machine is a good filter against programs' and programmers' hubris, particularly Java ones. It is an "if it runs there, it will run anywhere" kind of thing, and I don't need 50 fake clients to stress it out (obviously).

On the other hand again, on the same machine I have a text search and retrieval application that can sift through a full-text index of 100,000 documents (1 GB of text) and retrieve the ones I want in a couple of seconds. It has a 10 MB memory footprint. That's why the 500 MB footprint of Tomcat (with the app) and the 5-minute delay in starting the app over 25 MB of XML struck me so.

I have also learned (separately, and confirmed here several times) that XML parsing is a hog, and not only in Java. DOM-style parsing in particular builds the whole document tree in memory, and in my experience its time and memory cost grows much faster than the document size would suggest. Large text fields are absolute killers, and making them CDATA only partly alleviates that.
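
To make the DOM-versus-streaming point concrete: a streaming (StAX) parser hands over one event at a time and never builds the whole tree. Below is a minimal sketch, assuming the `javax.xml.stream` API that ships with the JDK. The class name, the method name and the sample XML are made up for illustration; this is not code from the application discussed here.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, for illustration only.
public class StreamingXmlSketch {

    // Walks the document event by event and returns
    // { number of elements named elementName, total text characters seen }.
    // No document tree is built, so memory use stays small for large inputs.
    static long[] countElementsAndChars(String xml, String elementName) {
        long elements = 0;
        long chars = 0;
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        elements++;
                    } else if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        // Text may arrive in several chunks; summing handles that.
                        chars += reader.getTextLength();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Could not parse XML", e);
        }
        return new long[] { elements, chars };
    }

    public static void main(String[] args) {
        String xml = "<docs><doc>abc</doc><doc><![CDATA[de]]></doc></docs>";
        long[] result = countElementsAndChars(xml, "doc");
        System.out.println(result[0] + " elements, " + result[1] + " text characters");
    }
}
```

Whether this helps in a given application depends on whether the code really needs random access to the whole tree; if it only reads records one after another, a streaming parser is usually worth a try.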

One can always throw more hardware at things, and sometimes that may be cheaper than trying to over-optimise. But some applications out there will kill any hardware. It is sometimes surprisingly easy to gain a factor of 2 with little investment though, and if that means halving the number of servers and their attendant care and paraphernalia, it's still worth it for us. Even when they are virtual.

Our main business is processing text-intensive documents, which is why I am interested. Gaining 5 seconds in processing a document counts if you're processing thousands per day. For a user with his finger on the mouse button too, there is a lot of difference between 1 and 3 seconds. A note from one of you about substrings in Java particularly caught my interest. And I'll go check whether the XML parser in that application could be replaced by a newer version. One alternative to XML for feeding that application with data is CSV files (the text version of a spreadsheet). I had discarded it until now as old-fashioned, "passé", limited, etc. XML is so much more "in". But I am having second thoughts now, and I will give it a try.
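
For what it's worth, reading CSV needs very little machinery compared to XML. Here is a minimal sketch of splitting one CSV line into fields, handling quoted fields and doubled quotes. It is my own illustration, not the application's code: the class and method names are made up, and it assumes comma separators and no line breaks inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
public class CsvLineSketch {

    // Splits a single CSV line into its fields.
    // Supports "quoted, fields" and "" as an escaped quote inside quotes.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("42,\"Smith, John\",plain text"));
    }
}
```

Long text fields with embedded line breaks would need a reader that works across lines, so this only covers the simple case.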

When I started in this business, 64 KB was a nice quantity of memory to program in, and quite expensive too. I created and ran a payroll application for a 1,000-person company in there. This Java app looks a lot cuter than the payroll did, but 500 megabytes of memory for one single Tomcat app, mmm. Some reflexes remain for a lifetime.

Thanks.
