Thanks Ted,

> To answer your last question first, no you don't have to do anything
> explicit to keep the ZK connection alive. It is maintained by a dedicated
> thread. You do have to keep your java program responsive and ZK problems
> like this almost always indicate that you have a problem with your program
> checking out for extended periods of time.
>
> My strong guess is that you have something evil happening with your java
> process that is actually causing this delay.
>
> Since you have tiny memory, it probably isn't GC. Since you have a bunch of
> processes, swap and process wakeup delays seem plausible. What is the load
> average on your box?
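(Aside, for anyone finding this thread later: the dedicated thread Ted mentions behaves roughly like the sketch below - pings go out on a schedule derived from the session timeout, independently of what the application threads are doing. This is an illustration of the pattern only, not ZooKeeper's actual client code; the class and method names are made up.)

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative keepalive thread: pings are scheduled from the session
// timeout, so the session stays alive even while application threads
// block on slow HTTP endpoints. ZooKeeper's real client does the
// equivalent internally.
public class KeepAlive {
    private final ScheduledExecutorService pinger =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger pingsSent = new AtomicInteger();

    public void start(int sessionTimeoutMs) {
        // Ping well inside the timeout window, e.g. every third of it.
        long interval = sessionTimeoutMs / 3;
        pinger.scheduleAtFixedRate(this::ping, interval, interval,
                TimeUnit.MILLISECONDS);
    }

    private void ping() {
        // A real client would write a ping packet to the server here.
        pingsSent.incrementAndGet();
    }

    public int pingCount() { return pingsSent.get(); }

    public void stop() { pinger.shutdownNow(); }
}
```

The point is that nothing the application does (short of starving the JVM of CPU, or the process being swapped out) stops these pings, which is why Ted suspects the machine rather than the ZK client.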
CPU spikes when responses come in, but mostly it's IO wait on the
endpoints (timeout of 3 minutes). I suspect HTTP client 4 is dropping
into a retry mechanism, but I have not investigated this yet.

> On the topic of your application, why are you using processes instead of
> threads? With threads, you can get your memory overhead down to 10's of
> kilobytes as opposed to 10's of megabytes.

I am just prototyping scaling out to many processes, potentially across
multiple machines. Our live crawler runs in a single JVM, but some of
these crawls take 4-6 weeks, and those long-running jobs block others,
so I was looking at alternatives. Our live crawler also uses DOM-based
XML parsing and is hitting memory limits - SAX would address this. We
also want to be able to deploy patches to the crawlers without
interrupting those long-running jobs if possible.

> Also, why not use something like Bixo so you don't have to prototype a
> threaded crawler?

It is not a web crawler but more of a custom web service client that
issues queries for pages of data. A second query is assembled based on
the response of the first. These are biodiversity-domain-specific
protocols - DiGIR, TAPIR and BioCASe - which are closer to SOAP-style
request/response. I'll look at Bixo.

Thanks again,
Tim

> On Tue, Sep 21, 2010 at 8:24 AM, Tim Robertson
> <timrobertson...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am seeing a lot of my clients being kicked out after the 10 minute
>> negotiated timeout is exceeded.
>> My clients are each a JVM (around 100 running on a machine) which are
>> doing web crawling of specific endpoints and handling the response XML
>> - so they do wait around for 3-4 minutes on HTTP timeouts, but
>> certainly not 10 mins.
>> I am just prototyping right now on a 2xquad core mac pro with 12GB
>> memory, and the 100 child processes only get -Xmx64m and I don't see
>> my machine exhausted.
>>
>> Do my clients need to do anything in order to initiate keep alive
>> heart beats or should this be automatic (I thought the tickTime would
>> dictate this)?
>>
>> # my conf is:
>> tickTime=2000
>> dataDir=/Volumes/Data/zookeeper
>> clientPort=2181
>> maxClientCnxns=10000
>> minSessionTimeout=4000
>> maxSessionTimeout=800000
>>
>> Thanks for any pointers to this newbie,
>> Tim
>>
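(For reference on the conf quoted above: the ZooKeeper server clamps whatever session timeout the client requests into the configured [minSessionTimeout, maxSessionTimeout] range - without the overrides here, the defaults are tickTime*2 and tickTime*20, so a 10 minute timeout is only negotiable because maxSessionTimeout was raised to 800000. A sketch of that clamping logic, with the values from the conf:)

```java
// Sketch of ZooKeeper's session timeout negotiation: the requested
// value is clamped into the server's [minSessionTimeout, maxSessionTimeout]
// range. The concrete bounds below come from the conf quoted above.
public class SessionTimeout {
    static int negotiate(int requestedMs, int minMs, int maxMs) {
        return Math.max(minMs, Math.min(requestedMs, maxMs));
    }

    public static void main(String[] args) {
        int min = 4000, max = 800000;                    // from the conf above
        System.out.println(negotiate(600000, min, max)); // 10 min fits: 600000
        System.out.println(negotiate(2000, min, max));   // raised to 4000
        System.out.println(negotiate(900000, min, max)); // capped at 800000
    }
}
```

With the default max of tickTime*20 = 40000 ms, the same 600000 ms request would have been silently capped at 40 seconds.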
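(On Ted's threads-vs-processes point: a minimal sketch of what the threaded layout could look like - one JVM, a bounded pool, one task per endpoint, instead of ~100 child JVMs each paying its own heap overhead. The endpoint names and the crawl() stub are made up stand-ins for the real HTTP fetch and XML handling:)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One JVM, many crawl tasks: a bounded pool keeps concurrency under
// control, and each blocked HTTP wait costs a thread stack rather
// than a whole child process.
public class ThreadedCrawl {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<String> endpoints = List.of("endpoint-a", "endpoint-b", "endpoint-c");
        List<Future<String>> results = new ArrayList<>();
        for (String endpoint : endpoints) {
            results.add(pool.submit(() -> crawl(endpoint)));
        }
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }

    // Stand-in for the real fetch + parse of one endpoint.
    static String crawl(String endpoint) {
        return "fetched " + endpoint;
    }
}
```

A single shared ZooKeeper session for the whole JVM would also replace the ~100 sessions the per-process layout currently holds open.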
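(And on the DOM-to-SAX switch mentioned above: a minimal sketch of streaming parsing with the JDK's built-in SAX parser - memory stays flat however large the response is, because the handler sees one element at a time instead of a whole tree. The &lt;record&gt; element name is a made-up stand-in for the real DiGIR/TAPIR/BioCASe response elements:)

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Streaming (SAX) handler: counts <record> elements without ever
// building a DOM, so heap use does not grow with document size.
public class RecordCounter extends DefaultHandler {
    private int records = 0;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attrs) {
        if ("record".equals(qName)) records++;
    }

    public int count() { return records; }

    public static int countRecords(String xml) throws Exception {
        RecordCounter handler = new RecordCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.count();
    }
}
```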