The Solr application I'm working on has many concurrently active cores, on the order of thousands at a time.
The management application depends on being able to query Solr for the current set of live cores, a requirement I've been satisfying with the STATUS core admin handler method. However, once the number of active cores reaches a certain threshold (which I haven't pinned down exactly), the response to the STATUS method is truncated, resulting in malformed XML. My debugging so far has revealed:

- When issuing STATUS queries from the local machine, they succeed, untruncated, more than 90% of the time.
- When local STATUS queries do fail, they are always truncated to the same length: 73685 bytes in my case.
- When issuing STATUS queries from a remote machine, they fail due to truncation every time.
- Remote STATUS queries are always truncated to the same length: 24704 bytes in my case.
- Failing STATUS queries take visibly longer to complete on the client: a few seconds for a truncated result versus under a second for an untruncated one.
- All STATUS queries return a 200 HTTP status code.
- All STATUS queries are logged as returning in ~700ms in Solr's info log.
- During failing (truncated) responses, Solr's CPU usage spikes to saturation.
- The behaviour is the same regardless of client: wget, curl, Python, ...

I'm using Solr 1.3.0 694707 and Jetty 6.1.3.

The main puzzle for me at the moment is that the local and remote behaviour is so different, which makes me suspect something to do with network transmission speed. But the response really isn't that big (~1MB untruncated), and the CPU spike seems to suggest that something in the process of serialising the core information is taking too long and causing a timeout?

Any suggestions on settings to tweak, ways to get extra debug information, or other ways to ascertain the active core list would be much appreciated! For reference, I've appended a minimal sketch of how I issue and check the STATUS query.

James
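Below is a rough sketch of the kind of request and truncation check described above; the host, port, and /solr/admin/cores path are assumptions about a default-ish deployment and may not match yours, and the XML layout of the STATUS response is read off what I see locally rather than any documented contract.

```python
# Minimal sketch: issue a CoreAdmin STATUS request and check whether the
# response body parses as well-formed XML (a truncated body will not).
# The URL below is an assumption -- adjust host/port/path to your setup.
import urllib.request
import xml.etree.ElementTree as ET

STATUS_URL = "http://localhost:8983/solr/admin/cores?action=STATUS"

def fetch_core_status(url=STATUS_URL, timeout=30):
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status_code = resp.status
        body = resp.read()
    print(f"HTTP {status_code}, {len(body)} bytes received")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        # Malformed XML here is how the truncation shows up on the client side.
        print(f"Response is not well-formed XML (truncated?): {exc}")
        return None
    # In the responses I see, each core appears as a child <lst name="corename">
    # under the <lst name="status"> element.
    return [lst.get("name") for lst in root.findall(".//lst[@name='status']/lst")]

if __name__ == "__main__":
    cores = fetch_core_status()
    if cores is not None:
        print(f"{len(cores)} cores reported")
```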