Hi All,

We are currently running a Hadoop 0.20.X cluster in our environment. We have recently been observing datanodes slowing down and DFSClients timing out. Looking at the datanode logs, we noticed quite a few "max DataXceiver exceeded" exception messages of the following format:

    java.io.IOException: xceiverCount 4114 exceeds the limit of concurrent xcievers 4096

Our cluster configuration allows a maximum of 4096 DataXceivers, and because of this exception our DFS clients are getting blocked, slowing down DFS performance from the client's perspective.

A jstack of the datanode process showed that, out of 4166 active threads in the JVM, 1336 were DataXceiver threads and 2796 were PacketResponder threads. Shouldn't the DataNode be able to spawn 2760 more DataXceivers before throwing the IOException?

Also, looking at the code, it seems we do not set a separate thread group for the PacketResponder threads started by BlockReceiver, so the thread pool is effectively split between PacketResponder and DataXceiver threads. Is this intentional?
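To illustrate what I mean, here is a rough, self-contained sketch (not the actual DataNode source; the class, constant and thread names are mine, based on my reading of the 0.20 code) of how PacketResponder threads in the same thread group appear to eat into the xceiver limit, since the count seems to be taken from the thread group's active count:

    import java.util.concurrent.TimeUnit;

    public class XceiverCountSketch {
        // corresponds to dfs.datanode.max.xcievers
        static final int MAX_XCEIVERS = 4096;
        // DataXceivers and the PacketResponders spawned by BlockReceiver both land here
        static final ThreadGroup XCEIVER_GROUP = new ThreadGroup("dataXceiverServer");

        // roughly what DataNode#getXceiverCount() does: count every live thread
        // in the group, not just the DataXceiver threads
        static int getXceiverCount() {
            return XCEIVER_GROUP.activeCount();
        }

        public static void main(String[] args) throws Exception {
            Thread xceiver = new Thread(XCEIVER_GROUP, () -> {
                // BlockReceiver starts a PacketResponder in the same group
                Thread responder = new Thread(XCEIVER_GROUP,
                        XceiverCountSketch::sleepABit, "PacketResponder");
                responder.start();

                // the limit check compares this count against the max, so every live
                // PacketResponder reduces the number of xceivers we can actually serve
                int count = getXceiverCount();   // 2 here: the xceiver plus its responder
                if (count > MAX_XCEIVERS) {
                    throw new RuntimeException("xceiverCount " + count
                            + " exceeds the limit of concurrent xcievers " + MAX_XCEIVERS);
                }
                System.out.println("threads counted against the limit: " + count);
            }, "DataXceiver");
            xceiver.start();
            xceiver.join();
        }

        static void sleepABit() {
            try { TimeUnit.SECONDS.sleep(2); } catch (InterruptedException ignored) { }
        }
    }

If this reading is right, it would explain why we hit the 4096 limit with only 1336 actual DataXceiver threads running.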
Are there any workarounds to ensure that the maximum allocation of threads goes to DataXceivers? Or should I go ahead and file a JIRA regarding this issue?

Sreekanth Ramakrishnan