http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/slow_receivers_preventing_problems.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/slow_receivers_preventing_problems.html.md.erb b/geode-docs/managing/monitor_tune/slow_receivers_preventing_problems.html.md.erb deleted file mode 100644 index ec0c199..0000000 --- a/geode-docs/managing/monitor_tune/slow_receivers_preventing_problems.html.md.erb +++ /dev/null @@ -1,45 +0,0 @@ ---- -title: Preventing Slow Receivers ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -During system integration, you can identify and eliminate potential causes of slow receivers in peer-to-peer communication. - -Work with your network administrator to eliminate any problems you identify. - -Slowing is more likely to occur when applications run many threads, send large messages (due to large entry values), or have a mix of region configurations. The problem can also arise from message delivery retries caused by intermittent connection problems. - -**Host Resources** - -Make sure that the machines that run Geode members have enough CPU available to them. 
Do not run any other heavyweight processes on the same machine. - -The machines that host Geode application and cache server processes should have comparable computing power and memory capacity. Otherwise, members on the less powerful machines tend to have trouble keeping up with the rest of the group. - -**Network Capacity** - -Eliminate congested areas on the network by rebalancing the traffic load. Work with your network administrator to identify and eliminate traffic bottlenecks, whether caused by the architecture of the distributed Geode system or by contention between the Geode traffic and other traffic on your network. Consider whether more subnets are needed to separate the Geode administrative traffic from Geode data transport and to separate all the Geode traffic from the rest of your network load. - -The network connections between hosts need to have equal bandwidth. If not, you can end up with a configuration like the multicast example in the following figure, which creates conflicts among the members. For example, if app1 sends out data at 7 Mbps, app3 and app4 would be fine, but app2 would miss some data. In that case, app2 contacts app1 on the TCP channel and sends a log message that it's dropping data. -<img src="../../images_svg/unbalanced_network_capacity_probs.svg" id="slow_recv__image_F8C424AB97C444298993294000676150" class="image" /> - -**Plan for Growth** - -Upgrade the infrastructure to the level required for acceptable performance. Analyze the expected Geode traffic in comparison to the network's capacity. Build in extra capacity for growth and high-traffic spikes. Similarly, evaluate whether the machines that host Geode application and cache server processes can handle the expected load. - -
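The unbalanced-bandwidth scenario in the figure can be put as a quick sanity check. This is a sketch only; the host names and per-link rates below are illustrative assumptions, not measured values.

```python
# Illustrative check for the unbalanced-capacity problem described above:
# any receiver whose link is slower than the sender's multicast rate will
# fall behind and drop data. Hosts and rates are assumed for the example.

def slow_receivers(send_rate_mbps, link_capacity_mbps):
    """Return receivers whose link capacity is below the send rate."""
    return [host for host, capacity in sorted(link_capacity_mbps.items())
            if capacity < send_rate_mbps]

# app1 sends at 7 Mbps; app2's slower link cannot keep up.
links = {"app2": 5, "app3": 10, "app4": 10}  # Mbps, assumed
print(slow_receivers(7, links))  # ['app2']
```

A check like this against measured link rates makes the figure's conflict concrete before it shows up as dropped-data log messages.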
http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_communication.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_communication.html.md.erb b/geode-docs/managing/monitor_tune/socket_communication.html.md.erb deleted file mode 100644 index a97986a..0000000 --- a/geode-docs/managing/monitor_tune/socket_communication.html.md.erb +++ /dev/null @@ -1,48 +0,0 @@ ---- -title: Socket Communication ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -Geode processes communicate using TCP/IP and UDP unicast and multicast protocols. In all cases, communication uses sockets that you can tune to optimize performance. - -The adjustments you make to tune your Geode communication may run up against operating system limits. If this happens, check with your system administrator about adjusting the operating system settings. - -All of the settings discussed here are listed as `gemfire.properties` and `cache.xml` settings. They can also be configured through the API and some can be configured at the command line. 
Before you begin, you should understand Geode [Basic Configuration and Programming](../../basic_config/book_intro.html). - -- **[Setting Socket Buffer Sizes](../../managing/monitor_tune/socket_communication_setting_socket_buffer_sizes.html)** - - When you determine buffer size settings, you try to strike a balance between communication needs and other processing. - -- **[Ephemeral TCP Port Limits](../../managing/monitor_tune/socket_communication_ephemeral_tcp_port_limits.html)** - - By default, Windows' ephemeral ports are within the range 1024-4999, inclusive. You can increase the range. - -- **[Making Sure You Have Enough Sockets](../../managing/monitor_tune/socket_communication_have_enough_sockets.html)** - - The number of sockets available to your applications is governed by operating system limits. - -- **[TCP/IP KeepAlive Configuration](../../managing/monitor_tune/socket_tcp_keepalive.html)** - - Geode supports TCP KeepAlive to prevent socket connections from being timed out. - -- **[TCP/IP Peer-to-Peer Handshake Timeouts](../../managing/monitor_tune/socket_communication_tcpip_p2p_handshake_timeouts.html)** - - You can alleviate connection handshake timeouts for TCP/IP connections by increasing the connection handshake timeout interval with the system property `p2p.handshakeTimeoutMs`. 
- - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_communication_ephemeral_tcp_port_limits.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_communication_ephemeral_tcp_port_limits.html.md.erb b/geode-docs/managing/monitor_tune/socket_communication_ephemeral_tcp_port_limits.html.md.erb deleted file mode 100644 index 3df570a..0000000 --- a/geode-docs/managing/monitor_tune/socket_communication_ephemeral_tcp_port_limits.html.md.erb +++ /dev/null @@ -1,58 +0,0 @@ ---- -title: Ephemeral TCP Port Limits ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -By default, Windows' ephemeral ports are within the range 1024-4999, inclusive. You can increase the range. - -<a id="socket_comm__section_F535D5D99206498DBBD5A6CC3230F25B"></a> -If you are repeatedly receiving the following exception: - -``` pre -java.net.BindException: Address already in use: connect -``` - -and if your system is experiencing a high degree of network activity, such as numerous short-lived client connections, this could be related to a limit on the number of ephemeral TCP ports. 
While this issue can occur with other operating systems, it is typically seen only on Windows because of its low default limit. - -Perform this procedure to increase the limit: - -1. Open the Windows Registry Editor. -2. Navigate to the following key: - - ``` pre - HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters - ``` - -3. From the Edit menu, click New, and then add the following registry entry: - - ``` pre - Value Name: MaxUserPort - Value Type: DWORD - Value data: 36863 - ``` - -4. Exit the Registry Editor, and then restart the computer. - -This affects all versions of the Windows operating system. - -**Note for UDP on Unix Systems** - -Unix systems have a default maximum socket buffer size for receiving UDP multicast and unicast transmissions that is lower than the default settings for mcast-recv-buffer-size and udp-recv-buffer-size. To achieve high-volume multicast messaging, you should increase the maximum Unix buffer size to at least one megabyte. - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_communication_have_enough_sockets.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_communication_have_enough_sockets.html.md.erb b/geode-docs/managing/monitor_tune/socket_communication_have_enough_sockets.html.md.erb deleted file mode 100644 index a075e08..0000000 --- a/geode-docs/managing/monitor_tune/socket_communication_have_enough_sockets.html.md.erb +++ /dev/null @@ -1,185 +0,0 @@ ---- -title: Making Sure You Have Enough Sockets ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. 
-The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -The number of sockets available to your applications is governed by operating system limits. - -Sockets use file descriptors, and the operating system's view of your application's socket use is expressed in terms of file descriptors. There are two limits: one on the maximum descriptors available to a single application and the other on the total number of descriptors available in the system. If you get error messages telling you that you have too many files open, you might be hitting the operating system limits with your use of sockets. Your system administrator might be able to increase the system limits so that you have more available. You can also tune your members to use fewer sockets for their outgoing connections. This section discusses socket use in Geode and ways to limit socket consumption in your Geode members. - -## <a id="socket_comm__section_31B4EFAD6F384AB1BEBCF148D3DEA514" class="no-quick-link"></a>Socket Sharing - -You can configure socket sharing for peer-to-peer and client-to-server connections: - -- **Peer-to-peer**. You can configure whether your members share sockets both at the application level and at the thread level. To enable sharing at the application level, set the `gemfire.properties` property `conserve-sockets` to `true`. To achieve maximum throughput, however, we recommend that you set `conserve-sockets` to `false`. 
- - At the thread level, developers can override this setting by using the DistributedSystem API method `setThreadsSocketPolicy`. You might want to enable socket sharing at the application level and then have threads that do a lot of cache work take sole ownership of their sockets. Make sure to program these threads to release their sockets as soon as possible using the `releaseThreadsSockets` method, rather than waiting for a timeout or thread death. - -- **Client**. You can configure whether your clients share their socket connections to servers with the pool setting `thread-local-connections`. There is no thread override for this setting. All threads either have their own socket or they all share. - -## <a id="socket_comm__section_6189D4E5E14F47E7882354603FBCE471" class="no-quick-link"></a>Socket Lease Time - -You can force the release of an idle socket connection for peer-to-peer and client-to-server connections: - -- **Peer-to-peer**. For peer-to-peer threads that do not share sockets, you can use the `socket-lease-time` to make sure that no socket sits idle for too long. When a socket that belongs to an individual thread remains unused for this time period, the system automatically returns it to the pool. The next time the thread needs a socket, it creates a new socket. -- **Client**. For client connections, you can affect the same lease-time behavior by setting the pool `idle-timeout`. - -## <a id="socket_comm__section_936C6562C0034A2EAC9A63FFE9FDAC36" class="no-quick-link"></a>Calculating Connection Requirements - -Each type of member has its own connection requirements. Clients need connections to their servers, peers need connections to peers, and so on. Many members have compound roles. Use these guidelines to figure each member's socket needs and to calculate the combined needs of members that run on a single host system. 
- -A member's socket use is governed by a number of factors, including: - -- How many peer members it connects to -- How many threads it has that update the cache and whether the threads share sockets -- Whether it is a server or a client -- How many connections come in from other processes - -The socket requirements described here are worst-case. Generally, it is not practical to calculate exact socket use for your applications. Socket use varies depending on a number of factors, including how many members are running, what their threads are doing, and whether threads share sockets. - -To calculate any member's socket requirements, add up the requirements for every category that applies to the member. For example, a cache server running in a distributed system with clients connected to it has both peer-to-peer and server socket requirements. - -## <a id="socket_comm__section_DF64BDE7B6AA47A9B08E0540CAD6DA3A" class="no-quick-link"></a>Peer-to-Peer Socket Requirements Per Member - -Every member of a distributed system maintains two outgoing and two incoming connections to every peer. If threads share sockets, these fixed sockets are the sockets they share. - -For every thread that does not share sockets, additional sockets, one in and one out, are added for each peer. This affects not only the member's socket count, but the socket count for every member the member thread connects to. - -In this table: - -- M is the total number of members in the distributed system. -- T is the number of threads in a member that own their own sockets and do not share. 
- -<table> -<colgroup> -<col width="50%" /> -<col width="50%" /> -</colgroup> -<thead> -<tr class="header"> -<th>Peer Member Socket Description</th> -<th>Number Used</th> -</tr> -</thead> -<tbody> -<tr class="odd"> -<td><p>Membership failure detection</p></td> -<td>2</td> -</tr> -<tr class="even"> -<td><p>Listener for incoming peer connections (server P2P)</p></td> -<td><p>1</p></td> -</tr> -<tr class="odd"> -<td><p>Shared sockets (2 in and 2 out)</p> -<p>Threads that share sockets use these.</p></td> -<td><p>4 * (M-1)</p></td> -</tr> -<tr class="even"> -<td>This member's thread-owned sockets (1 in and 1 out for each thread, for each peer member).</td> -<td><p>(T * 2) * (M-1)</p></td> -</tr> -<tr class="odd"> -<td><p>Other members' thread-owned sockets that connect to this member (1 in and 1 out for each). Note that this might include server threads if any of the other members are servers (see Server).</p></td> -<td><p>Summation over (M-1) other members of (T*2)</p></td> -</tr> -</tbody> -</table> - -**Note:** -The threads servicing client requests add to the total count of thread-owned sockets both for this member connecting to its peers and for peers that connect to this member. - -## <a id="socket_comm__section_0497E07414CC4E0B968B4F3A7AFD3690" class="no-quick-link"></a>Server Socket Requirements Per Server - -Servers use one connection for each incoming client connection. By default, each connection is serviced by a server thread. These threads that service client requests communicate with the rest of the server distributed system to satisfy the requests and distributed update operations. Each of these threads uses its own thread-owned sockets for peer-to-peer communication. So this adds to the server's group of thread-owned sockets. - -The thread and connection count in the server may be limited by server configuration settings. These are the max-connections and max-threads settings in the <cache-server> element of the `cache.xml`. 
These settings limit the number of connections the server accepts and the maximum number of threads that can service client requests. Both of these limit the server's overall connection requirements: - -- When the connection limit is reached, the server refuses additional connections. This limits the number of connections the server uses for clients. -- When the thread limit is reached, threads start servicing multiple connections. This does not limit the number of client connections, but does limit the number of peer connections required to service client requests. Each server thread used for clients uses its own sockets, so it requires 2 connections to each of the server's peers. The max-threads setting puts a cap on the number of this type of peer connection that your server needs. - -The server uses one socket for each incoming client pool connection. If client subscriptions are used, the server creates an additional connection to each client that enables subscriptions. - -In this table, M is the total number of members in the distributed system. - -<table> -<colgroup> -<col width="50%" /> -<col width="50%" /> -</colgroup> -<thead> -<tr class="header"> -<th>Server Socket Description</th> -<th>Number Used</th> -</tr> -</thead> -<tbody> -<tr class="odd"> -<td>Listener for incoming client connections</td> -<td><p>1</p></td> -</tr> -<tr class="even"> -<td>Client pool connections to server</td> -<td>Number of pool connections to this server</td> -</tr> -<tr class="odd"> -<td><p>Threads servicing client requests (the lesser of the client pool connection count and the server's max-threads setting). 
These connections are to the server's peers.</p></td> -<td><p>(2 * number of threads in a server that service client pool connections)</p> -<p>* (M-1)</p> -<p>These threads do not share sockets.</p></td> -</tr> -<tr class="even"> -<td>Subscription connections</td> -<td><p>2 * number of client subscription connections to this server</p></td> -</tr> -</tbody> -</table> - -With client/server installations, the number of client connections to any single server is undetermined, but Geode's server load balancing and conditioning keeps the connections fairly evenly distributed among servers. - -Servers are peers in their own distributed system and have the additional socket requirements as noted in the Peer-to-Peer section above. - -## <a id="socket_comm__section_0D46E55422D24BA1B0CD888E14FD5182" class="no-quick-link"></a>Client Socket Requirements per Client - -Client connection requirements are compounded by how many pools they use. The use varies according to runtime client connection needs, but will usually have maximum and minimum settings. Look for the <pool> element in the `cache.xml` for the configuration properties. - -<table> -<colgroup> -<col width="50%" /> -<col width="50%" /> -</colgroup> -<thead> -<tr class="header"> -<th>Client Socket Description</th> -<th>Number Used</th> -</tr> -</thead> -<tbody> -<tr class="odd"> -<td><p>Pool connection</p></td> -<td><p>summation over the client pools of max-connections</p></td> -</tr> -<tr class="even"> -<td><p>Subscription connections</p></td> -<td><p>2 * summation over the client pools of subscription-enabled</p></td> -</tr> -</tbody> -</table> - -If your client acts as a peer in its own distributed system, it has the additional socket requirements as noted in the Peer-to-Peer section of this topic. 
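The worst-case arithmetic in the peer-to-peer and server tables above can be rolled into a small estimator. This is a sketch only: the function names and the example member and thread counts are assumptions for illustration, not part of Geode.

```python
# Worst-case socket estimates from the peer-to-peer and server tables
# above. Function names and example numbers are illustrative assumptions.

def p2p_sockets(members, own_threads, peer_thread_counts):
    """members (M): total members; own_threads (T): this member's
    non-sharing threads; peer_thread_counts: T for each other member
    whose threads connect to this member."""
    m = members
    return (2                                   # membership failure detection
            + 1                                 # listener for incoming peers
            + 4 * (m - 1)                       # shared sockets: 2 in, 2 out
            + own_threads * 2 * (m - 1)         # this member's thread-owned
            + sum(t * 2 for t in peer_thread_counts))  # peers' thread-owned

def server_sockets(members, pool_connections, max_threads, subscriptions=0):
    """Adds the server-side categories; combine with p2p_sockets()."""
    service_threads = min(pool_connections, max_threads)
    return (1                                   # listener for client conns
            + pool_connections                  # one socket per pool conn
            + 2 * service_threads * (members - 1)  # peer conns per thread
            + 2 * subscriptions)                # subscription connections

# A 4-member system, 2 owned threads per member, one server with 10 pool
# connections, max-threads=16, 3 subscription clients:
print(p2p_sockets(4, 2, [2, 2, 2]))   # 39
print(server_sockets(4, 10, 16, 3))   # 77
```

A cache server would need both totals, matching the rule above that a server has peer-to-peer plus server socket requirements.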
http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_communication_setting_socket_buffer_sizes.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_communication_setting_socket_buffer_sizes.html.md.erb b/geode-docs/managing/monitor_tune/socket_communication_setting_socket_buffer_sizes.html.md.erb deleted file mode 100644 index 41884a2..0000000 --- a/geode-docs/managing/monitor_tune/socket_communication_setting_socket_buffer_sizes.html.md.erb +++ /dev/null @@ -1,144 +0,0 @@ ---- -title: Setting Socket Buffer Sizes ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -When you determine buffer size settings, you try to strike a balance between communication needs and other processing. - -Larger socket buffers allow your members to distribute data and events more quickly, but they also take memory away from other things. If you store very large data objects in your cache, finding the right sizing for your buffers while leaving enough memory for the cached data can become critical to system performance. 
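The memory trade-off described above can be put in rough numbers. A back-of-the-envelope sketch; the buffer size and connection count below are assumed values, not recommendations or Geode defaults.

```python
# Rough cost of socket buffers: the memory they pin grows with the buffer
# size times the number of connections. All figures here are assumptions.

def buffer_memory_mb(socket_buffer_bytes, connections):
    """Approximate memory consumed by socket buffers, in MB."""
    return socket_buffer_bytes * connections / (1024 * 1024)

# e.g. a 42000-byte buffer across 500 connections:
print(round(buffer_memory_mb(42000, 500), 1))  # 20.0
```

Twenty megabytes is memory no longer available for cached data, which is why buffer sizing matters most when entry values are large.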
- -Ideally, you should have buffers large enough for the distribution of any single data object so you don't get message fragmentation, which lowers performance. Your buffers should be at least as large as your largest stored objects and their keys, plus some overhead for message headers. The overhead varies depending on who is sending and receiving, but 100 bytes should be sufficient. You can also look at the statistics for the communication between your processes to see how many bytes are being sent and received. - -If you see performance problems and logging messages indicating blocked writers, increasing your buffer sizes may help. - -This table lists the settings for the various member relationships and protocols, and tells where to set them. - -<table> -<colgroup> -<col width="33%" /> -<col width="33%" /> -<col width="34%" /> -</colgroup> -<thead> -<tr class="header"> -<th>Protocol / Area Affected</th> -<th>Configuration Location</th> -<th>Property Name</th> -</tr> -</thead> -<tbody> -<tr class="odd"> -<td><strong>TCP / IP</strong></td> -<td>---</td> -<td>---</td> -</tr> -<tr class="even"> -<td>Peer-to-peer send/receive</td> -<td><p>gemfire.properties</p></td> -<td>socket-buffer-size</td> -</tr> -<tr class="odd"> -<td>Client send/receive</td> -<td><p>cache.xml <pool></p></td> -<td>socket-buffer-size</td> -</tr> -<tr class="even"> -<td>Server send/receive</td> -<td><code class="ph codeph">gfsh start server</code> or -<p>cache.xml <CacheServer></p></td> -<td>socket-buffer-size</td> -</tr> -<tr class="odd"> -<td><strong>UDP Multicast</strong></td> -<td>---</td> -<td>---</td> -</tr> -<tr class="even"> -<td>Peer-to-peer send</td> -<td>gemfire.properties</td> -<td>mcast-send-buffer-size</td> -</tr> -<tr class="odd"> -<td>Peer-to-peer receive</td> -<td>gemfire.properties</td> -<td>mcast-recv-buffer-size</td> -</tr> -<tr class="even"> -<td><strong>UDP Unicast</strong></td> -<td>---</td> -<td>---</td> -</tr> -<tr class="odd"> -<td>Peer-to-peer send</td> 
-<td>gemfire.properties</td> -<td>udp-send-buffer-size</td> -</tr> -<tr class="even"> -<td>Peer-to-peer receive</td> -<td>gemfire.properties</td> -<td>udp-recv-buffer-size</td> -</tr> -</tbody> -</table> - -**TCP/IP Buffer Sizes** - -If possible, your TCP/IP buffer size settings should match across your Geode installation. At a minimum, follow the guidelines listed here. - -- **Peer-to-peer**. The socket-buffer-size setting in `gemfire.properties` should be the same throughout your distributed system. -- **Client/server**. The client's pool socket-buffer-size should match the setting for the servers the pool uses, as in these example `cache.xml` snippets: - - ``` pre - Client Socket Buffer Size cache.xml Configuration: - <pool>name="PoolA" server-group="dataSetA" socket-buffer-size="42000"... - - Server Socket Buffer Size cache.xml Configuration: - <cache-server port="40404" socket-buffer-size="42000"> - <group>dataSetA</group> - </cache-server> - ``` - -**UDP Multicast and Unicast Buffer Sizes** - -With UDP communication, one receiver can have many senders sending to it at once. To accommodate all of the transmissions, the receiving buffer should be larger than the sum of the sending buffers. If you have a system with at most five members running at any time, in which all members update their data regions, you would set the receiving buffer to at least five times the size of the sending buffer. If you have a system with producer and consumer members, where only two producer members ever run at once, the receiving buffer sizes should be set at over two times the sending buffer sizes, as shown in this example: - -``` pre -mcast-send-buffer-size=42000 -mcast-recv-buffer-size=90000 -udp-send-buffer-size=42000 -udp-recv-buffer-size=90000 -``` - -**Operating System Limits** - -Your operating system sets limits on the buffer sizes it allows. If you request a size larger than the allowed maximum, you may get warnings or exceptions about the setting during startup. 
These are two examples of the type of message you may see: - -``` pre -[warning 2008/06/24 16:32:20.286 PDT CacheRunner <main> tid=0x1] -requested multicast send buffer size of 9999999 but got 262144: see -system administration guide for how to adjust your OS - -Exception in thread "main" java.lang.IllegalArgumentException: Could not -set "socket-buffer-size" to "99262144" because its value can not be -greater than "20000000". -``` - -If you think you are requesting more space for your buffer sizes than your system allows, check with your system administrator about adjusting the operating system limits. - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_communication_tcpip_p2p_handshake_timeouts.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_communication_tcpip_p2p_handshake_timeouts.html.md.erb b/geode-docs/managing/monitor_tune/socket_communication_tcpip_p2p_handshake_timeouts.html.md.erb deleted file mode 100644 index 486c337..0000000 --- a/geode-docs/managing/monitor_tune/socket_communication_tcpip_p2p_handshake_timeouts.html.md.erb +++ /dev/null @@ -1,38 +0,0 @@ ---- -title: TCP/IP Peer-to-Peer Handshake Timeouts ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-See the License for the specific language governing permissions and -limitations under the License. ---> - -You can alleviate connection handshake timeouts for TCP/IP connections by increasing the connection handshake timeout interval with the system property p2p.handshakeTimeoutMs. - -The default setting is 59000 milliseconds. - -This sets the handshake timeout to 75000 milliseconds for a Java application: - -``` pre --Dp2p.handshakeTimeoutMs=75000 -``` - -The properties are passed to the cache server on the `gfsh` command line: - -``` pre -gfsh>start server --name=server_name --J=-Dp2p.handshakeTimeoutMs=75000 -``` - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/socket_tcp_keepalive.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/socket_tcp_keepalive.html.md.erb b/geode-docs/managing/monitor_tune/socket_tcp_keepalive.html.md.erb deleted file mode 100644 index f5512bf..0000000 --- a/geode-docs/managing/monitor_tune/socket_tcp_keepalive.html.md.erb +++ /dev/null @@ -1,31 +0,0 @@ ---- -title: TCP/IP KeepAlive Configuration ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
---> - -Geode supports TCP KeepAlive to prevent socket connections from being timed out. - -The `gemfire.enableTcpKeepAlive` system property prevents connections that appear idle from being timed out (for example, by a firewall). When configured to true, Geode enables the SO\_KEEPALIVE option for individual sockets. This operating system-level setting allows the socket to send verification checks (ACK requests) to remote systems in order to determine whether or not to keep the socket connection alive. - -**Note:** -The time intervals for sending the first ACK KeepAlive request, the subsequent ACK requests, and the number of requests to send before closing the socket are configured at the operating system level. - -By default, this system property is set to true. - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb b/geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb deleted file mode 100644 index ca20bf8..0000000 --- a/geode-docs/managing/monitor_tune/sockets_and_gateways.html.md.erb +++ /dev/null @@ -1,122 +0,0 @@ ---- -title: Configuring Sockets in Multi-Site (WAN) Deployments ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. 
You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -When you determine buffer size settings, you try to strike a balance between communication needs and other processing. - -This table lists the settings for gateway relationships and protocols, and tells where to set them. - -<table> -<colgroup> -<col width="33%" /> -<col width="33%" /> -<col width="33%" /> -</colgroup> -<thead> -<tr class="header"> -<th>Protocol / Area Affected</th> -<th>Configuration Location</th> -<th>Property Name</th> -</tr> -</thead> -<tbody> -<tr class="odd"> -<td><strong>TCP / IP</strong></td> -<td>---</td> -<td>---</td> -</tr> -<tr class="even"> -<td>Gateway sender</td> -<td><code class="ph codeph">gfsh create gateway-sender</code> or -<p>cache.xml <gateway-sender></p></td> -<td>socket-buffer-size</td> -</tr> -<tr class="odd"> -<td>Gateway receiver</td> -<td><code class="ph codeph">gfsh create gateway-receiver</code> or cache.xml <gateway-receiver></td> -<td>socket-buffer-size</td> -</tr> -</tbody> -</table> - -**TCP/IP Buffer Sizes** - -If possible, your TCP/IP buffer size settings should match across your GemFire installation. At a minimum, follow the guidelines listed here. - -- **Multisite (WAN)**. In a multi-site installation using gateways, if the link between sites is not tuned for optimum throughput, it could cause messages to back up in the cache queues. If a receiving queue overflows because of inadequate buffer sizes, it will become out of sync with the sender and the receiver will be unaware of the condition. 
-
- The gateway sender's socket-buffer-size attribute should match the gateway receiver's socket-buffer-size attribute for all gateway receivers that the sender connects to, as in these example `cache.xml` snippets:
-
- ``` pre
- Gateway Sender Socket Buffer Size cache.xml Configuration:
-
- <gateway-sender id="sender2" parallel="true"
-   remote-distributed-system-id="2"
-   socket-buffer-size="42000"
-   maximum-queue-memory="150"/>
-
- Gateway Receiver Socket Buffer Size cache.xml Configuration:
- <gateway-receiver start-port="1530" end-port="1551"
-   socket-buffer-size="42000"/>
- ```
-
-**Note:**
-WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for GemFire members that participate in a WAN deployment.
-
-## <a id="socket_comm__section_4A7C60D4471A4339884AA5AAC97B4DAA" class="no-quick-link"></a>Multi-site (WAN) Socket Requirements
-
-Each gateway sender and gateway receiver uses a socket to distribute events or to listen for incoming connections from remote sites.
-
-<table>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Multi-site Socket Description</th>
-<th>Number Used</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td><p>Listener for incoming connections</p></td>
-<td><p>summation of the number of gateway-receivers defined for the member</p></td>
-</tr>
-<tr class="even">
-<td><p>Incoming connection</p></td>
-<td><p>summation of the total number of remote gateway senders configured to connect to the gateway receiver</p></td>
-</tr>
-<tr class="odd">
-<td><p>Outgoing connection</p></td>
-<td><p>summation of the number of gateway senders defined for the member</p></td>
-</tr>
-</tbody>
-</table>
-
-Servers are peers in their own distributed system and have the additional socket requirements as noted in the Peer-to-Peer section above. 
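The `conserve-sockets` recommendation in the note above is an ordinary member-level Geode property. As a sketch (assuming the member reads a `gemfire.properties` file; the server name below is a placeholder), it can be set in the properties file:

``` pre
# gemfire.properties for a member that participates in a WAN deployment
conserve-sockets=false
```

Alternatively, the same property can be passed on the `gfsh` command line as a system property, for example `gfsh>start server --name=server-name --J=-Dgemfire.conserve-sockets=false`.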
- -## <a id="socket_comm__section_66D11C8E84F941B58800EDB52194B087" class="no-quick-link"></a>Member produces SocketTimeoutException - -A client, server, gateway sender, or gateway receiver produces a SocketTimeoutException when it stops waiting for a response from the other side of the connection and closes the socket. This exception typically happens on the handshake or when establishing a callback connection. - -Response: - -Increase the default socket timeout setting for the member. This timeout is set separately for the client Pool and for the gateway sender and gateway receiver, either in the `cache.xml` file or through the API. For a client/server configuration, adjust the "read-timeout" value as described in [<pool>](../../reference/topics/client-cache.html#cc-pool) or use the `org.apache.geode.cache.client.PoolFactory.setReadTimeout` method. For a gateway sender or gateway receiver, see [WAN Configuration](../../reference/topics/elements_ref.html#topic_7B1CABCAD056499AA57AF3CFDBF8ABE3). http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/system_member_performance.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/system_member_performance.html.md.erb b/geode-docs/managing/monitor_tune/system_member_performance.html.md.erb deleted file mode 100644 index 49b9f62..0000000 --- a/geode-docs/managing/monitor_tune/system_member_performance.html.md.erb +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: System Member Performance ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. 
You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-You can modify some configuration parameters to improve system member performance.
-
-Before doing so, you should understand [Basic Configuration and Programming](../../basic_config/book_intro.html).
-
-- **[Distributed System Member Properties](../../managing/monitor_tune/system_member_performance_distributed_system_member.html)**
-
-    Several performance-related properties apply to a cache server or application that connects to the distributed system.
-
-- **[JVM Memory Settings and System Performance](../../managing/monitor_tune/system_member_performance_jvm_mem_settings.html)**
-
-    You configure JVM memory settings for the Java application by adding parameters to the java invocation. For the cache server, you add them to the command-line parameters for the gfsh `start server` command.
-
-- **[Garbage Collection and System Performance](../../managing/monitor_tune/system_member_performance_garbage.html)**
-
-    If your application exhibits unacceptably high latencies, you might improve performance by modifying your JVM's garbage collection behavior.
-
-- **[Connection Thread Settings and Performance](../../managing/monitor_tune/system_member_performance_connection_thread_settings.html)**
-
-    When many peer processes are started concurrently, you can improve the distributed system connect time by setting the p2p.HANDSHAKE\_POOL\_SIZE system property value to the expected number of members. 
-
-
 http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/system_member_performance_connection_thread_settings.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/monitor_tune/system_member_performance_connection_thread_settings.html.md.erb b/geode-docs/managing/monitor_tune/system_member_performance_connection_thread_settings.html.md.erb
deleted file mode 100644
index 42aedb5..0000000
--- a/geode-docs/managing/monitor_tune/system_member_performance_connection_thread_settings.html.md.erb
+++ /dev/null
@@ -1,32 +0,0 @@
----
-title: Connection Thread Settings and Performance
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-When many peer processes are started concurrently, you can improve the distributed system connect time by setting the p2p.HANDSHAKE\_POOL\_SIZE system property value to the expected number of members.
-
-This property controls the number of threads that can be used to establish new TCP/IP connections between peer caches. The threads are discarded if they are idle for 60 seconds.
-
-The default value for p2p.HANDSHAKE\_POOL\_SIZE is 10. 
This command-line specification sets the number of threads to 100:
-
-``` pre
--Dp2p.HANDSHAKE_POOL_SIZE=100
-```
-
-
 http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/system_member_performance_distributed_system_member.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/monitor_tune/system_member_performance_distributed_system_member.html.md.erb b/geode-docs/managing/monitor_tune/system_member_performance_distributed_system_member.html.md.erb
deleted file mode 100644
index 2076c7a..0000000
--- a/geode-docs/managing/monitor_tune/system_member_performance_distributed_system_member.html.md.erb
+++ /dev/null
@@ -1,28 +0,0 @@
----
-title: Distributed System Member Properties
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-Several performance-related properties apply to a cache server or application that connects to the distributed system.
-
-- **statistic-sampling-enabled**. Turning off statistics sampling saves resources, but it also takes away potentially valuable information for ongoing system tuning and unexpected system problems. If LRU eviction is configured, then statistics sampling must be on.
-- **statistic-sample-rate**. 
Increasing the value of this property (the interval between statistics samples, in milliseconds) reduces system resource use while still providing some statistics for system tuning and failure analysis.
-- **log-level**. As with the statistic sample rate, a less verbose log level reduces system resource consumption. See [Logging](../logging/logging.html#concept_30DB86B12B454E168B80BB5A71268865).
-
-
 http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/system_member_performance_garbage.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/monitor_tune/system_member_performance_garbage.html.md.erb b/geode-docs/managing/monitor_tune/system_member_performance_garbage.html.md.erb
deleted file mode 100644
index ef23dff..0000000
--- a/geode-docs/managing/monitor_tune/system_member_performance_garbage.html.md.erb
+++ /dev/null
@@ -1,53 +0,0 @@
----
-title: Garbage Collection and System Performance
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. You may obtain a copy of the License at
-
-    http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
-
-If your application exhibits unacceptably high latencies, you might improve performance by modifying your JVM's garbage collection behavior. 
-
-Garbage collection, while necessary, introduces latency into your system by consuming resources that would otherwise be available to your application. You can reduce the impact of garbage collection in two ways:
-
-- Optimize garbage collection in the JVM heap.
-- Reduce the amount of data exposed to garbage collection by storing values in off-heap memory.
-
-**Note:**
-Garbage collection tuning options depend on the JVM you are using. Suggestions given here apply to the Sun HotSpot JVM. If you use a different JVM, check with your vendor to see if these or comparable options are available to you.
-
-**Note:**
-Modifications to garbage collection sometimes produce unexpected results. Always test your system before and after making changes to verify that the system's performance has improved.
-
-**Optimizing Garbage Collection**
-
-The two options suggested here are likely to expedite garbage collection activities by introducing parallelism and by focusing on the data that is most likely to be ready for cleanup. The first parameter causes the garbage collector to run concurrently with your application processes. The second parameter causes it to run multiple, parallel threads for the "young generation" garbage collection (that is, garbage collection performed on the most recent objects in memory, where the greatest benefits are expected):
-
-``` pre
--XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-```
-
-For applications, if you are using remote method invocation (RMI) Java APIs, you might also be able to reduce latency by disabling explicit calls to the garbage collector. The RMI internals automatically invoke garbage collection every sixty seconds to ensure that objects introduced by RMI activities are cleaned up. Your JVM may be able to handle these additional garbage collection needs. If so, your application may run faster with explicit garbage collection disabled. 
You can try adding the following command-line parameter to your application invocation and test to see if your garbage collector is able to keep up with demand: - -``` pre --XX:+DisableExplicitGC -``` - -**Using Off-heap Memory** - -You can improve the performance of some applications by storing data values in off-heap memory. Certain objects, such as keys, must remain in the JVM heap. See [Managing Off-Heap Memory](../heap_use/off_heap_management.html) for more information. - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/system_member_performance_jvm_mem_settings.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/system_member_performance_jvm_mem_settings.html.md.erb b/geode-docs/managing/monitor_tune/system_member_performance_jvm_mem_settings.html.md.erb deleted file mode 100644 index 4440b25..0000000 --- a/geode-docs/managing/monitor_tune/system_member_performance_jvm_mem_settings.html.md.erb +++ /dev/null @@ -1,78 +0,0 @@ ---- -title: JVM Memory Settings and System Performance ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. 
--->
-
-You configure JVM memory settings for the Java application by adding parameters to the java invocation. For the cache server, you add them to the command-line parameters for the gfsh `start server` command.
-
-- JVM heap size—Your JVM may require more memory than is allocated by default. For example, you may need to increase heap size for an application that stores a lot of data. You can set a maximum size and an initial size, so if you know you will be using the maximum (or close to it) for the life of the member, you can speed memory allocation time by setting the initial size to the maximum. This sets both the maximum and initial memory sizes to 1024 megabytes for a Java application:
-
-    ``` pre
-    -Xmx1024m -Xms1024m
-    ```
-
-    Properties can be passed to the cache server on the `gfsh` command line:
-
-    ``` pre
-    gfsh>start server --name=server-name --J=-Xmx1024m --J=-Xms1024m
-    ```
-
-- MaxDirectMemorySize—The JVM has a kind of memory called direct memory, distinct from normal JVM heap memory, that can be exhausted. You can increase the direct buffer memory either by increasing the maximum heap size (see the previous JVM heap size item), which increases both the maximum heap and the maximum direct memory, or by increasing only the maximum direct memory using -XX:MaxDirectMemorySize. The following parameter added to the Java application startup increases the maximum direct memory size to 256 megabytes:
-
-    ``` pre
-    -XX:MaxDirectMemorySize=256M
-    ```
-
-    The same effect for the cache server:
-
-    ``` pre
-    gfsh>start server --name=server-name --J=-XX:MaxDirectMemorySize=256M
-    ```
-
-- JVM stack size—Each thread in a Java application has its own stack. The stack is used to hold return addresses, arguments to functions and method calls, and so on. Since Geode is a highly multi-threaded system, at any given point in time there are multiple thread pools and threads that are in use. The default stack size setting for a thread in Java is 1MB. 
Stack size has to be allocated in contiguous blocks, and if the machine is being used actively and there are many threads running in the system (Task Manager shows the number of active threads), you may encounter an `OutOfMemoryError: unable to create new native thread`, even though your process has enough available heap. If this happens, consider reducing the stack size requirement for threads on the cache server. The following parameter added to the Java application startup limits the maximum size of the stack.
-
-    ``` pre
-    -Xss384k
-    ```
-
-    In particular, we recommend starting the cache servers with a stack size of 384k or 512k in such cases. For example:
-
-    ``` pre
-    gfsh>start server --name=server-name --J=-Xss384k
-
-    gfsh>start server --name=server-name --J=-Xss512k
-    ```
-
-- Off-heap memory size—For applications that use off-heap memory, this setting specifies how much off-heap memory to allocate. Setting `off-heap-memory-size` is a prerequisite to enabling the off-heap capability for individual regions. For example:
-
-    ``` pre
-    gfsh>start server --name=server-name --off-heap-memory-size=200G
-    ```
-
-    See [Using Off-heap Memory](../heap_use/off_heap_management.html#managing-off-heap-memory) for additional considerations regarding this parameter.
-
-- Lock memory—On Linux systems, you can prevent heap and off-heap memory from being paged out by setting the `lock-memory` parameter to `true`. For example:
-
-    ``` pre
-    gfsh>start server --name=server-name --off-heap-memory-size=200G --lock-memory=true
-    ```
-
-    See [Locking Memory](../heap_use/lock_memory.html) for additional considerations regarding this parameter. 
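Taken together, the heap, stack, off-heap, and memory-locking settings above can be combined in a single server startup. This is an illustrative sketch only (the server name and the sizes are placeholders from the examples above, not tuned recommendations for any particular workload):

``` pre
gfsh>start server --name=server-name --J=-Xmx1024m --J=-Xms1024m --J=-Xss384k --off-heap-memory-size=200G --lock-memory=true
```

Note that `--J` options are passed through to the JVM, while `--off-heap-memory-size` and `--lock-memory` are gfsh options in their own right.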
- - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/monitor_tune/udp_communication.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/monitor_tune/udp_communication.html.md.erb b/geode-docs/managing/monitor_tune/udp_communication.html.md.erb deleted file mode 100644 index 4a5d3c0..0000000 --- a/geode-docs/managing/monitor_tune/udp_communication.html.md.erb +++ /dev/null @@ -1,50 +0,0 @@ ---- -title: UDP Communication ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -You can make configuration adjustments to improve multicast and unicast UDP performance of peer-to-peer communication. - -You can tune your Geode UDP messaging to maximize throughput. There are two main tuning goals: to use the largest reasonable datagram packet sizes and to reduce retransmission rates. These actions reduce messaging overhead and overall traffic on your network while still getting your data where it needs to go. Geode also provides statistics to help you decide when to change your UDP messaging settings. - -Before you begin, you should understand Geode [Basic Configuration and Programming](../../basic_config/book_intro.html). 
See also the general communication tuning and multicast-specific tuning covered in [Socket Communication](socket_communication.html) and [Multicast Communication](multicast_communication.html#multicast).
-
-## <a id="udp_comm__section_4089ACC33AF34FA888BAE3CA3602A730" class="no-quick-link"></a>UDP Datagram Size
-
-You can change the UDP datagram size with the Geode property udp-fragment-size. This is the maximum packet size for transmission over UDP unicast or multicast sockets. When possible, smaller messages are combined into batches up to the size of this setting.
-
-Most operating systems set a maximum transmission size of 64k for UDP datagrams, so this setting should be kept under 60k to allow for communication headers. Setting the fragment size too high can result in extra network traffic if your network is subject to packet loss, as more data must be resent for each retransmission. If many UDP retransmissions appear in DistributionStats, you may achieve better throughput by lowering the fragment size.
-
-## <a id="udp_comm__section_B9882A4EBA004599B2207B9CB1D3ADC9" class="no-quick-link"></a>UDP Flow Control
-
-UDP protocols typically have a flow control protocol built into them to keep processes from being overrun by incoming no-ack messages. The Geode UDP flow control protocol is a credit-based system in which the sender has a maximum number of bytes it can send before getting its byte credit count replenished, or recharged, by its receivers. While its byte credits are too low, the sender waits. The receivers do their best to anticipate the sender's recharge requirements and provide recharges before they are needed. If the sender's credits run too low, it explicitly requests a recharge from its receivers.
-
-This flow control protocol, which is used for all multicast and unicast no-ack messaging, is configured using a three-part Geode property mcast-flow-control. 
This property is composed of:
-
-- byteAllowance—Determines how many bytes (also referred to as credits) can be sent before receiving a recharge from the receiving processes.
-- rechargeThreshold—Sets a lower limit on the ratio of the sender's remaining credit to its byteAllowance. When the ratio goes below this limit, the receiver automatically sends a recharge. This reduces recharge request messaging from the sender and helps keep the sender from blocking while waiting for recharges.
-- rechargeBlockMs—Tells the sender how long to wait while needing a recharge before explicitly requesting one.
-
-In a well-tuned system, where consumers of cache events are keeping up with producers, the byteAllowance can be set high to limit flow-control messaging and pauses. JVM bloat or frequent message retransmissions are an indication that cache events from producers are overrunning consumers.
-
-## <a id="udp_comm__section_FB1F54A41D2643A29DB416D309ED4C56" class="no-quick-link"></a>UDP Retransmission Statistics
-
-Geode stores retransmission statistics for its senders and receivers. You can use these statistics to help determine whether your flow control and fragment size settings are appropriate for your system.
-
-The retransmission rates are stored in the DistributionStats ucastRetransmits and mcastRetransmits. For multicast, there is also a receiver-side statistic mcastRetransmitRequests that can be used to see which processes aren't keeping up and are requesting retransmissions. There is no comparable way to tell which receivers are having trouble receiving unicast UDP messages. 
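Both tuning knobs described above are ordinary member properties. A hedged `gemfire.properties` sketch follows; the values shown are commonly cited defaults (byteAllowance 1048576 bytes, rechargeThreshold 0.25, rechargeBlockMs 5000, and a 60000-byte fragment size), so verify the defaults and the exact three-part syntax against your release before relying on them:

``` pre
# gemfire.properties
# maximum UDP datagram payload; keep under 60k (OS limit is typically 64k including headers)
udp-fragment-size=60000
# three-part value: byteAllowance,rechargeThreshold,rechargeBlockMs
mcast-flow-control=1048576,0.25,5000
```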
http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/network_partitioning/chapter_overview.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/network_partitioning/chapter_overview.html.md.erb b/geode-docs/managing/network_partitioning/chapter_overview.html.md.erb deleted file mode 100644 index 98d3c0b..0000000 --- a/geode-docs/managing/network_partitioning/chapter_overview.html.md.erb +++ /dev/null @@ -1,48 +0,0 @@ ---- -title: Network Partitioning ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -Apache Geode architecture and management features help detect and resolve network partition problems. - -- **[How Network Partitioning Management Works](../../managing/network_partitioning/how_network_partitioning_management_works.html)** - - Geode handles network outages by using a weighting system to determine whether the remaining available members have a sufficient quorum to continue as a distributed system. - -- **[Failure Detection and Membership Views](../../managing/network_partitioning/failure_detection.html)** - - Geode uses failure detection to remove unresponsive members from membership views. 
-
-- **[Membership Coordinators, Lead Members and Member Weighting](../../managing/network_partitioning/membership_coordinators_lead_members_and_weighting.html)**
-
-    Network partition detection uses a designated membership coordinator and a weighting system that accounts for a lead member to determine whether a network partition has occurred.
-
-- **[Network Partitioning Scenarios](../../managing/network_partitioning/network_partitioning_scenarios.html)**
-
-    This topic describes network partitioning scenarios and what happens to the partitioned sides of the distributed system.
-
-- **[Configure Apache Geode to Handle Network Partitioning](../../managing/network_partitioning/handling_network_partitioning.html)**
-
-    This section lists the configuration steps for network partition detection.
-
-- **[Preventing Network Partitions](../../managing/network_partitioning/preventing_network_partitions.html)**
-
-    This section provides a short list of things you can do to prevent network partitions from occurring.
-
-

 http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/network_partitioning/failure_detection.html.md.erb
----------------------------------------------------------------------
diff --git a/geode-docs/managing/network_partitioning/failure_detection.html.md.erb b/geode-docs/managing/network_partitioning/failure_detection.html.md.erb
deleted file mode 100644
index 223b3d9..0000000
--- a/geode-docs/managing/network_partitioning/failure_detection.html.md.erb
+++ /dev/null
@@ -1,62 +0,0 @@
----
-title: Failure Detection and Membership Views
----
-
-<!--
-Licensed to the Apache Software Foundation (ASF) under one or more
-contributor license agreements. See the NOTICE file distributed with
-this work for additional information regarding copyright ownership.
-The ASF licenses this file to You under the Apache License, Version 2.0
-(the "License"); you may not use this file except in compliance with
-the License. 
You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -Geode uses failure detection to remove unresponsive members from membership views. - -## <a id="concept_CFD13177F78C456095622151D6EE10EB__section_1AAE6C92FED249EFBA476D8A480B8E51" class="no-quick-link"></a>Failure Detection - -Network partitioning has a failure detection protocol that is not subject to hanging when NICs or machines fail. Failure detection has each member observe messages from the peer to its right within the membership view (see "Membership Views" below for the view layout). A member that suspects the failure of its peer to the right sends a datagram heartbeat request to the suspect member. With no response from the suspect member, the suspicious member broadcasts a `SuspectMembersMessage` datagram message to all other members. The coordinator attempts to connect to the suspect member. If the connection attempt is unsuccessful, the suspect member is removed from the membership view. The suspect member is sent a message to disconnect from the distributed system and close the cache. In parallel to the receipt of the `SuspectMembersMessage`, a distributed algorithm promotes the leftmost member within the view to act as the coordinator, if the coordinator is the suspect member. - -Failure detection processing is also initiated on a member if the `gemfire.properties` `ack-wait-threshold` elapses before receiving a response to a message, if a TCP/IP connection cannot be made to the member for peer-to-peer (P2P) messaging, and if no other traffic is detected from the member. 
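As a hedged sketch of the `ack-wait-threshold` trigger described above, the property is set in `gemfire.properties`; the value shown is the documented default of 15 seconds, so raising it delays suspect processing for slow-but-healthy members (verify the default for your release):

``` pre
# gemfire.properties
# seconds to wait for a message acknowledgment before initiating suspect processing
ack-wait-threshold=15
```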
- -**Note:** -The TCP connection ping is not used for connection keep-alive purposes; it is only used to detect failed members. See [TCP/IP KeepAlive Configuration](../monitor_tune/socket_tcp_keepalive.html#topic_jvc_pw3_34) for TCP keep-alive configuration. - -If a new membership view is sent out that includes one or more failed members, the coordinator will log new quorum weight calculations. At any point, if quorum loss is detected due to unresponsive processes, the coordinator will also log a severe level message to identify the failed members: -``` pre -Possible loss of quorum detected due to loss of {0} cache processes: {1} -``` - -in which {0} is the number of processes that failed and {1} lists the members (cache processes). - -## <a id="concept_CFD13177F78C456095622151D6EE10EB__section_1170FBBD6B7A483AB2C2A837F1B8876D" class="no-quick-link"></a>Membership Views - -The following is a sample membership view: - -``` pre -[info 2012/01/06 11:44:08.164 PST bridgegemfire1 <UDP Incoming Message Handler> tid=0x1f] -Membership: received new view [ent(5767)<v0>:8700|16] [ent(5767)<v0>:8700/44876, -ent(5829)<v1>:48034/55334, ent(5875)<v2>:4738/54595, ent(5822)<v5>:49380/39564, -ent(8788)<v7>:24136/53525] -``` - -The components of the membership view are as follows: - -- The first part of the view (`[ent(5767)<v0>:8700|16]` in the example above) corresponds to the view ID. It identifies: - - the address and processId of the membership coordinator-- `ent(5767)` in the example above. - - the view-number (`<vXX>`) of the membership view that the member first appeared in-- `<v0>` in the example above. - - the membership-port of the membership coordinator-- `8700` in the example above. - - the view-number-- `16` in the example above. -- The second part of the view lists all of the member processes in the current view-- `[ent(5767)<v0>:8700/44876, ent(5829)<v1>:48034/55334, ent(5875)<v2>:4738/54595, ent(5822)<v5>:49380/39564, ent(8788)<v7>:24136/53525]` in the example above. 
- -- The overall format of each listed member is: `Address(processId)<vXX>:membership-port/distribution-port`. The membership coordinator is almost always the first member in the view and the rest are ordered by age. -- The membership-port is the JGroups UDP port that the member uses to send datagrams. The distribution-port is the TCP/IP port that is used for cache messaging. -- Each member watches the member to its right for failure detection purposes. - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb b/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb deleted file mode 100644 index 61a2576..0000000 --- a/geode-docs/managing/network_partitioning/handling_network_partitioning.html.md.erb +++ /dev/null @@ -1,63 +0,0 @@ ---- -title: Configure Apache Geode to Handle Network Partitioning ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -This section lists the configuration steps for network partition detection. 
- -<a id="handling_network_partitioning__section_EAF1957B6446491A938DEFB06481740F"></a> -The system uses a combination of member coordinators and system members, designated as lead members, to detect and resolve network partitioning problems. - -1. Network partition detection works in all environments. Using multiple locators mitigates the effect of network partitioning. See [Configuring Peer-to-Peer Discovery](../../topologies_and_comm/p2p_configuration/setting_up_a_p2p_system.html). -2. Enable partition detection consistently in all system members by setting this in their `gemfire.properties` file: - - ``` pre - enable-network-partition-detection=true - ``` - - Enable network partition detection in all locators and in any other process that should be sensitive to network partitioning. Processes that do not have network partition detection enabled are not eligible to be the lead member, so their failure will not trigger declaration of a network partition. - - All system members should have the same setting for `enable-network-partition-detection`. If they don't, the system throws a `GemFireConfigException` upon startup. - -3. You **must** set `enable-network-partition-detection` to true if you are using persistent regions (partitioned or replicated). If you create a persistent region and `enable-network-partition-detection` is set to false, you will receive the following warning message: - - ``` pre - Creating persistent region {0}, but enable-network-partition-detection is set to false. - Running with network partition detection disabled can lead to an unrecoverable system in the - event of a network split. - ``` - -4. Configure regions you want to protect from network partitioning with `DISTRIBUTED_ACK` or `GLOBAL` `scope`. Do not use `DISTRIBUTED_NO_ACK` `scope`. 
The region configurations provided in the region shortcut settings use `DISTRIBUTED_ACK` scope. This setting prevents operations from being performed throughout the distributed system before a network partition is detected. - **Note:** - Geode issues an alert if it detects distributed-no-ack regions when network partition detection is enabled: - - ``` pre - Region {0} is being created with scope {1} but enable-network-partition-detection is enabled in the distributed system. - This can lead to cache inconsistencies if there is a network failure. - - ``` - -5. These other configuration parameters affect or interact with network partition detection. Check whether they are appropriate for your installation and modify as needed. - - If you have network partition detection enabled, the threshold percentage value for allowed membership weight loss is automatically configured to 51. You cannot modify this value. (**Note:** The weight loss calculation uses standard rounding. Therefore, a value of 50.51 is rounded to 51 and will cause a network partition.) - - Failure detection is initiated if a member's `gemfire.properties` `ack-wait-threshold` (default is 15 seconds) and `ack-severe-alert-threshold` (15 seconds) elapse before receiving a response to a message. If you modify the `ack-wait-threshold` configuration value, you should modify `ack-severe-alert-threshold` to match. - - If the system has clients connecting to it, the clients' `cache.xml` `<cache> <pool> read-timeout` should be set to at least three times the `member-timeout` setting in the server's `gemfire.properties`. The default `<cache> <pool> read-timeout` setting is 10000 milliseconds. - - You can adjust the default weights of members by specifying the system property `gemfire.member-weight` upon startup. For example, if you have some VMs that host a needed service, you could assign them a higher weight upon startup. 
- - By default, members that are forced out of the distributed system by a network partition event will automatically restart and attempt to reconnect. Data members will attempt to reinitialize the cache. See [Handling Forced Cache Disconnection Using Autoreconnect](../autoreconnect/member-reconnect.html). - - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb b/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb deleted file mode 100644 index e971634..0000000 --- a/geode-docs/managing/network_partitioning/how_network_partitioning_management_works.html.md.erb +++ /dev/null @@ -1,59 +0,0 @@ ---- -title: How Network Partitioning Management Works ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -Geode handles network outages by using a weighting system to determine whether the remaining available members have a sufficient quorum to continue as a distributed system. 
- -<a id="how_network_partitioning_management_works__section_548146BB8C24412CB7B43E6640272882"></a> -Individual members are each assigned a weight, and the quorum is determined by comparing the total weight of currently responsive members to the previous total weight of responsive members. - -Your distributed system can split into separate running systems when members lose the ability to see each other. The typical cause of this problem is a failure in the network. When a partitioned system is detected, Apache Geode keeps only one side of the system running and automatically shuts down the other side. - -**Note:** -The network partition detection feature is only enabled when `enable-network-partition-detection` is set to true in `gemfire.properties`. By default, this property is set to false. See [Configure Apache Geode to Handle Network Partitioning](handling_network_partitioning.html#handling_network_partitioning) for details. Quorum weight calculations are always performed and logged regardless of this configuration setting. - -The overall process for detecting a network partition is as follows: - -1. The distributed system starts up. When you start up a distributed system, start the locators first, start the cache servers second, and then start other members such as applications or processes that access distributed system data. -2. After the members start up, the oldest member, typically a locator, assumes the role of the membership coordinator. Peer discovery occurs as members come up and members generate a membership discovery list for the distributed system. Locators hand out the membership discovery list as each member process starts up. This list typically contains a hint on who the current membership coordinator is. -3. Members join and, if necessary, depart the distributed system: - - Member processes make a request to the coordinator to join the distributed system. 
If authenticated, the coordinator creates a new membership view, hands it to the new member, and begins distributing it (to add the new member or members) by sending a view preparation message to the existing members in the view. - - While members are joining the system, it is possible that members are also leaving or being removed through the normal failure detection process. Failure detection removes unresponsive or slow members. See [Managing Slow Receivers](../monitor_tune/slow_receivers_managing.html) and [Failure Detection and Membership Views](failure_detection.html#concept_CFD13177F78C456095622151D6EE10EB) for descriptions of the failure detection process. If a new membership view is sent out that includes one or more failed processes, the coordinator will log the new weight calculations. At any point, if quorum loss is detected due to unresponsive processes, the coordinator will also log a severe level message to identify the failed processes: - - ``` pre - Possible loss of quorum detected due to loss of {0} cache processes: {1} - ``` - - where {0} is the number of processes that failed and {1} lists the processes. - -4. Whenever the coordinator is alerted of a membership change (a member either joins or leaves the distributed system), the coordinator generates a new membership view. The membership view is generated by a two-phase protocol: - 1. In the first phase, the membership coordinator sends out a view preparation message to all members and waits 12 seconds for a view preparation ack return message from each member. If the coordinator does not receive an ack message from a member within 12 seconds, the coordinator attempts to connect to the member's failure-detection socket. If the coordinator cannot connect to the member's failure-detection socket, the coordinator declares the member dead and starts the membership view protocol again from the beginning. - 2. 
In the second phase, the coordinator sends out the new membership view to all members that acknowledged the view preparation message or passed the connection test. - -5. Each time the membership coordinator sends a view, each member calculates the total weight of members in the current membership view and compares it to the total weight of the previous membership view. Some conditions to note: - - When the first membership view is sent out, there are no accumulated losses. The first view only has additions. - - A new coordinator may have a stale view of membership if it did not see the last membership view sent by the previous (failed) coordinator. If new members were added during that failure, then the new members may be ignored when the first new view is sent out. - - If members were removed during the fail over to the new coordinator, then the new coordinator will have to determine these losses during the view preparation step. - -6. With `enable-network-partition-detection` set to true, any member that detects that the total membership weight has dropped below 51% within a single membership view change (loss of quorum) declares a network partition event. The coordinator sends a network-partitioned-detected UDP message to all members (even to the non-responsive ones) and then closes the distributed system with a `ForcedDisconnectException`. If a member fails to receive the message before the coordinator closes the system, the member is responsible for detecting the event on its own. - -The presumption is that when a network partition is declared, the members that comprise a quorum will continue operations. The surviving members elect a new coordinator, designate a lead member, and so on. 
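The quorum arithmetic in step 6 can be sketched as follows. This is an illustrative model only, not Geode's implementation; the weights used (10 per ordinary member, 15 for the lead member, 3 for locators) are the defaults described in [Membership Coordinators, Lead Members and Member Weighting](membership_coordinators_lead_members_and_weighting.html):

```python
import math

# Illustrative sketch only -- not Geode code. Default weights:
# ordinary member = 10, lead member = 15, locator = 3.

def loss_percentage(previous_view_weights, lost_weights):
    """Lost weight as a percentage of the previous view's total weight,
    using standard rounding (e.g. 50.51 rounds to 51)."""
    total = sum(previous_view_weights)
    lost = sum(lost_weights)
    return math.floor(100.0 * lost / total + 0.5)

def quorum_lost(previous_view_weights, lost_weights, threshold=51):
    # A loss of 51% or more of the membership weight within a single
    # view change is declared a network partition.
    return loss_percentage(previous_view_weights, lost_weights) >= threshold

# 12-member system: 2 locators, 9 ordinary cache servers, 1 lead member.
view = [3, 3] + [10] * 9 + [15]               # total weight 111
print(quorum_lost(view, [10] * 4))            # 40/111 = 36% -> False
print(quorum_lost(view, [10] * 4 + [15, 3]))  # 58/111 = 52% -> True
```

Losing four ordinary cache servers (36% of the weight) leaves the quorum intact, while losing the lead member, a locator, and four cache servers (52%) crosses the threshold and triggers the partition event.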
- - http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/84cfbdfc/geode-docs/managing/network_partitioning/membership_coordinators_lead_members_and_weighting.html.md.erb ---------------------------------------------------------------------- diff --git a/geode-docs/managing/network_partitioning/membership_coordinators_lead_members_and_weighting.html.md.erb b/geode-docs/managing/network_partitioning/membership_coordinators_lead_members_and_weighting.html.md.erb deleted file mode 100644 index cb21f54..0000000 --- a/geode-docs/managing/network_partitioning/membership_coordinators_lead_members_and_weighting.html.md.erb +++ /dev/null @@ -1,79 +0,0 @@ ---- -title: Membership Coordinators, Lead Members and Member Weighting ---- - -<!-- -Licensed to the Apache Software Foundation (ASF) under one or more -contributor license agreements. See the NOTICE file distributed with -this work for additional information regarding copyright ownership. -The ASF licenses this file to You under the Apache License, Version 2.0 -(the "License"); you may not use this file except in compliance with -the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. ---> - -Network partition detection uses a designated membership coordinator and a weighting system that accounts for a lead member to determine whether a network partition has occurred. - -## <a id="concept_23C2606D59754106AFBFE17515DF4330__section_7C67F1D30C1645CC8489E481873691D9" class="no-quick-link"></a>Membership Coordinators and Lead Members - -The membership coordinator is a member that manages entry and exit of other members of the distributed system. 
With network partition detection enabled, the coordinator can be any Geode member, but locators are preferred. In a locator-based system, if all locators are in the reconnecting state, the system continues to function, but new members are not able to join until a locator has successfully reconnected. After a locator has reconnected, the reconnected locator will take over the role of coordinator. - -When a coordinator is shutting down, it sends out a view that removes itself from the list, and the other members must determine who the new coordinator is. - -The lead member is determined by the coordinator. Any member that has enabled network partition detection, is not hosting a locator, and is not an administrator interface-only member is eligible to be designated as the lead member by the coordinator. The coordinator chooses the longest-lived member that fits the criteria. - -The purpose of the lead member role is to provide extra weight. It does not perform any specific functionality. - -## <a id="concept_23C2606D59754106AFBFE17515DF4330__section_D819DE21928F4D658C132981307447E3" class="no-quick-link"></a>Member Weighting System - -By default, individual members are assigned the following weights: - -- Each member has a weight of 10, except the lead member and locators. -- The lead member is assigned a weight of 15. -- Locators have a weight of 3. - -You can modify the default weights for specific members by defining the `gemfire.member-weight` system property upon startup. - -The weights of members prior to the view change are added together and compared to the weight of lost members. Lost members are those members that were removed between the last view and the completed send of the view preparation message. If membership is reduced by a certain percentage within a single membership view change, a network partition is declared. - -The loss percentage threshold is 51 (meaning 51%). Note that the percentage calculation uses standard rounding. 
Therefore, a value of 50.51 is rounded to 51. If the rounded loss percentage is equal to or greater than 51%, the membership coordinator initiates shutdown. - -## <a id="concept_23C2606D59754106AFBFE17515DF4330__section_53C963D1B2DF417C973A60981E52CDCF" class="no-quick-link"></a>Sample Member Weight Calculations - -This section provides some example calculations. - -**Example 1:** Distributed system with 12 members. 2 locators, 10 cache servers (one cache server is designated as lead member). View total weight equals 111. - -- 4 cache servers become unreachable. Total membership weight loss is 40 (36%). Since 36% is under the 51% threshold for loss, the distributed system stays up. -- 1 locator and 4 cache servers (including the lead member) become unreachable. Membership weight loss equals 48 (43%). Since 43% is under the 51% threshold for loss, the distributed system stays up. -- 5 cache servers (not including the lead member) and both locators become unreachable. Membership weight loss equals 56 (50%). Since 50% is under the 51% threshold for loss, the distributed system stays up. -- 5 cache servers (including the lead member) and 1 locator become unreachable. Membership weight loss equals 58 (52%). Since 52% is greater than the 51% threshold, the coordinator initiates shutdown. -- 6 cache servers (not including the lead member) and both locators become unreachable. Membership weight loss equals 66 (59%). Since 59% is greater than the 51% threshold, the newly elected coordinator (a cache server since no locators remain) will initiate shutdown. - -**Example 2:** Distributed system with 4 members. 2 cache servers (1 cache server is designated lead member), 2 locators. View total weight is 31. - -- Cache server designated as lead member becomes unreachable. Membership weight loss equals 15 or 48%. Distributed system stays up. -- Cache server designated as lead member and 1 locator become unreachable. Member weight loss equals 18 or 58%. 
The membership coordinator initiates shutdown. If the locator that became unreachable was the membership coordinator, the other locator is elected coordinator and then initiates shutdown. - -Even if network partition detection is not enabled, if quorum loss is detected due to unresponsive processes, the locator will log a severe level message to identify the failed processes: -``` pre -Possible loss of quorum detected due to loss of {0} cache processes: {1} -``` - -where {0} is the number of processes that failed and {1} lists the processes. - -Enabling network partition detection allows only one subgroup to survive a split. The rest of the system is disconnected and the caches are closed. - -When a shutdown occurs, the members that are shut down will log the following alert message: -``` pre -Exiting due to possible network partition event due to loss of {0} cache processes: {1} -``` - -where `{0}` is the count of lost members and `{1}` is the list of lost member IDs.
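The sample weight calculations above can be checked with a short script. This is an illustration under the default weights described earlier (ordinary member 10, lead member 15, locator 3), not Geode code:

```python
import math

# Default member weights from the section above (illustration only).
LOCATOR, MEMBER, LEAD = 3, 10, 15

def loss_pct(view, lost):
    """Lost weight as a standard-rounded percentage of the view's total."""
    return math.floor(100.0 * sum(lost) / sum(view) + 0.5)

# Example 1: 2 locators + 9 ordinary cache servers + 1 lead member.
view1 = [LOCATOR] * 2 + [MEMBER] * 9 + [LEAD]
print(sum(view1))                                       # 111
print(loss_pct(view1, [MEMBER] * 4))                    # 36 -> stays up
print(loss_pct(view1, [MEMBER] * 4 + [LEAD, LOCATOR]))  # 52 -> shutdown

# Example 2: 2 locators + 1 ordinary cache server + 1 lead member.
view2 = [LOCATOR] * 2 + [MEMBER, LEAD]
print(sum(view2))                        # 31
print(loss_pct(view2, [LEAD]))           # 48 -> stays up
print(loss_pct(view2, [LEAD, LOCATOR]))  # 58 -> shutdown
```

Each printed percentage matches the corresponding bullet in the sample calculations, with losses of 51% or more triggering the coordinator's shutdown.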