This is an automated email from the ASF dual-hosted git repository.

jolynch pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 8402d1f  Add documentation of hints
8402d1f is described below

commit 8402d1f1456dc4da279f53dbd02f5ce7a1b2dffc
Author: dvohra <dvohr...@yahoo.com>
AuthorDate: Mon Jan 6 19:37:12 2020 -0800

    Add documentation of hints
    
    Patch by Deepak Vohra; Reviewed by Joseph Lynch for CASSANDRA-15491
---
 CHANGES.txt                           |   1 +
 doc/source/architecture/dynamo.rst    |   2 +
 doc/source/operating/hints.rst        | 259 +++++++++++++++++++++++++++++++++-
 doc/source/operating/images/hints.svg |   9 ++
 doc/source/operating/metrics.rst      |   4 +
 5 files changed, 273 insertions(+), 2 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 5fe958c..f1da0b7 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha4
+ * Add documentation of hints (CASSANDRA-15491)
  * updateCoordinatorWriteLatencyTableMetric can produce misleading metrics (CASSANDRA-15569)
  * Added documentation for read repair and an example of full repair (CASSANDRA-15485)
  * Make cqlsh and cqlshlib Python 2 & 3 compatible (CASSANDRA-10190)
diff --git a/doc/source/architecture/dynamo.rst b/doc/source/architecture/dynamo.rst
index 12c586e..380abc2 100644
--- a/doc/source/architecture/dynamo.rst
+++ b/doc/source/architecture/dynamo.rst
@@ -29,6 +29,8 @@ Failure Detection
 
 .. todo:: todo
 
+.. _token-range:
+
 Token Ring/Ranges
 ^^^^^^^^^^^^^^^^^
 
diff --git a/doc/source/operating/hints.rst b/doc/source/operating/hints.rst
index f79f18a..94ff16f 100644
--- a/doc/source/operating/hints.rst
+++ b/doc/source/operating/hints.rst
@@ -17,6 +17,261 @@
 .. highlight:: none
 
 Hints
------
+=====
 
-.. todo:: todo
+Hinting is a data repair technique applied during write operations. When
+replica nodes are unavailable to accept a mutation, either due to failure or
+more commonly routine maintenance, coordinators attempting to write to those
+replicas store temporary hints on their local filesystem for later application
+to the unavailable replica. Hints are an important way to help reduce the
+duration of data inconsistency. Coordinators replay hints quickly after
+unavailable replica nodes return to the ring. Hints are best effort, however,
+and do not guarantee eventual consistency like :ref:`anti-entropy repair
+<repair>` does.
+
+Hints are useful because of how Apache Cassandra replicates data to provide
+fault tolerance, high availability and durability. Cassandra :ref:`partitions
+data across the cluster <token-range>` using consistent hashing, and then
+replicates keys to multiple nodes along the hash ring. To guarantee
+availability, all replicas of a key can accept mutations without consensus, but
+this means it is possible for some replicas to accept a mutation while others
+do not. When this happens an inconsistency is introduced.
+
+Hints are one of the three ways, in addition to read-repair and
+full/incremental anti-entropy repair, that Cassandra implements the eventual
+consistency guarantee that all updates are eventually received by all replicas.
+Hints, like read-repair, are best effort and not an alternative to performing
+full repair, but they do help reduce the duration of inconsistency between
+replicas in practice.
+
+Hinted Handoff
+--------------
+
+Hinted handoff is the process by which Cassandra applies hints to unavailable
+nodes.
+
+For example, consider a mutation made at ``Consistency Level``
+``LOCAL_QUORUM`` against a keyspace with a ``Replication Factor`` of ``3``.
+Normally the client sends the mutation to a single coordinator, who then sends
+the mutation to all three replicas, and when two of the three replicas
+acknowledge the mutation the coordinator responds successfully to the client.
+If a replica node is unavailable, however, the coordinator stores a hint
+locally to the filesystem for later application. New hints will be retained for
+up to ``max_hint_window_in_ms`` of downtime (defaults to ``3 hours``).  If the
+unavailable replica does return to the cluster before the window expires, the
+coordinator applies any pending hinted mutations against the replica to ensure
+that eventual consistency is maintained.
+
+.. figure:: images/hints.svg
+    :alt: Hinted Handoff Example
+
+    Hinted Handoff in Action
+
+* (``t0``): The write is sent by the client, and the coordinator sends it
+  to the three replicas. Unfortunately ``replica_2`` is restarting and cannot
+  receive the mutation.
+* (``t1``): The client receives a quorum acknowledgement from the coordinator.
+  At this point the client believes the write to be durable and visible to reads
+  (which it is).
+* (``t2``): After the write timeout (default ``2s``), the coordinator decides
+  that ``replica_2`` is unavailable and stores a hint to its local disk.
+* (``t3``): Later, when ``replica_2`` starts back up it sends a gossip message
+  to all nodes, including the coordinator.
+* (``t4``): The coordinator replays hints including the missed mutation
+  against ``replica_2``.
+
+If the node does not return in time, the destination replica will remain
+out of sync until either read-repair or full/incremental
+anti-entropy repair propagates the mutation.
+
+Application of Hints
+^^^^^^^^^^^^^^^^^^^^
+
+Hints are streamed in bulk, a segment at a time, to the target replica node and
+the target node replays them locally. After the target node has replayed a
+segment it deletes the segment and receives the next segment. This continues
+until all hints are drained.
+
+Storage of Hints on Disk
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Hints are stored in flat files in the coordinator node’s
+``$CASSANDRA_HOME/data/hints`` directory. A hint includes a hint id, the target
+replica node on which the mutation is meant to be stored, the serialized
+mutation (stored as a blob) that couldn't be delivered to the replica node, the
+mutation timestamp, and the Cassandra version used to serialize the mutation.
+By default hints are compressed using ``LZ4Compressor``. Multiple hints are
+appended to the same hints file.
+
+Since hints contain the original unmodified mutation timestamp, hint application
+is idempotent and cannot overwrite a future mutation.
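
The idempotence claim above can be illustrated with a minimal last-write-wins sketch. This is a simplified model, not Cassandra's storage engine: the names ``Cell`` and ``apply_mutation`` are hypothetical, each key holds a single value, and timestamp ties are ignored.

```python
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (write timestamp, value); hypothetical simplified cell


def apply_mutation(table: Dict[str, Cell], key: str, timestamp: int, value: str) -> None:
    """Apply a write only if it is newer than what the replica already holds."""
    current = table.get(key)
    if current is None or timestamp > current[0]:
        table[key] = (timestamp, value)


replica: Dict[str, Cell] = {}
apply_mutation(replica, "k", 200, "newer")   # a later write arrives first
apply_mutation(replica, "k", 100, "hinted")  # the replayed hint keeps its original timestamp
apply_mutation(replica, "k", 100, "hinted")  # replaying the same hint again changes nothing
print(replica["k"])  # (200, 'newer')
```

Because the hint carries the timestamp of the original write rather than the replay time, replaying it late or more than once leaves the newer value in place.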
+
+Hints for Timed Out Write Requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Hints are also stored for write requests that time out. The
+``write_request_timeout_in_ms`` setting in ``cassandra.yaml`` configures the
+timeout for write requests.
+
+::
+
+  write_request_timeout_in_ms: 2000
+
+The coordinator waits for the configured amount of time for write requests to
+complete, at which point it will time out and generate a hint for the timed out
+request. The lowest acceptable value for ``write_request_timeout_in_ms`` is 10 ms.
+
+
+Configuring Hints
+-----------------
+
+Hints are enabled by default as they are critical for data consistency. The
+``cassandra.yaml`` configuration file provides several settings for configuring
+hints:
+
+Table 1. Settings for Hints
+
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|Setting                                     | Description                               |Default Value                  |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hinted_handoff_enabled``                  |Enables/Disables hinted handoffs           | ``true``                      |
+|                                            |                                           |                               |
+|                                            |                                           |                               |
+|                                            |                                           |                               |
+|                                            |                                           |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hinted_handoff_disabled_datacenters``     |A list of data centers that do not perform | ``unset``                     |
+|                                            |hinted handoffs even when handoff is       |                               |
+|                                            |otherwise enabled.                         |                               |
+|                                            |Example:                                   |                               |
+|                                            |                                           |                               |
+|                                            | .. code-block:: yaml                      |                               |
+|                                            |                                           |                               |
+|                                            |     hinted_handoff_disabled_datacenters:  |                               |
+|                                            |       - DC1                               |                               |
+|                                            |       - DC2                               |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``max_hint_window_in_ms``                   |Defines the maximum amount of time (ms)    | ``10800000`` # 3 hours        |
+|                                            |a node shall have hints generated after it |                               |
+|                                            |has failed.                                |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hinted_handoff_throttle_in_kb``           |Maximum throttle in KBs per second, per    | ``1024``                      |
+|                                            |delivery thread. This will be reduced      |                               |
+|                                            |proportionally to the number of nodes in   |                               |
+|                                            |the cluster.                               |                               |
+|                                            |(If there are two nodes in the cluster,    |                               |
+|                                            |each delivery thread will use the maximum  |                               |
+|                                            |rate; if there are 3, each will throttle   |                               |
+|                                            |to half of the maximum, since it is        |                               |
+|                                            |expected for two nodes to be delivering    |                               |
+|                                            |hints simultaneously.)                     |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``max_hints_delivery_threads``              |Number of threads with which to deliver    | ``2``                         |
+|                                            |hints; consider increasing this number when|                               |
+|                                            |you have multi-dc deployments, since       |                               |
+|                                            |cross-dc handoff tends to be slower        |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hints_directory``                         |Directory where Cassandra stores hints.    |``$CASSANDRA_HOME/data/hints`` |
+|                                            |                                           |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hints_flush_period_in_ms``                |How often hints should be flushed from the | ``10000``                     |
+|                                            |internal buffers to disk. Will *not*       |                               |
+|                                            |trigger fsync.                             |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``max_hints_file_size_in_mb``               |Maximum size for a single hints file, in   | ``128``                       |
+|                                            |megabytes.                                 |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
+|``hints_compression``                       |Compression to apply to the hint files.    | ``LZ4Compressor``             |
+|                                            |If omitted, hints files will be written    |                               |
+|                                            |uncompressed. LZ4, Snappy, and Deflate     |                               |
+|                                            |compressors are supported.                 |                               |
++--------------------------------------------+-------------------------------------------+-------------------------------+
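
The scaling rule described for ``hinted_handoff_throttle_in_kb`` can be sketched as below. The formula is inferred only from the two examples in the table (two nodes use the full rate, three nodes use half each); the function name ``per_thread_throttle_kb`` is hypothetical and this is not taken from the Cassandra source.

```python
def per_thread_throttle_kb(max_throttle_kb: int, cluster_nodes: int) -> float:
    """Estimate the per-delivery-thread throttle in KB per second.

    Assumption: every other node in the cluster may be delivering hints to the
    same target at once, so the configured maximum is divided by (nodes - 1).
    """
    senders = max(1, cluster_nodes - 1)
    return max_throttle_kb / senders


for nodes in (2, 3, 6):
    print(nodes, per_thread_throttle_kb(1024, nodes))
# 2 1024.0
# 3 512.0
# 6 204.8
```

Under this reading, the effective per-thread rate in a large cluster is far below the configured value, which is worth keeping in mind when estimating replay time.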
+
+Configuring Hints at Runtime with ``nodetool``
+----------------------------------------------
+
+``nodetool`` provides several commands for configuring hints or getting
+hints-related information. The nodetool commands override the corresponding
+settings, if any, in ``cassandra.yaml`` for the node running the command.
+
+Table 2. Nodetool Commands for Hints
+
++--------------------------------+-------------------------------------------+
+|Command                         | Description                               |
++--------------------------------+-------------------------------------------+
+|``nodetool disablehandoff``     |Disables storing and delivering hints      |
++--------------------------------+-------------------------------------------+
+|``nodetool disablehintsfordc``  |Disables storing and delivering hints to a |
+|                                |data center                                |
++--------------------------------+-------------------------------------------+
+|``nodetool enablehandoff``      |Re-enables future hints storing and        |
+|                                |delivery on the current node               |
++--------------------------------+-------------------------------------------+
+|``nodetool enablehintsfordc``   |Enables hints for a data center that was   |
+|                                |previously disabled                        |
++--------------------------------+-------------------------------------------+
+|``nodetool getmaxhintwindow``   |Prints the max hint window in ms. New in   |
+|                                |Cassandra 4.0.                             |
++--------------------------------+-------------------------------------------+
+|``nodetool handoffwindow``      |Prints current hinted handoff window       |
++--------------------------------+-------------------------------------------+
+|``nodetool pausehandoff``       |Pauses hints delivery process              |
++--------------------------------+-------------------------------------------+
+|``nodetool resumehandoff``      |Resumes hints delivery process             |
++--------------------------------+-------------------------------------------+
+|``nodetool                      |Sets hinted handoff throttle in kb         |
+|sethintedhandoffthrottlekb``    |per second, per delivery thread            |
++--------------------------------+-------------------------------------------+
+|``nodetool setmaxhintwindow``   |Sets the specified max hint window in ms   |
++--------------------------------+-------------------------------------------+
+|``nodetool statushandoff``      |Status of storing future hints on the      |
+|                                |current node                               |
++--------------------------------+-------------------------------------------+
+|``nodetool truncatehints``      |Truncates all hints on the local node, or  |
+|                                |truncates hints for the endpoint(s)        |
+|                                |specified.                                 |
++--------------------------------+-------------------------------------------+
+
+Make Hints Play Faster at Runtime
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The default handoff throttle of ``1024`` KB per second is conservative for most
+modern networks, and even a simple node restart can accumulate many gigabytes
+of hints that take hours to play back. For example, if you are ingesting
+``100 Mbps`` of data per node, a single 10 minute long restart will create
+``10 minutes * (100 megabit / second) ~= 7 GiB`` of data, which at
+``(1024 KiB / second)`` would take ``7 GiB / (1024 KiB / second) ~= 2 hours``
+to play back. The exact math depends on the load balancing strategy (round
+robin is better than token aware), number of tokens per node (more tokens is
+better than fewer), and naturally the cluster's write rate, but regardless you
+may find yourself wanting to increase this throttle at runtime.
+
+If you find yourself in such a situation, you may consider raising
+the ``hinted_handoff_throttle_in_kb`` dynamically via the
+``nodetool sethintedhandoffthrottlekb`` command.
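
As a rough check of the backlog estimate in this section, here is a minimal sketch. The function names are hypothetical, and it assumes 1 megabit = 10^6 bits, that every ingested byte for the down node becomes a hint, and that a single delivery thread replays at the full configured throttle.

```python
def hint_backlog_bytes(ingest_megabits_per_s: float, downtime_s: float) -> float:
    """Bytes of hints accumulated while a replica is down."""
    return ingest_megabits_per_s * 1_000_000 / 8 * downtime_s


def replay_seconds(backlog_bytes: float, throttle_kib_per_s: float) -> float:
    """Time to stream the backlog at a fixed throttle in KiB per second."""
    return backlog_bytes / (throttle_kib_per_s * 1024)


backlog = hint_backlog_bytes(100, 10 * 60)  # 100 Mbps for a 10 minute restart
print(f"backlog: {backlog / 2**30:.2f} GiB")  # backlog: 6.98 GiB
print(f"replay at 1024 KiB/s: {replay_seconds(backlog, 1024) / 3600:.2f} hours")  # 1.99 hours
print(f"replay at 8192 KiB/s: {replay_seconds(backlog, 8192) / 60:.1f} minutes")  # 14.9 minutes
```

The backlog comes out at roughly 7 GiB and about two hours of replay at the default throttle; raising the throttle eightfold brings that down to about a quarter of an hour under the same assumptions.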
+
+Allow a Node to be Down Longer at Runtime
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Sometimes a node may be down for longer than the normal
+``max_hint_window_in_ms`` (default of three hours), but the hardware and data
+itself will still be accessible. In such a case you may consider raising the
+``max_hint_window_in_ms`` dynamically via the ``nodetool setmaxhintwindow``
+command added in Cassandra 4.0 (`CASSANDRA-11720
+<https://issues.apache.org/jira/browse/CASSANDRA-11720>`_).
+This will instruct Cassandra to continue holding hints for the down
+endpoint for a longer amount of time.
+
+This command should be applied on all nodes in the cluster that may be holding
+hints. If needed, the setting can be applied permanently by setting the
+``max_hint_window_in_ms`` setting in ``cassandra.yaml`` followed by a rolling
+restart.
+
+Monitoring Hint Delivery
+------------------------
+
+Cassandra 4.0 adds histograms that record how long it takes to deliver hints,
+which helps operators identify delivery problems (`CASSANDRA-13234
+<https://issues.apache.org/jira/browse/CASSANDRA-13234>`_).
+
+Metrics are also available for :ref:`Hinted Handoff <handoff-metrics>`
+and the :ref:`Hints Service <hintsservice-metrics>`.
diff --git a/doc/source/operating/images/hints.svg b/doc/source/operating/images/hints.svg
new file mode 100644
index 0000000..5e952e7
--- /dev/null
+++ b/doc/source/operating/images/hints.svg
@@ -0,0 +1,9 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="661.2000122070312" height="422.26666259765625" style="
+        width:661.2000122070312px;
+        height:422.26666259765625px;
+        background: transparent;
+        fill: none;
+">
+        <svg xmlns="http://www.w3.org/2000/svg" 
class="role-diagram-draw-area"><g class="shapes-region" style="stroke: black; 
fill: none;"><g class="composite-shape"><path class="real" d=" M40,60 C40,43.43 
53.43,30 70,30 C86.57,30 100,43.43 100,60 C100,76.57 86.57,90 70,90 C53.43,90 
40,76.57 40,60 Z" style="stroke-width: 1px; stroke: rgb(0, 0, 0); fill: 
none;"/></g><g class="arrow-line"><path class="connection real" 
stroke-dasharray="" d="  M70,300 L70,387" style="stroke: rgb(0, 0, 0); s [...]
+        <svg xmlns="http://www.w3.org/2000/svg" width="660" 
height="421.066650390625" 
style="width:660px;height:421.066650390625px;font-family:Asana-Math, 
Asana;background:transparent;"><g><g><g 
style="transform:matrix(1,0,0,1,47.266693115234375,65.81666564941406);"><path 
d="M342 330L365 330C373 395 380 432 389 458C365 473 330 482 293 482C248 483 175 
463 118 400C64 352 25 241 25 136C25 40 67 -11 147 -11C201 -11 249 9 304 54L354 
95L346 115L331 105C259 57 221 40 186 40C130 40 101 80 101 15 [...]
+</svg>
diff --git a/doc/source/operating/metrics.rst b/doc/source/operating/metrics.rst
index e87bd5a..fc37440 100644
--- a/doc/source/operating/metrics.rst
+++ b/doc/source/operating/metrics.rst
@@ -534,6 +534,8 @@ TotalHints                 Counter        Number of hint messages written to thi
 TotalHintsInProgress       Counter        Number of hints attemping to be sent currently.
 ========================== ============== ===========
 
+.. _handoff-metrics:
+
 HintedHandoff Metrics
 ^^^^^^^^^^^^^^^^^^^^^
 
@@ -556,6 +558,8 @@ Hints_created-<PeerIP>       Counter        Number of hints on disk for this pee
 Hints_not_stored-<PeerIP>    Counter        Number of hints not stored for this peer, due to being down past the configured hint window.
 =========================== ============== ===========
 
+.. _hintsservice-metrics:
+
 HintsService Metrics
 ^^^^^^^^^^^^^^^^^^^^^
 

