That's correct. nodetool removenode is strongly preferred when your node is already down. If the node is still functional, use nodetool decommission on the node instead.
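
For example, something along these lines (the exact host ID comes from nodetool status; adjust to your environment):

# on the node itself, while it is still up and part of the ring:
nodetool decommission

# from any live node, if the node to remove is already down:
nodetool removenode <host-id-of-down-node>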

On 03/04/2023 16:32, Jeff Jirsa wrote:
FWIW, `nodetool decommission` is strongly preferred. `nodetool removenode` is designed to be run when a host is offline. Only decommission is guaranteed to maintain consistency / correctness, and removenode probably streams a lot more data around than decommission.


On Mon, Apr 3, 2023 at 6:47 AM Bowen Song via user <user@cassandra.apache.org> wrote:

    Using nodetool removenode is strongly preferred in most
    circumstances. Only resort to nodetool assassinate if you do
    not care about data consistency, or you know there won't be
    any consistency issue (e.g. there have been no new writes and
    nodetool cleanup has not been run).

    Since the size of data on the new node is small, nodetool
    removenode should finish fairly quickly and bring your cluster back.
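
    For example, from one of the healthy nodes (the host ID below
    is the one nodetool status reports for the new rack4 node;
    double-check it against your own output before running
    anything):

    nodetool removenode c4e8b4a0-f014-45e6-afb4-648aad4f8500
    nodetool removenode status    # check progress if it takes a while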

    Next time you are doing something like this, please test it
    in a non-production environment first and make sure everything
    works as expected before moving on to production.


    On 03/04/2023 06:28, David Tinker wrote:
    Should I use assassinate or removenode, given that there is
    some data on the node? Or will that data be found on the other
    nodes? Sorry for all the questions, but I really don't want to
    mess this up.

    On Mon, Apr 3, 2023 at 7:21 AM Carlos Diaz <crdiaz...@gmail.com>
    wrote:

        That's what nodetool assassinate will do.

        On Sun, Apr 2, 2023 at 10:19 PM David Tinker
        <david.tin...@gmail.com> wrote:

            Is it possible for me to remove the node from the cluster
            i.e. to undo this mess and get the cluster operating again?

            On Mon, Apr 3, 2023 at 7:13 AM Carlos Diaz
            <crdiaz...@gmail.com> wrote:

                You can leave it in the seed list of the other nodes;
                just make sure it's not included in this node's own
                seed list. However, if you do decide to fix the rack
                issue, first assassinate this node (nodetool
                assassinate <ip>), then update the rack name before
                you restart.
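
                Roughly, and assuming you are using
                GossipingPropertyFileSnitch (adjust file locations
                and the service name for your setup), the sequence
                would look something like:

                # from one of the other nodes: remove the broken
                # node from gossip
                nodetool assassinate <ip>

                # on the node itself: fix the rack before restarting;
                # with GossipingPropertyFileSnitch the rack is set in
                # cassandra-rackdc.properties (e.g. rack=rack1)
                vi /etc/cassandra/cassandra-rackdc.properties
                systemctl restart cassandra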

                On Sun, Apr 2, 2023 at 10:06 PM David Tinker
                <david.tin...@gmail.com> wrote:

                    It is also in the seeds list for the other nodes.
                    Should I remove it from those, restart them one
                    at a time, then restart it?

                    /etc/cassandra # grep -i bootstrap *
                    doesn't show anything, so I don't think I have
                    auto_bootstrap set to false.

                    Thanks very much for the help.


                    On Mon, Apr 3, 2023 at 7:01 AM Carlos Diaz
                    <crdiaz...@gmail.com> wrote:

                        Just remove it from the seed list in the
                        cassandra.yaml file and restart the node. 
                        Make sure that auto_bootstrap is set to true
                        first though.
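
                        In cassandra.yaml that would look roughly
                        like this (the IPs are placeholders for
                        your three existing nodes; note that
                        auto_bootstrap defaults to true when it is
                        not listed in the file at all):

                        seed_provider:
                            - class_name: org.apache.cassandra.locator.SimpleSeedProvider
                              parameters:
                                  - seeds: "xxx.xxx.xxx.105,xxx.xxx.xxx.253,xxx.xxx.xxx.107"
                        # auto_bootstrap: true   # the default; only needs attention if it was set to false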

                        On Sun, Apr 2, 2023 at 9:59 PM David Tinker
                        <david.tin...@gmail.com> wrote:

                            So, likely because I made it a seed
                            node when I added it to the cluster,
                            it didn't do the bootstrap process.
                            How can I recover from this?

                            On Mon, Apr 3, 2023 at 6:41 AM David
                            Tinker <david.tin...@gmail.com> wrote:

                                Yes, the replication factor is 3.

                                I ran nodetool repair -pr on all the
                                nodes (one at a time) and am still
                                having issues getting data back from
                                queries.

                                I did make the new node a seed node.

                                Re "rack4": I assumed that was just
                                an indication as to the physical
                                location of the server for
                                redundancy. This one is separate from
                                the others so I used rack4.

                                On Mon, Apr 3, 2023 at 6:30 AM Carlos
                                Diaz <crdiaz...@gmail.com> wrote:

                                    I'm assuming that your
                                    replication factor is 3.  If
                                    that's the case, did you
                                    intentionally put this node in
                                    rack 4?  Typically, you want to
                                    add nodes in multiples of your
                                    replication factor in order to
                                    keep the "racks" balanced.  In
                                    other words, this node should
                                    have been added to rack 1, 2 or 3.

                                    Having said that, you should be
                                    able to easily fix your problem
                                    by running a nodetool repair -pr
                                    on the new node.
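
                                    Something like this, run on the
                                    new node (a -pr repair only
                                    covers that node's primary
                                    ranges, so if it doesn't help
                                    you may need to repair the
                                    other nodes as well):

                                    nodetool repair -pr
                                    # or limit it to the affected keyspace:
                                    nodetool repair -pr <keyspace>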

                                    On Sun, Apr 2, 2023 at 8:16 PM
                                    David Tinker
                                    <david.tin...@gmail.com> wrote:

                                        Hi All

                                        I recently added a node to my
                                        3 node Cassandra 4.0.5
                                        cluster and now many reads
                                        are not returning rows! What
                                        do I need to do to fix this?
                                        There weren't any errors in
                                        the logs or other problems
                                        that I could see. I expected
                                        the cluster to balance itself
                                        but this hasn't happened
                                        (yet?). The nodes are similar
                                        so I have num_tokens=256 for
                                        each. I am using the
                                        Murmur3Partitioner.

                                        # nodetool status
                                        Datacenter: dc1
                                        ===============
                                        Status=Up/Down
                                        |/ State=Normal/Leaving/Joining/Moving
                                        --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
                                        UN  xxx.xxx.xxx.105  2.65 TiB   256     72.9%             afd02287-3f88-4c6f-8b27-06f7a8192402  rack3
                                        UN  xxx.xxx.xxx.253  2.6 TiB    256     73.9%             e1af72be-e5df-4c6b-a124-c7bc48c6602a  rack2
                                        UN  xxx.xxx.xxx.24   93.82 KiB  256     80.0%             c4e8b4a0-f014-45e6-afb4-648aad4f8500  rack4
                                        UN  xxx.xxx.xxx.107  2.65 TiB   256     73.2%             ab72f017-be96-41d2-9bef-a551dec2c7b5  rack1

                                        # nodetool netstats
                                        Mode: NORMAL
                                        Not sending any streams.
                                        Read Repair Statistics:
                                        Attempted: 0
                                        Mismatch (Blocking): 0
                                        Mismatch (Background): 0
                                        Pool Name        Active  Pending  Completed  Dropped
                                        Large messages   n/a     0        71754      0
                                        Small messages   n/a     0        8398184    14
                                        Gossip messages  n/a     0        1303634    0

                                        # nodetool ring
                                        Datacenter: dc1
                                        ==========
                                        Address          Rack   Status  State   Load       Owns    Token
                                                                                                   9189523899826545641
                                        xxx.xxx.xxx.24   rack4  Up      Normal  93.82 KiB  79.95%  -9194674091837769168
                                        xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9168781258594813088
                                        xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9163037340977721917
                                        xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9148860739730046229
                                        xxx.xxx.xxx.107  rack1  Up      Normal  2.65 TiB   73.25%  -9125240034139323535
                                        xxx.xxx.xxx.253  rack2  Up      Normal  2.6 TiB    73.92%  -9112518853051755414
                                        xxx.xxx.xxx.105  rack3  Up      Normal  2.65 TiB   72.88%  -9100516173422432134
                                        ...

                                        This is causing a serious
                                        production issue. Please help
                                        if you can.

                                        Thanks
                                        David

