The cqlsh COPY command will not work while any node is down, because the underlying driver attempts to open a connection to every node in the cluster, not just the coordinator you specify. To complete the export you have two options: 1) keep the DOWN node out of the connection pool (see the sketch below), or 2) bring the node back to UP/NORMAL status first.
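For option 1, a possible workaround (a minimal sketch, not something tested in this thread) is to bypass cqlsh COPY and export the rows with the DataStax Python driver, pinning the session to the live nodes with a whitelist load-balancing policy so no pool is ever built for 10.0.0.47. This assumes the 3.x driver API (pip install cassandra-driver); the host addresses, keyspace/table, and column names come from Dmitry's message, while the output path and CSV layout are illustrative assumptions:

import csv

from cassandra.cluster import Cluster
from cassandra.policies import WhiteListRoundRobinPolicy

# Live nodes taken from the nodetool status output below.
LIVE_NODES = ['10.0.0.82', '10.0.0.154', '10.0.0.76', '10.0.0.94']

# The whitelist policy marks every host outside LIVE_NODES as IGNORED,
# so the driver never creates a connection pool for the down node.
cluster = Cluster(
    contact_points=LIVE_NODES,
    load_balancing_policy=WhiteListRoundRobinPolicy(LIVE_NODES),
)
session = cluster.connect()

with open('backup/X.Y.csv', 'w', newline='') as f:  # hypothetical path
    writer = csv.writer(f)
    # Results are paged automatically (fetch_size defaults to 5000),
    # so this also works for tables larger than memory.
    for row in session.execute('SELECT key, column1, value FROM X.Y'):
        writer.writerow([row.key, row.column1, row.value])

cluster.shutdown()

With RF=5 and four replicas still up, reads at consistency ONE or even QUORUM should succeed, so the export itself is not blocked by the missing replica.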
On Mon, Jul 2, 2018 at 9:15 AM, Anup Shirolkar <anup.shirol...@instaclustr.com> wrote:

> Hi,
>
> The error shows that the cqlsh connection to the down node failed, so you
> should debug why that happened.
>
> Although you pointed cqlsh at another node ('10.0.0.154'), my guess is
> that the down node was still present in the connection pool and was
> therefore attempted for connection.
>
> Ideally, the availability of data should not be hampered by the
> unavailability of one replica out of 5. Note also that the stack trace is
> about a 'cqlsh' connection error, not about missing data.
>
> I think once you get your connection sorted out, the COPY should work as
> usual.
>
> Regards,
> Anup
>
> On 30 June 2018 at 15:05, Dmitry Simonov <dimmobor...@gmail.com> wrote:
>
>> Hello!
>>
>> I have a Cassandra cluster with 5 nodes.
>> There is a (relatively small) keyspace X with RF=5.
>> One node goes down.
>>
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.0.0.82   253.64 MB  256     100.0%            839bef9d-79af-422c-a21f-33bdcf4493c1  rack1
>> UN  10.0.0.154  255.92 MB  256     100.0%            ce23f3a7-67d2-47c0-9ece-7a5dd67c4105  rack1
>> UN  10.0.0.76   461.26 MB  256     100.0%            c8e18603-0ede-43f0-b713-3ff47ad92323  rack1
>> UN  10.0.0.94   575.78 MB  256     100.0%            9a324dbc-5ae1-4788-80e4-d86dcaae5a4c  rack1
>> DN  10.0.0.47   ?          256     100.0%            7b628ca2-4e47-457a-ba42-5191f7e5374b  rack1
>>
>> I try to export some data using COPY TO, but it fails after long retries.
>> Why does it fail? How can I make a copy? There must be 4 copies of each
>> row on the other (alive) replicas.
>>
>> cqlsh 10.0.0.154 -e "COPY X.Y TO 'backup/X.Y' WITH NUMPROCESSES=1"
>>
>> Using 1 child processes
>>
>> Starting copy of X.Y with columns [key, column1, value].
>> 2018-06-29 19:12:23,661 Failed to create connection pool for new host 10.0.0.47:
>> Traceback (most recent call last):
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 2476, in run_add_or_renew_pool
>>     new_pool = HostConnection(host, distance, self)
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/pool.py", line 332, in __init__
>>     self._connection = session.cluster.connection_factory(host.address)
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/cluster.py", line 1205, in connection_factory
>>     return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 332, in factory
>>     conn = cls(host, *args, **kwargs)
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/io/asyncorereactor.py", line 344, in __init__
>>     self._connect_socket()
>>   File "/usr/lib/foobar/lib/python3.5/site-packages/cassandra/connection.py", line 371, in _connect_socket
>>     raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
>> OSError: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:12:23,665 Host 10.0.0.47 has been marked down
>> 2018-06-29 19:12:29,674 Error attempting to reconnect to 10.0.0.47, scheduling retry in 2.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:12:36,684 Error attempting to reconnect to 10.0.0.47, scheduling retry in 4.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:12:45,696 Error attempting to reconnect to 10.0.0.47, scheduling retry in 8.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:12:58,716 Error attempting to reconnect to 10.0.0.47, scheduling retry in 16.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:13:19,756 Error attempting to reconnect to 10.0.0.47, scheduling retry in 32.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:13:56,834 Error attempting to reconnect to 10.0.0.47, scheduling retry in 64.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:15:05,887 Error attempting to reconnect to 10.0.0.47, scheduling retry in 128.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:17:18,982 Error attempting to reconnect to 10.0.0.47, scheduling retry in 256.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> 2018-06-29 19:21:40,064 Error attempting to reconnect to 10.0.0.47, scheduling retry in 512.0 seconds: [Errno None] Tried connecting to [('10.0.0.47', 9042)]. Last error: timed out
>> <stdin>:1:(4, 'Interrupted system call')
>> IOError:
>> IOError:
>> IOError:
>> IOError:
>> IOError:
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>
> --
> Anup Shirolkar
> Consultant, Instaclustr
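Following up on Anup's point that COPY should work once the connection issue is sorted out: before re-running COPY, it can help to confirm from the client's side that every host is reachable, rather than rediscovering the down node mid-export. A minimal sketch with the same Python driver (the contact point is taken from the thread; host state may take a moment to settle after connect):

from cassandra.cluster import Cluster

cluster = Cluster(['10.0.0.154'])
cluster.connect()  # populates cluster.metadata with peer state

# is_up may be None for hosts the driver has not finished probing,
# so treat anything that is not True as suspect.
down = [h.address for h in cluster.metadata.all_hosts() if not h.is_up]
cluster.shutdown()

if down:
    print('COPY will likely stall; unreachable hosts:', down)
else:
    print('All hosts up; COPY should proceed normally.')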