Hi, I'm trying to load a 20 GB CSV file into a fresh 3-node Cassandra cluster with 32 GB of memory per node, sufficient disk, RF=1, and durable_writes=false. The machine I'm feeding from is external to the cluster, shares a 1 Gbps line, and has 16 GB of RAM. (We chose this setup to reduce CPU and I/O usage on the cluster.)
I'm using the cqlsh COPY command to feed in the data. It kicks off well, launches a set of worker processes, and sustains about 50,000 rows per second. But the parent process keeps accumulating memory roughly in proportion to the amount of data processed, and after a point the processes simply hang: the parent was consuming 95% of system memory by the time it had processed about 60% of the data. When I earlier loaded the data from multiple smaller files (under 4 GB each), it worked as expected. Is this a known/expected behaviour?

Regards,
Bhuvan
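For reference, a rough sketch of the two approaches (keyspace `ks`, table `t`, host name, and file paths are placeholders, and the COPY tuning options shown in the second command may not exist in older cqlsh versions):

```shell
# Workaround that worked: split the 20 GB file into <4 GB pieces,
# keeping CSV records intact (-C splits on line boundaries), then
# COPY the pieces in one at a time.
split -d -C 4G data.csv part_
for f in part_*; do
    cqlsh cluster-node -e "COPY ks.t FROM '$f' WITH HEADER = false;"
done

# Single-file attempt, throttled -- only if this cqlsh version
# supports the CHUNKSIZE / INGESTRATE / NUMPROCESSES COPY options:
cqlsh cluster-node -e "COPY ks.t FROM 'data.csv'
    WITH CHUNKSIZE = 5000 AND INGESTRATE = 50000 AND NUMPROCESSES = 8;"
```

These are a sketch of the commands involved, not a verbatim transcript of my session.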