Hi all,

My partner and I are currently using a Cassandra cluster to run TPC-C. As a first step, we use 2 EC2 nodes to load 20 warehouses. One (the client node) has 8 cores; the other (the worker node) has 4 cores. During loading, either the client node or the worker node randomly goes "down" (it cannot be detected) and then comes back "up" a short time later. If both nodes go down, the load fails. If only one of them goes down, we can continue loading data.
The problem is that when we use multiple threads (our code is actually multiprocess), say 4 client threads, some of them may stop at the moment one of the nodes first goes down, and those dead threads never come back. This not only increases our loading time but also affects the amount of data we can load. So we need to figure out why the nodes keep going up and down, and fix this problem.

Thanks for any help!

Best,
Xiaowei