How to Remove a Cassandra Cluster Node
We end up with a ‘dead’ Cassandra node if we add a node to the cluster and then delete its VM (for example, if the VM is terminated while the second region is being rebuilt).
To recover from this problem, do the following:
1. Remove the ‘dead’ node. Cassandra on the deleted node is unreachable, so we can NOT use the ‘decommission’ command, which has to run against the node being removed. Use ‘removetoken’ instead.
First, list the current nodes:
[root@slbe-22-19-p lib]# java -cp libthrift-0.7.0.jar:cassandra-thrift-1.1.6.jar:commons-cli-1.1.jar:cassandra-all-1.1.6.jar org.apache.cassandra.tools.NodeCmd -h localhost -p 9192 ring
Note: Ownership information does not include topology, please specify a keyspace.
| Address | DC | Rack | Status | State | Load | Owns | Token |
| | | | | | | | 83794128212295607812427599225465974454 |
| 10.52.86.38 | usw1 | RAC1 | Down | Normal | ? | 59.09% | 14189180876230654535300830053153397235 |
| 10.51.29.29 | usw1 | RAC2 | Up | Normal | 547.84 KB | 3.45% | 20055304209618966354482420882580629466 |
| 10.51.28.170 | usw1 | RAC1 | Up | Normal | 498.49 KB | 37.46% | 83794128212295607812427599225465974454 |
We see that node 10.52.86.38 is Down. Pass its token to the next command:
[root@slbe-22-19-p lib]# java -cp libthrift-0.7.0.jar:cassandra-thrift-1.1.6.jar:commons-cli-1.1.jar:cassandra-all-1.1.6.jar org.apache.cassandra.tools.NodeCmd -h localhost -p 9192 removetoken 14189180876230654535300830053153397235
This command takes a while to execute.
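The token lookup in step 1 can also be scripted instead of read off by eye. The following is a minimal sketch, not part of the original procedure: the function name `down_node_tokens` is hypothetical, and it assumes the raw, whitespace-separated `ring` layout (Address, DC, Rack, Status, State, Load, Owns, Token), where Status is the fourth field and the token is the last field on the line. The sample rows are copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print "address token" for every node whose Status is "Down".
# Assumes whitespace-separated ring output with Status in field 4 and the token
# as the last field. Header, "Note:" and wrap-around token lines do not match.
down_node_tokens() {
    awk '$4 == "Down" { print $1, $NF }'
}

# Sample input taken from the listing above; in practice, pipe the output of
# the NodeCmd "ring" command into down_node_tokens instead.
ring_sample='10.52.86.38 usw1 RAC1 Down Normal ? 59.09% 14189180876230654535300830053153397235
10.51.29.29 usw1 RAC2 Up Normal 547.84 KB 3.45% 20055304209618966354482420882580629466
10.51.28.170 usw1 RAC1 Up Normal 498.49 KB 37.46% 83794128212295607812427599225465974454'

printf '%s\n' "$ring_sample" | down_node_tokens
```

Each printed token is a candidate argument for removetoken. Review the list before removing anything, since a node may be Down only temporarily. In the 1.1 line, NodeCmd should also accept `removetoken status` to report on a pending removal, which can help while waiting for the command to finish.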
2. After that, I had to drop the ‘regional’ keyspace for apac. This keyspace has 0 nodes left, so its data is gone anyway, and dropping it is just cleanup. If all nodes in a keyspace are gone, we lose the data in that keyspace.
3. After that, the nodes in apac can be started (or redeployed). Keep in mind that the deleted keyspace is created ONLY by the conf_sync node and ONLY if all nodes are started. The ‘standard’ deployment workflow starts the primary node before the backup, and in that case keyspace sipfs_ap never gets created. To resolve this, stop all nodes for keyspace sipfs_ap and then either start them at nearly the same time or restart the conf_sync node.
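The keyspace drop in step 2 is described without a command. One way it could be issued is through cassandra-cli, which accepts a `drop keyspace <name>;` statement. The sketch below is an assumption about how this might be done, not a record of what was actually run: the helper name `drop_keyspace_statement` and the file path are hypothetical, and the Thrift host and port for this deployment are not given in this page.

```shell
#!/bin/sh
# Hypothetical helper: print the cassandra-cli statement that drops one keyspace.
# Rejects empty names and names with characters outside [A-Za-z0-9_], so that a
# typo cannot turn into an unintended statement.
drop_keyspace_statement() {
    ks="$1"
    case "$ks" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid keyspace name: $ks" >&2
            return 1
            ;;
    esac
    printf 'drop keyspace %s;\n' "$ks"
}

# Print the statement for the keyspace named in step 3.
drop_keyspace_statement sipfs_ap

# Possible usage (not executed here; path, host and port are assumptions):
#   drop_keyspace_statement sipfs_ap > /tmp/drop_ks.cli
#   cassandra-cli -h localhost -p <thrift port> -f /tmp/drop_ks.cli
```

Generating the statement separately makes it possible to review exactly what will be dropped before it is sent to the cluster, which matters because the drop is irreversible.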
