Cassandra Issues with Tombstones

1. Cassandra is quicker than PostgreSQL and has a lower chance of losing data. Cassandra has no foreign keys, locking mechanisms, and so on, which is why it is quicker on writes.

2. Everything in Cassandra is a write: inserts, updates, and deletes are all writes.

3. Setting a column to null or deleting a column creates a cell tombstone; deleting a row/primary key/partition creates a single row tombstone.

4. You can adjust tombstone_warn_threshold and tombstone_failure_threshold in cassandra.yaml (see the snippet after this list).

5. You can set gc_grace_seconds when creating a table (see the CQL sketch under Related Attributes below).

6. Hitting the tombstone limit only happens on a per-query basis; the thresholds are checked per read, not per table.
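For reference, here is a minimal cassandra.yaml excerpt with the two thresholds at their defaults (the comments are only an illustration of what each setting does):

# cassandra.yaml
# Log a warning when a single query scans more than this many tombstones
tombstone_warn_threshold: 1000
# Abort the query when it scans more than this many tombstones
tombstone_failure_threshold: 100000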

Related Attributes

Deletes create tombstones. The relevant settings and attributes:

  • tombstone_warn_threshold: 1000 (default), set in cassandra.yaml

  • tombstone_failure_threshold: 100000 (default), set in cassandra.yaml

  • tombstone_compaction_interval: table attribute

  • min_compaction_threshold: table attribute. Compaction only becomes eligible once at least min_compaction_threshold SSTables exist; by default it is 4.

  • gc_grace_seconds: table attribute (see the CQL sketch after this list)

  • snapshot_before_compaction: false (default), set in cassandra.yaml
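As a sketch, the table-level attributes above can be set in CQL. The schema below mirrors the undeliveredmessage columns used later in this post, and the values are only examples:

CREATE TABLE undeliveredmessage (
    "id" text PRIMARY KEY,
    "message" text,
    "sent" boolean,
    "type" text
) WITH gc_grace_seconds = 86400   -- default is 864000 (10 days)
  AND compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 4};

-- Or on an existing table:
ALTER TABLE undeliveredmessage WITH gc_grace_seconds = 86400;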


Check table attributes here http://docs.datastax.com/en/cassandra/2.1/cassandra/reference/referenceTableAttributes.html


Consider using DateTieredCompactionStrategy instead of the default SizeTieredCompactionStrategy.
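A sketch of making that switch on an existing table (assuming the example undeliveredmessage table from above):

ALTER TABLE undeliveredmessage
WITH compaction = {'class': 'DateTieredCompactionStrategy'};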

Cassandra MBean

Use JConsole to connect remotely to:

hostname:7199

e.g. localhost:7199


Check or change the TombstoneFailureThreshold attribute on the StorageService MBean.
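In JConsole's MBeans tab the attribute lives under org.apache.cassandra.db > StorageService (ObjectName org.apache.cassandra.db:type=StorageService). Note that a value changed over JMX takes effect at runtime only; after a restart the node falls back to whatever cassandra.yaml specifies.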

Force a flush and compaction

sudo nodetool -h localhost -p 7199 -u OC_APP_RAINBOWDBA -pw a3c224d4b89192d2ea3ea943dd7e9648 flush rainbowdba undeliveredmessage


sudo nodetool -h localhost -p 7199 -u OC_APP_RAINBOWDBA -pw a3c224d4b89192d2ea3ea943dd7e9648 compact rainbowdba undeliveredmessage


Deleted rows only disappear once gc_grace_seconds has passed and a flush and compaction have been forced.
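If you need deleted data purged sooner (e.g. during testing), one option is to lower gc_grace_seconds before compacting. A sketch, assuming a single-node cluster where the repair window does not matter:

ALTER TABLE rainbowdba.undeliveredmessage WITH gc_grace_seconds = 0;
-- then run the nodetool flush and compact commands above

Be careful with this on a real cluster: gc_grace_seconds exists so that deletes can reach all replicas via repair before their tombstones are purged.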

Truncating a Table

Truncating a table is an immediate operation and won't leave any tombstones.
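For reference:

TRUNCATE undeliveredmessage;

Note that with auto_snapshot enabled (the default), Cassandra snapshots the table before removing the data, so disk space is not reclaimed until the snapshot is cleared.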


Don’t insert null into columns

Inserting a null value into a column leaves a cell tombstone. Deleting a partition/row also creates a single row tombstone.
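The usual way to avoid this is to leave unknown columns out of the INSERT rather than writing null; a sketch against the example table:

-- Writes a cell tombstone for "type":
INSERT INTO undeliveredmessage ("id", "message", "type") VALUES ('1', 'message', null);

-- Leaves no tombstone: simply omit the column.
INSERT INTO undeliveredmessage ("id", "message") VALUES ('1', 'message');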


Deleting a partition creates a partition tombstone that supersedes the existing cell tombstones. This happens only in the memtable, not on disk; it is not clear whether creating a partition tombstone will cause the cell tombstones already on disk to be compacted away.


Using TTL

INSERT INTO undeliveredmessage ("id", "message", "type") VALUES ('1', 'message', 'RAVEN') USING TTL 5;


Once the TTL expires, this insert will result in three tombstone cells and one row tombstone.
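One way to check how many tombstones a query touches is cqlsh tracing (a general diagnostic, not specific to TTLs):

TRACING ON;
SELECT * FROM undeliveredmessage;
-- the trace output reports how many live and tombstone cells each read touched
TRACING OFF;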

Cassandra partition size limitation

In Cassandra, the maximum number of cells (rows x columns) in a single partition is 2 billion; for example, 100 million rows of 20 columns each would hit that limit.


Additionally, a single column value may not be larger than 2 GB, and partitions larger than 100 MB can put significant pressure on the heap.

Performance Test

The test script TestCassandraPerformance.java can be found in

Cassandra version: 2.2.3, cqlsh 5.0.1

1. TombstoneFailureThreshold = 500

Persisting 102_000 rows and then deleting them does not seem to hit the tombstone limit, presumably because whole-row deletes leave row tombstones rather than the per-cell tombstones the threshold counts.


2. TombstoneFailureThreshold = 1

INSERT INTO undeliveredmessage ("id", "message", "sent", "type") VALUES ('3', 'message3', true, null);


and then SELECT * FROM undeliveredmessage; is fine: the single null cell leaves one tombstone, which does not exceed the threshold.


3. TombstoneFailureThreshold = 1

INSERT INTO undeliveredmessage ("id", "message", "sent", "type") VALUES ('3', 'message3', null, null);


and then SELECT * FROM undeliveredmessage; hits the tombstone limit: two null cells leave two tombstones, which exceeds the threshold of 1.



| Deleted rows | Existing rows | Local recovery time | Vector 2 recovery time |
|--------------|---------------|---------------------|------------------------|
| -            | 150_000 * 9   | Operation Timed Out | Operation Timed Out    |
| -            | 150_000 * 5   | 9_072 ms            | 10_152 ms              |
| -            | 150_000 * 3   | 5_679 ms            | 7_957 ms               |
| -            | 150_000 * 2   | 3_025 ms            | 5_218 ms               |
| -            | 150_000       | 1_326 ms            | 1_879 ms               |
| 150_000      | -             | 158 ms              | 333 ms                 |
| 150_000 * 2  | -             | 562 ms              | 1_963 ms               |
| 150_000 * 3  | -             | 2_223 ms            | 3_833 ms               |
| 150_000 * 5  | -             | 3_476 ms            | 9_726 ms               |
| 150_000 * 10 | -             | Operation Timed Out | Operation Timed Out    |
| 150_000      | 150_000       | 1_321 ms            | 3_735 ms               |
| 150_000 * 2  | 150_000       | 1_893 ms            | 4_939 ms               |



Note that we hit the timeout issue once the table contains 150_000 * 10 deleted rows.

Hitting the tombstone limit


For Dash, you should see:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)))


at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:223)


For a query on the command line, you should see something like:

Traceback (most recent call last):

 File "/usr/bin/cqlsh.py", line 1172, in perform_simple_statement

   rows = future.result(self.session.default_timeout)

 File "/usr/share/cassandra/lib/cassandra-driver-internal-only-2.7.2.zip/cassandra-driver-2.7.2/cassandra/cluster.py", line 3347, in result

   raise self._final_exception

ReadFailure: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}

Original article: https://www.cnblogs.com/codingforum/p/6275832.html