Cassandra Writes Slowing Down Cluster
We had a problem a few weeks ago at work where a Cassandra cluster slowed down pretty dramatically in response to a large number of writes. The primary cause was that we had set concurrent_compactors in the cassandra.yaml file too high. Roughly, the chain of events was:
- The spike in requests caused Cassandra to flush more SSTables
- More SSTables triggered more compaction
- With that much compaction running, Cassandra's threads were tied up compacting rather than actually handling requests
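For a sense of what "too high" means here: the comments in the stock cassandra.yaml describe the default for concurrent_compactors as the smaller of the number of data disks and the number of cores, with a minimum of 2 and a maximum of 8. Below is a minimal Python sketch of that rule, meant only as a sanity check against a hand-set value. The function name is ours and is not part of Cassandra, and the rule should be checked against the cassandra.yaml shipped with your version.

```python
def default_concurrent_compactors(num_disks: int, num_cores: int) -> int:
    """Approximate the documented default for concurrent_compactors.

    Assumption: the default is min(disks, cores), clamped to the range 2..8,
    as described in the comments of the stock cassandra.yaml.
    """
    smaller = min(num_disks, num_cores)
    return max(2, min(8, smaller))


def looks_too_high(configured: int, num_disks: int, num_cores: int) -> bool:
    """Flag a hand-set value that exceeds the documented default.

    This is a hypothetical heuristic for spotting suspicious settings,
    not a hard limit enforced by Cassandra.
    """
    return configured > default_concurrent_compactors(num_disks, num_cores)
```

A value above that default is not automatically wrong (the same comments suggest going up to the core count on SSDs), but it is a reasonable first thing to double-check.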
In the future, this can be discovered with:
- The PendingCompactions JMX metric under Table Metrics: https://cassandra.apache.org/doc/latest/operating/metrics.html#monitoring-metrics
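If JMX isn't wired into a dashboard yet, nodetool compactionstats reports a similar number on a line of the form "pending tasks: N". Here is a rough Python sketch for reading it. It assumes nodetool is on the PATH of the node being checked, and the helper names are made up for illustration.

```python
import re
import subprocess
from typing import Optional

# Matches the "pending tasks: N" line printed by `nodetool compactionstats`.
PENDING_RE = re.compile(r"pending tasks:\s*(\d+)")


def parse_pending_tasks(output: str) -> Optional[int]:
    """Return the pending compaction count from compactionstats output, or None."""
    match = PENDING_RE.search(output)
    return int(match.group(1)) if match else None


def pending_compactions(nodetool: str = "nodetool") -> Optional[int]:
    """Run `nodetool compactionstats` locally and parse its pending task count."""
    result = subprocess.run(
        [nodetool, "compactionstats"],
        capture_output=True,
        text=True,
        check=True,  # raise if nodetool exits with a non-zero status
    )
    return parse_pending_tasks(result.stdout)
```

A count that keeps climbing during a write spike is the signal we would have wanted to alert on.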
It took us a little while to figure this out; a coworker pointed out that iostat probably could've given us some more insight as well. We also found out that the relevant line to grep for is StatusLogger.java:65, which is Cassandra's generic dump of the thread pool status: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/StatusLogger.java#L65
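Here is a small Python sketch of that grep, pulling one pool's numbers out of the Cassandra log. It assumes each thread pool row has the pool name followed by the active and pending counts. That column layout, the CompactionExecutor pool name, and the :65 line number may all differ between Cassandra versions, so the pattern matches any StatusLogger.java line number. The helper names are hypothetical.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class PoolStatus(NamedTuple):
    pool: str
    active: int
    pending: int


# Assumed row shape: "... StatusLogger.java:<line> - <PoolName> <active> <pending> ..."
ROW_RE = re.compile(r"StatusLogger\.java:\d+\s+-\s+(\S+)\s+(\d+)\s+(\d+)")


def pool_status(lines: Iterable[str],
                pool: str = "CompactionExecutor") -> Iterator[PoolStatus]:
    """Yield active/pending counts for one thread pool from StatusLogger lines."""
    for line in lines:
        match = ROW_RE.search(line)
        # Filter on the pool name so other StatusLogger rows are skipped.
        if match and match.group(1) == pool:
            yield PoolStatus(match.group(1),
                             int(match.group(2)),
                             int(match.group(3)))
```

Passing an open log file works, since file objects iterate line by line, e.g. pool_status(open(path)) with path pointing at wherever your system.log lives.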