In my talk at Percona Live 2021, “Creating Chaos in Databases”, I discussed how deliberately interrupting available resources in a controlled way (I used primary pod and network interruptions) lets us test the stability of a database, in our case Percona XtraDB Cluster.
I also mentioned in the talk that this testing helped diagnose a few unpleasant bugs, namely:
- PXC-3437: Node fails to join in the endless loop
- PXC-3580: Aggressive network outages on one node makes the whole cluster unusable
- …