Let the Chaos begin!

Have you ever wondered what happens to your system if a critical service crashes in production?

Will it recover gracefully or end up crashing?

Does your “high availability” really work under stress, or only in unit tests?

How much of your system’s resilience is truly tested, versus just theorized in design documents?

These are the kinds of uncomfortable, but essential, questions that every responsible software team should ask and answer. The Observability Team is excited to announce that we have exactly what you need to get the answers to all these questions.

Meet the Canonical Chaos Engineering platform!

The Canonical Chaos Engineering platform is an opinionated set of tools that provides an environment for conducting, observing, and analyzing the output of chaos engineering tests. The solution leverages Juju and a set of Canonical Operators to give you a smooth, frictionless experience of deploying and managing the platform throughout its entire lifecycle.

At the time of publishing this post, the Canonical Chaos Engineering platform covers the Litmus Control Plane and allows you to conduct Chaos Experiments in any Kubernetes environment.

Use Cases

Litmus offers a broad variety of Chaos Experiments that can help you build better software. The most common use cases include:

  • Testing charm resilience under different kinds of disruptions (e.g. K8s node level, network level, Pod level, high load)
  • Validating HA configurations
  • Scheduled reliability experiments

To get more information about Litmus, visit the project’s official documentation.

Try It Out

If you’re interested in trying out the Chaos Engineering experience, start right away by following our tutorial. Getting started is as easy as running a few terminal commands.

If you have any questions, don’t hesitate to reach out to us on Mattermost or Matrix. We’ll be more than happy to chat.

We’d love to get your feedback, both good and bad.

Let’s break some stuff together! … but in a safe and controlled environment.
