Charmed MySQL K8s - Async Replication

WARNING: This is an internal article. Do NOT use it in production! Contact the Canonical Data Platform team if you are interested in the topic.

Creating a MySQL Cluster Set

1 - Deploy two MySQL clusters, named Rome and Lisbon:

juju add-model az1  # db cluster 1, location: Rome
juju add-model az2  # db cluster 2, location: Lisbon
juju add-model app  # db client application, location: somewhere

Rome:

juju switch az1
juju deploy mysql-k8s db1 --trust --channel=8.0/edge/arepl --config profile=testing --config cluster-name=rome --base ubuntu@22.04

Lisbon:

juju switch az2
juju deploy mysql-k8s db2 --trust --channel=8.0/edge/arepl --config profile=testing --config cluster-name=lisbon --base ubuntu@22.04

Client:

juju switch app
juju deploy mysql-test-app
juju deploy mysql-router-k8s --trust --channel 8.0/edge

2 - Create and consume an offer

First, define the roles the clusters will play in the cluster set, i.e. which one will be the primary and which the standby. In this setup, the application db1 will be set up as the primary cluster and db2 as the replica/standby cluster.

NOTE: The side of the relation is used for the setup phase only. In the event of a planned switchover or a failover, a standby cluster can be promoted to active independently of the relation side.

To create a cluster set from these two clusters, we need to create a relation that uses the async_replication interface, through the async-primary and async-replica relation names. But first it’s necessary to create and consume an offer:

juju switch az1
juju offer db1:async-primary async-primary
juju offer db1:database db1database

juju switch az2
juju offer db2:async-primary async-primary
juju offer db2:database db2database

juju switch app
juju consume az1.db1database
juju consume az2.db2database

juju consume az1.async-primary -m az2
juju consume az2.async-primary -m az1

3 - Relate the applications

juju relate -m app mysql-test-app mysql-router-k8s
juju relate -m app mysql-router-k8s db1database

juju relate -m az2 async-primary db2:async-replica

Wait until the process has finished. Behind the scenes, db2 is set up as a replica cluster: it clones the data from the primary cluster and rejoins all of its units. The mysql-test-app will automatically write to db1 on the az1 side, and the data will be automatically propagated to db2 on the az2 side.
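The waiting step above can be scripted. The following is a minimal sketch, not part of the charm: wait_until and replica_ok are hypothetical helper names, and replica_ok assumes that the get-cluster-status action described in step 4 prints a line containing "clustersetreplicationstatus: ok" once the replica is in sync.

```shell
# Retry a command until it succeeds or the timeout (in seconds) expires.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1  # gave up: the command never succeeded in time
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
}

# Hypothetical check (assumption): succeeds once the cluster-set status
# reports the replica cluster's replication status as ok.
replica_ok() {
  juju run -m az1 db1/0 get-cluster-status cluster-set=True \
    | grep -q 'clustersetreplicationstatus: ok'
}
```

For example, "wait_until 600 10 replica_ok" would poll every 10 seconds for up to 10 minutes and return non-zero if the replica never reports ok.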

4 - Checking cluster-set status

Run the get-cluster-status action with the cluster-set=True flag:

juju run -m az1 db1/0 get-cluster-status cluster-set=True

Results:

status:
  clusters:
    lisbon:
      clusterrole: replica
      clustersetreplicationstatus: ok
      globalstatus: ok
    rome:
      clusterrole: primary
      globalstatus: ok
      primary: db1-0.db1-endpoints.az1.svc.cluster.local:3306
  domainname: cluster-set-119185404c15ba547eb5f0750a5c34b5
  globalprimaryinstance: db1-0.db1-endpoints.az1.svc.cluster.local:3306
  primarycluster: rome
  status: healthy
  statustext: all clusters available.
success: "True"
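For scripted health checks, the fields above can be pulled out of the action output with awk. This is a rough sketch that assumes the output keeps the indented "key: value" layout shown here; status_field and cluster_role are hypothetical helper names, and both read the status text from standard input.

```shell
# Print the value of the first "key: value" line whose key matches $1,
# e.g. primarycluster, domainname or status.
status_field() {
  awk -v key="$1:" '$1 == key && NF >= 2 { print $2; exit }'
}

# Print the clusterrole of the cluster named $1 (e.g. rome or lisbon).
# A line holding only "name:" opens a block; the role is read from the
# block that belongs to the requested cluster.
cluster_role() {
  awk -v name="$1:" '
    NF == 1 { in_block = ($1 == name); next }
    in_block && $1 == "clusterrole:" { print $2; exit }
  '
}
```

With the output above piped in, "status_field primarycluster" would print rome and "cluster_role lisbon" would print replica.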

5 - Scaling clusters

The two clusters work independently, so each one can be scaled in or out on its own, e.g.:

juju scale-application -m az1 db1 3

juju scale-application -m az2 db2 3

NOTE: resource usage configurations are also independent.

Safely removing a cluster from the cluster set

Before removing a cluster from the cluster set, make sure that it is not the primary cluster. If it is, there’s a provided action that executes a safe switchover:

1 - (optional) Safely promote to active

juju run -m az2 db2/leader promote-standby-cluster cluster-set-name=<cluster-set-119185404c15ba547eb5f0750a5c34b5>

The cluster-set-name option is required as a safeguard.
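The cluster-set name used in the command above matches the domainname value reported by get-cluster-status in step 4. Assuming that correspondence holds in general, the command can be assembled from the status output instead of being typed by hand. The sketch below only prints the command; promote_command is a hypothetical helper name.

```shell
# Read cluster-set status text on stdin and print the switchover command.
# $1: model of the standby cluster, $2: application name of the standby.
promote_command() {
  # Assumption: the cluster-set name is the domainname field of the status.
  name=$(awk '$1 == "domainname:" { print $2; exit }')
  # Refuse to print a command when no name was found in the input.
  [ -n "$name" ] || return 1
  echo "juju run -m $1 $2/leader promote-standby-cluster cluster-set-name=$name"
}
```

For example, piping the step 4 output into "promote_command az2 db2" prints the full command for review before it is run.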

2 - Remove the relation

When an async_replication relation is removed, the primary cluster keeps working as is, while the replica cluster is dissolved and all its units are left in standalone read-only mode.

The side of the relation (primary/replica) does not matter, since only the current role of the cluster will be considered.

juju remove-relation -m az2 async-primary db2

3 - Recovering blocked cluster

To recover a replica cluster after the relation is removed, there’s a provided action:

juju run -m az2 db2/leader recreate-cluster

The action recovers the cluster as a standalone cluster that keeps the data from the cluster set.

Failover

When a failover of the primary is required, it’s necessary to manually set cluster roles.

1 - Promote to active

To promote the standby cluster to active/primary, it’s necessary to run the action with the force flag set.

juju run -m az2 db2/leader promote-standby-cluster cluster-set-name=<my-cluster-set> force=True

The force flag causes the old primary to be invalidated. The cluster-set-name option is required as a safeguard.

2 - Fence writes from old primary

To avoid a split-brain scenario, where more than one cluster is set as primary, it’s important to fence all write traffic from the failed primary cluster. There’s an action for this:

juju run -m az1 db1/leader fence-writes cluster-set-name=<my-cluster-set>

The action can be run against any of the cluster units.

If the old primary is reestablished and/or has all of its transactions reconciled, write traffic to it can be resumed with the unfence-writes action, e.g.:

juju run -m az1 db1/leader unfence-writes cluster-set-name=<my-cluster-set>
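The order of the failover steps matters: promote first, then fence. The sketch below wraps steps 1 and 2 in a small runbook function that prints the commands by default, so they can be reviewed before anything is executed. The names run, failover and DRY_RUN are hypothetical; only the juju commands themselves come from the steps above.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# failover <standby-model> <standby-app> <failed-model> <failed-app> <cluster-set-name>
failover() {
  # Step 1: force-promote the standby cluster; this invalidates the old primary.
  run juju run -m "$1" "$2/leader" promote-standby-cluster \
    "cluster-set-name=$5" force=True || return 1
  # Step 2: fence writes on the failed primary to avoid a split brain.
  run juju run -m "$3" "$4/leader" fence-writes "cluster-set-name=$5"
}
```

Calling "failover az2 db2 az1 db1 <my-cluster-set>" prints the two commands in order; setting DRY_RUN=0 would execute them instead.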

Switch app/clients between AZs

Switching clients to another AZ is necessary only when the previous AZ is unreachable:

juju remove-relation -m app mysql-router-k8s db1database
# wait for relation removed
juju relate -m app mysql-router-k8s db2database