Charmed OpenSearch How-to | Large Deployments | Minor Version Upgrades & Rollback

We strongly recommend NOT performing any other extraordinary operations on a Charmed OpenSearch cluster while upgrading. Some examples are:

  1. Adding or removing units
  2. Creating or destroying relations
  3. Changing the workload configuration
  4. Upgrading other connected/related/integrated applications simultaneously
  5. Backing up or restoring snapshots

Concurrency with other operations is not supported and can lead the cluster into an inconsistent state.

Make sure to have backups of your Charmed OpenSearch data before running any type of upgrade.

The following steps apply to large deployments, where multiple juju applications running OpenSearch charms form a bigger cluster. For a single juju application, please refer to this how-to instead.

Minor upgrade steps

Large deployments differ from standard Minor Upgrades in that each juju application must be upgraded separately. The rollback also changes, as some clusters will have to perform a downgrade.

  1. Collect. Gather all necessary pre-upgrade information. It will be needed for the rollback (if requested). Do NOT skip this step.
  2. Prepare. Run the pre-upgrade-check action to make sure the upgrade can start.
  3. Decide Upgrade Order. The upgrade must first be executed on all non-cluster-manager applications (active or failover clusters) before moving on to the cluster-manager applications.
  4. Minor Upgrade. For each juju application, execute the steps described in the Minor Upgrades document.
  5. (OPTIONAL, only if a failure occurs during the process) Rollback. Start by rolling back the application currently in the middle of the upgrade, then downgrade all the already upgraded applications in the inverse order, as described in items (2, 3).
  6. Post-upgrade check. Make sure all units are in the proper state and the cluster is healthy.

Once the upgrade has reached the main cluster manager, it is no longer possible to execute the rollback.

Step 1: Collect

The first step is to record the revision of the running application, as a safety measure for a rollback action. To accomplish this, simply run the juju status command and look for the deployed Charmed OpenSearch revision in the command output, e.g.:

Model                                Controller           Cloud/Region         Version  SLA          Timestamp
test-large-deployment-upgrades-jbmj  localhost-localhost  localhost/localhost  3.4.2    unsupported  19:18:24+02:00

App                       Version  Status  Scale  Charm                               Channel        Rev  Exposed  Message
failover                           active      2  opensearch                          latest/edge     86  no       
main                               active      1  opensearch                          latest/edge     86  no       
opensearch                         active      3  opensearch                          latest/edge     86  no       
self-signed-certificates           active      1  self-signed-certificates            latest/stable   72  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
failover/0                   active    idle   0        10.173.208.184  9200/tcp  
failover/1*                  active    idle   1        10.173.208.236  9200/tcp  
main/0*                      active    idle   2        10.173.208.204  9200/tcp  
opensearch/0                 active    idle   4        10.173.208.14   9200/tcp  
opensearch/1                 active    idle   5        10.173.208.128  9200/tcp  
opensearch/2*                active    idle   6        10.173.208.36   9200/tcp  
self-signed-certificates/0*  active    idle   3        10.173.208.103            

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.173.208.184  juju-b0cc19-0  ubuntu@22.04      Running
1        started  10.173.208.236  juju-b0cc19-1  ubuntu@22.04      Running
2        started  10.173.208.204  juju-b0cc19-2  ubuntu@22.04      Running
3        started  10.173.208.103  juju-b0cc19-3  ubuntu@22.04      Running
4        started  10.173.208.14   juju-b0cc19-4  ubuntu@22.04      Running
5        started  10.173.208.128  juju-b0cc19-5  ubuntu@22.04      Running
6        started  10.173.208.36   juju-b0cc19-6  ubuntu@22.04      Running

If the deployment is of a local charm, make sure you save a copy of the current .charm file BEFORE going further. You might need it for rollback.

For this example, the current revision is 86 for OpenSearch.

Store the revision or the .charm file safely to use in case of rollback.
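
The revision lookup above can also be scripted so that the record is kept with the rest of the rollback material. The following is a minimal sketch and not part of the charm or of Juju: list_revisions is a hypothetical helper that reads tabular juju status output from stdin, and it assumes the Rev column is right-aligned under its header and directly follows Channel, as in the sample output above. For anything beyond a quick record, the machine-readable output of juju status --format=json is likely a more robust source.

```shell
# Hypothetical helper: print "<app> <revision>" for each row of the App table
# found in tabular `juju status` output read from stdin.
# Assumption: Rev is right-aligned under its header and follows Channel.
list_revisions() {
  awk '
    /^App / {
      col_from = index($0, "Channel")    # first character of the Channel column
      col_to = index($0, " Rev ") + 3    # last character of the "Rev" header
      in_apps = 1
      next
    }
    in_apps && NF == 0 { in_apps = 0 }   # a blank line ends the App table
    in_apps {
      # Channel and Rev together; the revision is the last token in that span
      n = split(substr($0, col_from, col_to - col_from + 1), parts, " ")
      print $1, parts[n]
    }
  '
}

# Usage (the file name is only an example):
#   juju status | list_revisions > pre-upgrade-revisions.txt
```

Keeping the output in a file next to any saved .charm file gives a single place to look if a rollback becomes necessary.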

Step 2: Prepare

On the cluster running the “cluster manager” leader, it’s necessary to run the pre-upgrade-check action against the leader unit:

juju run main/leader pre-upgrade-check

If this action is successful, then proceed to the next step.
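
When the upgrade is driven from a script, the check above can act as a gate. This is a minimal sketch under one assumption: that juju run exits with a non-zero status when the action fails. The pre_upgrade_gate function name is hypothetical.

```shell
# Hypothetical wrapper: run pre-upgrade-check on the leader of the given
# application and report whether it is safe to continue.
# Assumption: `juju run` exits non-zero when the action fails.
pre_upgrade_gate() {
  app="$1"
  if juju run "$app/leader" pre-upgrade-check; then
    echo "pre-upgrade-check passed on $app: proceed to the next step"
  else
    echo "pre-upgrade-check failed on $app: do not start the upgrade" >&2
    return 1
  fi
}

# Usage, matching the example above:
#   pre_upgrade_gate main
```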

Step 3: Decide Upgrade Order

The upgrade must first be executed on all non-cluster-manager applications (active or failover clusters) before moving on to the cluster-manager applications. The recommended upgrade order is the following:

  1. voting_only
  2. search
  3. ml
  4. data: (i) data.frozen; (ii) data.cold; (iii) data.warm; (iv) data.hot; etc
  5. ingest
  6. coordinator_only
  7. cluster_manager

In this example:

Model                                Controller           Cloud/Region         Version  SLA          Timestamp
test-large-deployment-upgrades-jbmj  localhost-localhost  localhost/localhost  3.4.2    unsupported  19:18:24+02:00

App                       Version  Status  Scale  Charm                               Channel        Rev  Exposed  Message
failover                           active      2  opensearch                          latest/edge     87  no       
main                               active      1  opensearch                          latest/edge     87  no       
opensearch                         active      3  opensearch                          latest/edge     87  no       
self-signed-certificates           active      1  self-signed-certificates            latest/stable   72  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
failover/0                   active    idle   0        10.173.208.184  9200/tcp  
failover/1*                  active    idle   1        10.173.208.236  9200/tcp  
main/0*                      active    idle   2        10.173.208.204  9200/tcp  
opensearch/0                 active    idle   4        10.173.208.14   9200/tcp  
opensearch/1                 active    idle   5        10.173.208.128  9200/tcp  
opensearch/2*                active    idle   6        10.173.208.36   9200/tcp  
self-signed-certificates/0*  active    idle   3        10.173.208.103            

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.173.208.184  juju-b0cc19-0  ubuntu@22.04      Running
1        started  10.173.208.236  juju-b0cc19-1  ubuntu@22.04      Running
2        started  10.173.208.204  juju-b0cc19-2  ubuntu@22.04      Running
3        started  10.173.208.103  juju-b0cc19-3  ubuntu@22.04      Running
4        started  10.173.208.14   juju-b0cc19-4  ubuntu@22.04      Running
5        started  10.173.208.128  juju-b0cc19-5  ubuntu@22.04      Running
6        started  10.173.208.36   juju-b0cc19-6  ubuntu@22.04      Running

The upgrade order will be:

  • “opensearch” application: non-cluster manager and data-only app
  • “failover” application: cluster manager app but does not contain the elected cluster manager unit
  • “main” application: cluster manager app that contains the cluster manager.
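
The role-based ordering above can be sketched as a small sorting helper. This is an illustration only: upgrade_order is a hypothetical function, the "<app> <role>" input format is invented for the example, and ranking a plain "data" role after the named data tiers is an assumption. Applications sharing a role keep their input order, so the application that holds the elected cluster manager should be listed last among the cluster_manager apps.

```shell
# Hypothetical helper: read "<app> <role>" lines from stdin and print the
# app names in the recommended upgrade order.
# Assumption: a plain "data" role is ranked after the named data tiers.
upgrade_order() {
  awk '
    BEGIN {
      s = "voting_only search ml data.frozen data.cold data.warm"
      s = s " data.hot data ingest coordinator_only cluster_manager"
      n = split(s, roles, " ")
      for (i = 1; i <= n; i++) rank[roles[i]] = i
    }
    NF >= 2 && ($2 in rank) { print rank[$2], NR, $1; next }
    NF >= 2 { print "unknown role, skipped: " $2 > "/dev/stderr" }
  ' | sort -k1,1n -k2,2n | cut -d" " -f3
}

# Usage with the example deployment:
#   printf 'failover cluster_manager\nmain cluster_manager\nopensearch data\n' | upgrade_order
```

The input line number is used as a secondary sort key, which is what preserves the relative order of applications with the same role.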

Step 4: Minor Upgrade for Each Juju App

Once the upgrade has reached the main cluster manager, it is no longer possible to execute a rollback!

For each juju application, following the example in Step 2, execute step 2 and steps 4 to 6 as described in the Minor Upgrades documentation.

At the end of the upgrade, ensure all units of the target juju application are in the “active/idle” state.
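
The “active/idle” condition can be checked from a script before moving on to the next application. The sketch below is hypothetical (units_settled is not a Juju or charm command) and assumes the Unit table layout shown in the sample juju status output earlier, where Workload and Agent are the second and third columns.

```shell
# Hypothetical helper: succeed only if every unit of the given application
# is reported as active/idle in tabular `juju status` output read from stdin.
units_settled() {
  awk -v app="$1" '
    /^Unit / { in_units = 1; next }        # header of the Unit table
    in_units && NF == 0 { in_units = 0 }   # a blank line ends the table
    in_units && index($1, app "/") == 1 {
      seen++
      if ($2 != "active" || $3 != "idle") bad++
    }
    END { exit !(seen > 0 && bad == 0) }   # fail if no unit was found
  '
}

# Usage: poll until the application settles (no timeout; interval is arbitrary):
#   until juju status opensearch | units_settled opensearch; do sleep 30; done
```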

Step 5: (OPTIONAL in case of failure) Rollback

Once the upgrade has reached the main cluster manager, it is no longer possible to execute a rollback!

There are two types of rollbacks:

  1. The rollback caused by a mistyping or an error: in this case the rollback should happen on a single juju application
  2. The rollback of the entire cluster during the upgrade process

For case (1.), the procedure described in the Minor Rollback guide is enough to roll back one of the juju applications to its initial state and restart the upgrade process in that specific juju app.

Case (2.) requires the rollback to happen in two stages: (i) if we were in the process of upgrading a given juju application, then we need to execute the same rollback procedure as described in the Minor Rollback guide; and (ii) once that juju application has completed its rollback, we must downgrade every juju application that was successfully upgraded earlier in this process.

In our example above, let’s imagine the upgrade process has successfully moved “opensearch” to a newer version and was in the middle of upgrading the “failover” app. In this case, the rollback described in case (2.) must follow this procedure:

  1. Roll back the “failover” app as described in the Minor Rollback guide
  2. Wait for the “failover” app to return to its original state, with each unit marked as “active/idle”
  3. Execute the downgrade of the “opensearch” cluster, by: (i) refreshing this juju application back to its original revision; and (ii) overriding the upgrade process to implement the downgrade. The steps below give an example of how to execute these tasks:

juju refresh opensearch --revision=<original-revision>

# Wait until the refresh has settled and execute
juju run opensearch/leader resume-upgrade
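
When more than one application was fully upgraded before the failure, the same two commands are repeated per application in the inverse of the upgrade order. The sketch below only prints that plan and runs nothing, leaving the waiting and the execution to the operator. print_downgrade_plan is a hypothetical helper, and it assumes every listed application started from the same original revision, as in the example.

```shell
# Hypothetical helper: print, without running them, the downgrade commands
# for applications that were already fully upgraded.
# Arguments: the original revision, then the apps in the order they were
# upgraded; the commands are printed in the inverse order.
# Assumption: all listed apps share the same original revision.
print_downgrade_plan() {
  revision="$1"
  shift
  reversed=""
  for app in "$@"; do
    reversed="$app $reversed"
  done
  for app in $reversed; do
    echo "juju refresh $app --revision=$revision"
    echo "# wait until the refresh has settled, then:"
    echo "juju run $app/leader resume-upgrade"
  done
}

# Usage, for the scenario above where only "opensearch" was fully upgraded:
#   print_downgrade_plan 86 opensearch
```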

Step 6: Check

Run any necessary checks to validate that the upgrade was successful. Check the Minor Upgrades documentation for recommendations.