We strongly recommend NOT performing any other extraordinary operations on a Charmed OpenSearch cluster while it is being upgraded. Some examples are:
- Adding or removing units
- Creating or destroying relations
- Changing the workload configuration
- Upgrading other connected/related/integrated applications simultaneously
- Backing up or restoring snapshots
Running these operations concurrently with an upgrade is not supported and can leave the cluster in an inconsistent state.
Make sure you have backups of your Charmed OpenSearch data before running any type of upgrade.
The following steps apply to large deployments, where multiple juju applications running OpenSearch charms form a bigger cluster. For a single juju application, please refer to this how-to instead.
Minor upgrade steps
Large deployments differ from standard Minor Upgrades in that each juju application must be upgraded separately. The rollback also changes, as some clusters will have to perform a downgrade.
1. Collect. Gather all necessary pre-upgrade information. It will be needed for the rollback (if requested). Do NOT skip this step.
2. Prepare. Run the pre-upgrade-check action to make sure the upgrade can start.
3. Decide Upgrade Order. The upgrade must first be executed on all non-cluster-manager applications (active or failover clusters) before moving on to the cluster-manager applications.
4. Minor Upgrade. For each juju application, execute the steps described in the Minor Upgrades document.
5. (OPTIONAL, in case of any failure during the process) Rollback. Start by rolling back the application currently in the middle of the upgrade, then downgrade all the already upgraded applications in the inverse of the order described in items (2, 3).
6. Post-upgrade check. Make sure all units are in the proper state and the cluster is healthy.
Once the main cluster manager has been reached, it is no longer possible to execute the rollback.
Step 1: Collect
The first step is to record the revision of the running application, as a safety measure for a rollback action. To accomplish this, simply run the juju status command and look for the deployed Charmed OpenSearch revision in the command output, e.g.:
Model                                Controller           Cloud/Region         Version  SLA          Timestamp
test-large-deployment-upgrades-jbmj  localhost-localhost  localhost/localhost  3.4.2    unsupported  19:18:24+02:00

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
failover                           active      2  opensearch                latest/edge     86  no
main                               active      1  opensearch                latest/edge     86  no
opensearch                         active      3  opensearch                latest/edge     86  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable   72  no

Unit                         Workload  Agent  Machine  Public address  Ports     Message
failover/0                   active    idle   0        10.173.208.184  9200/tcp
failover/1*                  active    idle   1        10.173.208.236  9200/tcp
main/0*                      active    idle   2        10.173.208.204  9200/tcp
opensearch/0                 active    idle   4        10.173.208.14   9200/tcp
opensearch/1                 active    idle   5        10.173.208.128  9200/tcp
opensearch/2*                active    idle   6        10.173.208.36   9200/tcp
self-signed-certificates/0*  active    idle   3        10.173.208.103

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.173.208.184  juju-b0cc19-0  ubuntu@22.04      Running
1        started  10.173.208.236  juju-b0cc19-1  ubuntu@22.04      Running
2        started  10.173.208.204  juju-b0cc19-2  ubuntu@22.04      Running
3        started  10.173.208.103  juju-b0cc19-3  ubuntu@22.04      Running
4        started  10.173.208.14   juju-b0cc19-4  ubuntu@22.04      Running
5        started  10.173.208.128  juju-b0cc19-5  ubuntu@22.04      Running
6        started  10.173.208.36   juju-b0cc19-6  ubuntu@22.04      Running
If the deployment uses a local charm, make sure you save a copy of the current .charm file BEFORE going further. You might need it for the rollback.
For this example, the current revision is 86 for OpenSearch.
Store the revision or the .charm file safely to use in case of rollback.
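As a rough illustration of how the revisions could be captured automatically, the sketch below extracts the application name and revision from the plain-text output of juju status. The function name list_app_revisions and the output file name are hypothetical, and parsing the tabular output is fragile by nature; it assumes the column layout shown above, where the Rev value is the number right before the Exposed value.

```shell
# Hypothetical helper: print "<application> <revision>" for every row of the
# "App" table in the plain-text output of `juju status` (read from stdin).
list_app_revisions() {
    awk '
        $1 == "App"  { in_apps = 1; next }   # header row of the App table
        $1 == "Unit" { in_apps = 0 }         # the Unit table follows
        NF == 0      { in_apps = 0 }         # a blank line also ends the table
        in_apps {
            # The Version column may be empty, so locate Rev relative to the
            # Exposed column: it is the numeric field right before yes/no.
            for (i = 2; i <= NF; i++) {
                if (($i == "yes" || $i == "no") && $(i - 1) ~ /^[0-9]+$/) {
                    print $1, $(i - 1)
                    break
                }
            }
        }
    '
}
```

A possible usage is `juju status | list_app_revisions > revisions-before-upgrade.txt`, keeping the resulting file alongside any saved .charm file.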
Step 2: Prepare
On the cluster running the “cluster manager” leader, run the pre-upgrade-check action against the leader unit:
juju run main/leader pre-upgrade-check
If this action is successful, then proceed to the next step.
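The check can also be wrapped so that a script stops when it fails. The following is a minimal sketch, not part of the charm: the function name pre_upgrade_gate and the JUJU override variable are hypothetical, and it assumes that juju run exits with a non-zero status when the action fails.

```shell
# Client to invoke; defaults to the real juju CLI but can be overridden,
# for example with JUJU=echo for a dry run.
JUJU="${JUJU:-juju}"

# Run pre-upgrade-check on the leader unit of application $1 and report
# whether it is safe to continue.
pre_upgrade_gate() {
    if $JUJU run "$1/leader" pre-upgrade-check; then
        echo "pre-upgrade-check passed for $1: the upgrade can start"
    else
        echo "pre-upgrade-check failed for $1: do not start the upgrade" >&2
        return 1
    fi
}
```

For the example deployment this would be called as `pre_upgrade_gate main`.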
Step 3: Decide Upgrade Order
The upgrade must first be executed on all non-cluster-manager applications (active or failover clusters) before moving on to the cluster-manager applications. The recommended upgrade order is the following:
- voting_only
- search
- ml
- data: (i) data.frozen; (ii) data.cold; (iii) data.warm; (iv) data.hot; etc
- ingest
- coordinator_only
- cluster_manager
In this example:
Model                                Controller           Cloud/Region         Version  SLA          Timestamp
test-large-deployment-upgrades-jbmj  localhost-localhost  localhost/localhost  3.4.2    unsupported  19:18:24+02:00

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
failover                           active      2  opensearch                latest/edge     87  no
main                               active      1  opensearch                latest/edge     87  no
opensearch                         active      3  opensearch                latest/edge     87  no
self-signed-certificates           active      1  self-signed-certificates  latest/stable   72  no

Unit                         Workload  Agent  Machine  Public address  Ports     Message
failover/0                   active    idle   0        10.173.208.184  9200/tcp
failover/1*                  active    idle   1        10.173.208.236  9200/tcp
main/0*                      active    idle   2        10.173.208.204  9200/tcp
opensearch/0                 active    idle   4        10.173.208.14   9200/tcp
opensearch/1                 active    idle   5        10.173.208.128  9200/tcp
opensearch/2*                active    idle   6        10.173.208.36   9200/tcp
self-signed-certificates/0*  active    idle   3        10.173.208.103

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.173.208.184  juju-b0cc19-0  ubuntu@22.04      Running
1        started  10.173.208.236  juju-b0cc19-1  ubuntu@22.04      Running
2        started  10.173.208.204  juju-b0cc19-2  ubuntu@22.04      Running
3        started  10.173.208.103  juju-b0cc19-3  ubuntu@22.04      Running
4        started  10.173.208.14   juju-b0cc19-4  ubuntu@22.04      Running
5        started  10.173.208.128  juju-b0cc19-5  ubuntu@22.04      Running
6        started  10.173.208.36   juju-b0cc19-6  ubuntu@22.04      Running
The upgrade order will be:
- “opensearch” application: non-cluster manager and data-only app
- “failover” application: cluster manager app but does not contain the elected cluster manager unit
- “main” application: cluster manager app that contains the cluster manager.
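The ordering rule can be sketched as a small sort over "application role" pairs. Everything named here is hypothetical (role_rank, upgrade_order, the input format), and placing the plain data role after the data tiers is an assumption, since the list above only ends with "etc". Ties keep their input order, so the application holding the elected cluster manager should be listed last among the cluster_manager applications.

```shell
# Map a role to its position in the recommended upgrade order.
role_rank() {
    case "$1" in
        voting_only)      echo 1 ;;
        search)           echo 2 ;;
        ml)               echo 3 ;;
        data.frozen)      echo 4 ;;
        data.cold)        echo 5 ;;
        data.warm)        echo 6 ;;
        data.hot)         echo 7 ;;
        data)             echo 8 ;;   # assumption: plain "data" after the tiers
        ingest)           echo 9 ;;
        coordinator_only) echo 10 ;;
        cluster_manager)  echo 11 ;;
        *)                return 1 ;; # unknown role
    esac
}

# Read "<application> <role>" lines from stdin and print the applications
# in upgrade order. Applications with the same role keep their input order.
upgrade_order() {
    position=0
    while read -r app role; do
        [ -n "$app" ] || continue
        position=$((position + 1))
        if rank="$(role_rank "$role")"; then
            echo "$rank $position $app"
        else
            echo "unknown role '$role' for $app: place it manually" >&2
        fi
    done | sort -k1,1n -k2,2n | cut -d' ' -f3
}
```

For the example deployment, feeding the lines "failover cluster_manager", "opensearch data" and "main cluster_manager" yields opensearch, failover, main, which matches the order listed above.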
Step 4: Minor Upgrade for Each Juju App
Once the main cluster manager has been reached, it is no longer possible to roll back!
For each Juju application, as in the example in Step 2, execute step 2 and steps 4 to 6 of the Minor Upgrades documentation.
At the end of the upgrade, ensure all units of the target juju application are in the “active/idle” state.
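One way to verify the “active/idle” condition from a script is to scan the Unit table of the plain-text juju status output. This is only a sketch under the assumption that the Workload and Agent values are the second and third columns, as in the output shown earlier; the function name app_is_settled is hypothetical.

```shell
# Succeeds only if the "Unit" table read from stdin contains at least one
# unit of application $1 and all of its units are active/idle.
app_is_settled() {
    awk -v app="$1" '
        $1 == "Unit"    { in_units = 1; next }   # header row of the Unit table
        $1 == "Machine" { in_units = 0 }         # the Machine table follows
        NF == 0         { in_units = 0 }         # a blank line ends the table
        in_units {
            split($1, parts, "/")                # "failover/1*" gives "failover"
            if (parts[1] != app) next
            seen++
            if ($2 != "active" || $3 != "idle") bad++
        }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}
```

For example, `juju status | app_is_settled opensearch && echo "opensearch is settled"` could be run before moving on to the next application.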
Step 5: (OPTIONAL in case of failure) Rollback
Once the main cluster manager has been reached, it is no longer possible to roll back!
There are two types of rollbacks:
1. A rollback caused by a typo or an error: in this case, the rollback should happen on a single juju application.
2. A rollback of the entire cluster during the upgrade process.
For case (1), the procedure described in the Minor Rollback guide is enough to roll one of the juju applications back to its initial state and restart the upgrade process in that specific juju app.
Case (2) requires the rollback to happen in two stages: (i) if a given juju application was in the middle of its upgrade, execute the rollback procedure described in the Minor Rollback guide for it; and (ii) once that juju application has completed its rollback, downgrade every juju application that was successfully upgraded earlier in this process.
In the example above, imagine the upgrade process has successfully moved “opensearch” to a newer version and was upgrading the “failover” app. In this case, the rollback described in case (2) must follow this procedure:
- Roll back the “failover” app as described in the Minor Rollback guide
- Wait for “failover” to return to its original state, with all of its units marked as “active/idle”
- Execute the downgrade of the “opensearch” cluster by: (i) refreshing this juju application back to its original revision; and (ii) overriding the upgrade process to implement the downgrade. The commands below show an example of how to execute these tasks:
juju refresh opensearch --revision=<original-revision>
# Wait until the refresh has settled and execute
juju run opensearch/leader resume-upgrade
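When several applications were already upgraded, the same two commands have to be repeated for each of them in the inverse order. The sketch below only prints that plan instead of executing it. The function name print_downgrade_plan is hypothetical, and it assumes a single original revision (the one recorded in Step 1) shared by all the listed applications.

```shell
# Print (do not execute) the downgrade commands for the applications that
# were already upgraded, in the reverse of the order they were upgraded in.
# $1 is the revision recorded in Step 1; the remaining arguments are the
# upgraded applications, in upgrade order.
print_downgrade_plan() {
    revision="$1"
    shift
    reversed=""
    for app in "$@"; do
        reversed="$app $reversed"   # prepend, so the last upgraded comes first
    done
    for app in $reversed; do
        echo "juju refresh $app --revision=$revision"
        echo "# wait until the refresh has settled, then:"
        echo "juju run $app/leader resume-upgrade"
    done
}
```

In the example above, `print_downgrade_plan 86 opensearch` prints the two commands shown for the “opensearch” application with the recorded revision filled in.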
Step 6: Check
Run any necessary checks to validate that the upgrade was successful. Check the Minor Upgrades documentation for recommendations.
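As one possible check, and not a replacement for the recommendations in the Minor Upgrades documentation, the status reported by the OpenSearch _cluster/health endpoint can be inspected. The helper names health_status and cluster_is_green are hypothetical, and the sketch assumes the response is a single-line JSON document containing one "status" field.

```shell
# Extract the value of the "status" field from a GET /_cluster/health
# response read from stdin (assumes single-line JSON).
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeeds only when the reported cluster status is "green".
cluster_is_green() {
    [ "$(health_status)" = "green" ]
}
```

A possible usage, with placeholder credentials and assuming the endpoint is served over HTTPS on the port 9200 shown in the status output, is `curl -sk -u <user>:<password> https://10.173.208.204:9200/_cluster/health | cluster_is_green`.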