Charmed OpenSearch How-to | Minor Version Upgrades

We strongly recommend that you do NOT perform any other extraordinary operations on a Charmed OpenSearch cluster while upgrading. Some examples:

  1. Adding or removing units
  2. Creating or removing relations
  3. Changing the workload configuration
  4. Upgrading other connected/related/integrated applications simultaneously
  5. Backing up or restoring snapshots

Running other operations concurrently with an upgrade is not supported and can leave the cluster in an inconsistent state.

Make sure you have a backup of your Charmed OpenSearch data before running any type of upgrade.

Minor upgrade steps

  1. Collect all necessary pre-upgrade information. It will be needed for a rollback (if required). Do NOT skip this step.
  2. (optional) Scale up: The new sacrificial unit will be the first one to be upgraded, and it will simplify the rollback procedure if the upgrade fails.
  3. Prepare the “Charmed OpenSearch” Juju application for the in-place upgrade. See the step description below for the technical details of what the charm does here.
  4. Upgrade: Once started, only one unit of the app will be upgraded. In case of failure, roll back with juju refresh.
  5. Resume upgrade: If the new unit is OK after the refresh, the upgrade can be resumed. All remaining units in the app will be upgraded sequentially, from the highest to the lowest unit number.
  6. (optional) Consider rolling back in case of disaster. Please inform and include us in your troubleshooting so that we can trace the source of the issue and prevent it in the future.
  7. (optional) Scale back: Remove the units created in step 2 that are no longer necessary (if any).
  8. Post-upgrade check: Make sure all units are in the proper state and the cluster is healthy.

Step 1: Collect

The first step is to record the revision of the running application, as a safety measure for a rollback action. To accomplish this, simply run the juju status command and look for the deployed Charmed OpenSearch revision in the command output, e.g.:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  3.3.4    unsupported  13:02:15Z

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      3  opensearch                                86  no       
self-signed-certificates           active      1  self-signed-certificates  latest/stable   72  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0*                active    idle   1        10.229.18.7     9200/tcp  
opensearch/1                 active    idle   2        10.229.18.182   9200/tcp  
opensearch/2                 active    idle   3        10.229.18.34    9200/tcp  
self-signed-certificates/0*  active    idle   0        10.229.18.118             

Machine  State    Address        Inst id        Base          AZ  Message
0        started  10.229.18.118  juju-d69356-0  ubuntu@22.04      Running
1        started  10.229.18.7    juju-d69356-1  ubuntu@22.04      Running
2        started  10.229.18.182  juju-d69356-2  ubuntu@22.04      Running
3        started  10.229.18.34   juju-d69356-3  ubuntu@22.04      Running

If the deployment is of a local charm, make sure you save a copy of the current .charm file BEFORE going further. You might need it for rollback.

In this example, the current revision of the opensearch application is 86.

Store the revision or the .charm file safely to use in case of rollback.
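The revision can also be captured non-interactively. The following is a minimal sketch, not part of the official procedure: charm_rev is a hypothetical helper name, and it assumes that the YAML produced by juju status --format=yaml lists each application under a two-space indent with a charm-rev key.

```shell
#!/bin/sh
# Sketch: print the charm revision of one application, reading the output
# of `juju status --format=yaml` from stdin.
# Assumption: each application entry contains a `charm-rev:` key.
charm_rev() {
    awk -v key="$1:" '
        /^[^ ]/   { in_app = 0 }            # top-level key: not inside any application
        /^  [^ ]/ { in_app = ($1 == key) }  # two-space indent: start of an entry
        in_app && $1 == "charm-rev:" { print $2; exit }
    '
}

# Usage (requires a live Juju model):
#   juju status --format=yaml | charm_rev opensearch > opensearch-revision.txt
```

Keeping the result in a file next to your other pre-upgrade notes makes it easy to find if a rollback is needed.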

Step 2: Scale-up (optional)

Although optional, it is recommended to scale the application up by one unit before starting the upgrade process.

The new unit will be the first one to be upgraded, and it will confirm that the upgrade is possible. In case of failure, having the extra unit will ease the rollback procedure without disrupting service. See the Minor rollback how-to for more details.

juju add-unit opensearch

Wait for the new unit to be up and ready.
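One way to wait is to poll juju status until every unit reports active/idle. The sketch below is an illustration only: units_settled is a hypothetical helper name, and it assumes the default tabular output, where the second and third columns of a unit line are the workload and agent states.

```shell
#!/bin/sh
# Sketch: succeed only if every unit of the given application found in the
# tabular `juju status` output on stdin is active/idle.
units_settled() {
    awk -v prefix="$1/" '
        index($1, prefix) == 1 {            # unit lines start with "<app>/"
            seen++
            if ($2 != "active" || $3 != "idle") bad++
        }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    '
}

# Usage (requires a live Juju model):
#   until juju status opensearch | units_settled opensearch; do sleep 10; done
```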

Step 3: Prepare

  1. IMPORTANT: Create a backup of your cluster

Make sure you create a backup of your cluster. Please refer to the backup section.

  2. pre-upgrade-check

After the application has settled, it’s necessary to run the pre-upgrade-check action against the leader unit:

juju run opensearch/leader pre-upgrade-check

The action checks the health of OpenSearch and verifies that the charm is ready to start an upgrade procedure.

Step 4: Upgrade

Use the juju refresh command to trigger the charm upgrade process.

juju refresh opensearch --channel 2/edge

The upgrade will initially run only on the highest-ordinal unit. For the running opensearch example, the juju status output will look as follows:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  3.3.4    unsupported  13:02:15Z

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      3  opensearch                                87  no       Upgrading. Verify highest unit is healthy & run `resume-upgrade` action. To rollback, `juju refresh` to last revision
self-signed-certificates           active      1  self-signed-certificates  latest/stable   72  no 

Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0                 active    idle   1        10.229.18.7     9200/tcp  OpenSearch 2.12.0 running; Snap rev 40 (outdated); Charmed operator 1+631f817-dirty+71f8619-dirty
opensearch/1                 active    idle   2        10.229.18.182   9200/tcp  OpenSearch 2.12.0 running; Snap rev 40 (outdated); Charmed operator 1+631f817-dirty+71f8619-dirty
opensearch/2*                active    idle   3        10.229.18.34    9200/tcp  OpenSearch 2.12.0 running; Snap rev 44; Charmed operator 1+631f817-dirty+71f8619-dirty
self-signed-certificates/0*  active    idle   0        10.229.18.118
    

The unit should recover shortly after, but the time can vary depending on the amount of data written to the cluster while the unit was not part of it. Please be patient with large installations.

Step 5: Resume

After the unit is upgraded, the charm will set the unit upgrade state as completed. If deemed necessary, the user can further verify the success of the upgrade. If the unit is healthy within the cluster, the next step is to resume the upgrade process by running:

juju run opensearch/leader resume-upgrade

The resume-upgrade action rolls out the OpenSearch upgrade to the next unit, always from the highest to the lowest unit number. After each successfully upgraded unit beyond the first, the process continues with the next one automatically.
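While the rollout proceeds, the unit messages shown in Step 4 indicate which units still run the old snap revision. The following sketch lists them; outdated_units is a hypothetical helper name, and it assumes the "(outdated)" marker keeps the form shown in that example output.

```shell
#!/bin/sh
# Sketch: list the units of an application whose status message still
# contains "(outdated)", reading tabular `juju status` output from stdin.
outdated_units() {
    awk -v prefix="$1/" '
        index($1, prefix) == 1 && /\(outdated\)/ {
            unit = $1
            sub(/\*$/, "", unit)            # drop the leader marker, if present
            print unit
        }
    '
}

# Usage (requires a live Juju model):
#   juju status opensearch | outdated_units opensearch
```

An empty result suggests that every unit has picked up the new snap revision, which the final check in Step 8 should still confirm.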

Step 6: Rollback (optional)

If the upgrade fails or proves incompatible, it’s important to roll back the charm to the previous revision so that the upgrade can be attempted again later, after further inspection of the failure. See the Minor rollback how-to for more details.

Step 7: Scale-back (optional)

If the application scale was changed for the upgrade procedure, it is now safe to scale it back to the desired unit count:

juju remove-unit opensearch/<highest unit number>
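The highest unit number can be read from juju status. As a rough sketch (highest_unit is a hypothetical helper name, and the default tabular status output is assumed), the unit name can be derived like this:

```shell
#!/bin/sh
# Sketch: print the unit of the given application with the highest number,
# reading tabular `juju status` output from stdin.
highest_unit() {
    awk -v prefix="$1/" '
        index($1, prefix) == 1 {
            n = substr($1, length(prefix) + 1)
            sub(/\*$/, "", n)               # drop the leader marker, if present
            n = n + 0                       # compare numerically, not as text
            if (!found || n > max) { max = n; found = 1 }
        }
        END { if (found) print prefix max }
    '
}

# Usage (requires a live Juju model); review the name before removing anything:
#   juju status opensearch | highest_unit opensearch
```

Printing the name first, rather than piping it straight into juju remove-unit, leaves room to confirm that it is the unit added in Step 2.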

Step 8: Check

First, check that the units have settled in the “active/idle” state in juju status, with the newer revision number:

Model  Controller           Cloud/Region         Version  SLA          Timestamp
test   localhost-localhost  localhost/localhost  3.3.4    unsupported  13:02:15Z

App                       Version  Status  Scale  Charm                     Channel        Rev  Exposed  Message
opensearch                         active      3  opensearch                                87  no       
self-signed-certificates           active      1  self-signed-certificates  latest/stable   72  no       

Unit                         Workload  Agent  Machine  Public address  Ports     Message
opensearch/0*                active    idle   1        10.229.18.7     9200/tcp  
opensearch/1                 active    idle   2        10.229.18.182   9200/tcp  
opensearch/2                 active    idle   3        10.229.18.34    9200/tcp  
self-signed-certificates/0*  active    idle   0        10.229.18.118             

Machine  State    Address        Inst id        Base          AZ  Message
0        started  10.229.18.118  juju-d69356-0  ubuntu@22.04      Running
1        started  10.229.18.7    juju-d69356-1  ubuntu@22.04      Running
2        started  10.229.18.182  juju-d69356-2  ubuntu@22.04      Running
3        started  10.229.18.34   juju-d69356-3  ubuntu@22.04      Running

Then check that the cluster is healthy. OpenSearch’s upstream documentation suggests the following check:

GET "/_cluster/health?pretty"

The response should look similar to the following example:

{
  "cluster_name" : "test-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "active_primary_shards" : 1,

...

  "active_shards_percent_as_number" : 100.0
}
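The health request can be scripted with curl. The sketch below only extracts the "status" field from a response such as the one above. cluster_status is a hypothetical helper name, and the HTTPS scheme, the admin user and the PASSWORD variable in the usage comment are assumptions about your deployment.

```shell
#!/bin/sh
# Sketch: print the value of the "status" field from a
# /_cluster/health JSON response read from stdin.
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (assumptions: HTTPS on port 9200, an "admin" user, and its password
# in $PASSWORD; -k skips TLS verification and is for quick testing only):
#   curl -sk -u "admin:$PASSWORD" \
#     "https://10.229.18.7:9200/_cluster/health?pretty" | cluster_status
```

A result of green matches the example response; any other value is worth investigating before treating the upgrade as complete.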