Provider implementation notes

Azure

Azure currently has two “stacks”: Service Management (“classic”) and Resource Manager (“ARM”). These notes mostly cover the ARM stack, which is replacing the classic stack. Juju still supports the classic stack, but only in maintenance mode; all new environments will be ARM-based.

Resource Groups

Azure has a concept of “resource groups”, which are containers for IaaS resources: machines, networks, disks, etc. Each Juju environment, including each hosted environment, is represented by a resource group. Resource group names must be unique within the subscription, so we use the naming scheme “juju-<env-name>-environment-<uuid>”. All of the resources for the environment are fully contained within the resource group.
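
As a minimal sketch of the naming scheme above, the resource group name can be built from the environment name and UUID. The function name and signature here are illustrative assumptions, not Juju's actual API.

```go
package main

import "fmt"

// resourceGroupName builds a resource group name following the
// "juju-<env-name>-environment-<uuid>" scheme described in these notes.
// The function itself is hypothetical; only the format comes from the notes.
func resourceGroupName(envName, envUUID string) string {
	return fmt.Sprintf("juju-%s-environment-%s", envName, envUUID)
}
```

Including the UUID is what keeps two environments with the same name from colliding within one subscription.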

To destroy an environment we must delete the subnet associated with the environment, and delete the environment’s resource group.

Networking

Each model has its own virtual network called “juju-internal-network”, and a single 10.0.0.0/16 subnet within that network called “juju-internal-subnet”. Note that these networks are not routable between models; Juju agents will communicate with the servers using their public addresses.
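
Because every model uses the same 10.0.0.0/16 range, a private address alone does not identify a machine across models. The following sketch, which uses only the Go standard library, checks whether an address falls inside the internal subnet; the constant and function names are assumptions made for illustration.

```go
package main

import "net"

// internalSubnetCIDR is the address range of "juju-internal-subnet"
// as given in these notes.
const internalSubnetCIDR = "10.0.0.0/16"

// inInternalSubnet reports whether addr is an IP address inside the
// internal subnet. It returns false for anything that does not parse.
// (Hypothetical helper, not part of Juju.)
func inInternalSubnet(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	_, subnet, err := net.ParseCIDR(internalSubnetCIDR)
	if err != nil {
		return false
	}
	return subnet.Contains(ip)
}
```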

Each environment is also given its own network security group (“juju-internal-nsg”), which manages firewalls for the environment. A subscription is limited to 100 network security groups by default, which in turn limits a subscription to 100 environments. This limit can be raised by contacting Microsoft Azure support.

Each machine is created with a single NIC, attached to the internal subnet. Each NIC also has a public IP assigned. Public IP addresses are limited (60 per subscription by default), so we will probably want to assign public IPs only to controllers by default, and defer assigning public IPs to other machines until they are exposed (deleting them again when all ports are unexposed). This should at least be made configurable.

Storage

Each environment resource group contains a storage account in which virtual machine images are stored. Storage accounts default to using the “Standard LRS” (Locally Redundant Storage) account type, but this is configurable.

The Azure storage provider has support for volumes. In the future we may extend the storage provider to support Azure File Storage, which would enable shared file systems.

The Azure volume source is dynamic, environment-scoped, and manages persistent volumes. Each Juju volume represents a VHD in the “datavhds” blob container of the environment’s storage account. A volume attachment represents a “data disk”.
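
To illustrate where a volume's VHD lives, the sketch below builds a blob URL inside the “datavhds” container. It assumes the public Azure cloud blob endpoint (https://<account>.blob.core.windows.net) and assumes the blob is named after the volume with a “.vhd” suffix; that blob naming and the function name are illustrative guesses, not confirmed by these notes.

```go
package main

import "fmt"

// dataDiskContainer is the blob container holding volume VHDs,
// as named in these notes.
const dataDiskContainer = "datavhds"

// volumeVHDURL returns the URL of the VHD blob backing a volume.
// Assumptions: the public-cloud blob endpoint, and a blob name of
// "<volumeName>.vhd". Both are illustrative, not taken from Juju.
func volumeVHDURL(storageAccount, volumeName string) string {
	return fmt.Sprintf(
		"https://%s.blob.core.windows.net/%s/%s.vhd",
		storageAccount, dataDiskContainer, volumeName,
	)
}
```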

Availability Sets

Each service deployed to an environment will create an “availability set” for that service. When a machine is created to host a unit of the service, the machine will join that availability set. Azure ensures that machines in an availability set are (a) not automatically rebooted at the same time (e.g. for infrastructure upgrades); and (b) allocated to redundant hardware, so that a single fault does not bring down all units of the service simultaneously.
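
The mapping from units to availability sets can be sketched as follows. Juju unit names take the form “<service>/<number>”; this example assumes, purely for illustration, that the availability set is named after the service. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// availabilitySetName derives an availability set name from a unit name
// such as "mysql/0". Naming the set after the service is an assumption
// made for this sketch; the notes only say there is one set per service.
func availabilitySetName(unitName string) (string, error) {
	i := strings.LastIndex(unitName, "/")
	if i <= 0 || i == len(unitName)-1 {
		return "", fmt.Errorf("invalid unit name %q", unitName)
	}
	return unitName[:i], nil
}
```

With this scheme, “mysql/0” and “mysql/1” land in the same availability set, so Azure keeps the machines hosting them on redundant hardware.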

Availability sets are similar to “availability zones” in AWS and elsewhere, but dissimilar enough that they do not fit into Juju’s abstraction of zones. In particular, charms cannot query what “zone” they are in on Azure.

Images

Azure Resource Manager uses a different system for selecting OS images than the classic stack, and the simplestreams data Canonical publishes is not relevant to ARM. However, Azure provides its own registry for images, which Juju will use.

Images are published with four identifying attributes:

  • Publisher (e.g. “Canonical”)
  • Offer (e.g. “UbuntuServer”)
  • SKU (e.g. “14.04.3-LTS”)
  • Version (e.g. “14.04.201510200”, or “latest”)
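
The four attributes above can be modelled as a small struct. The colon-separated form produced by String mirrors the “publisher:offer:sku:version” URN accepted by Azure command-line tooling; the type and field names are assumptions for this sketch rather than Juju's own types.

```go
package main

import "strings"

// imageReference identifies an image in the Azure image registry by the
// four attributes listed above. (Hypothetical type for illustration.)
type imageReference struct {
	Publisher string // e.g. "Canonical"
	Offer     string // e.g. "UbuntuServer"
	SKU       string // e.g. "14.04.3-LTS"
	Version   string // e.g. "14.04.201510200", or "latest"
}

// String renders the reference as "publisher:offer:sku:version".
func (r imageReference) String() string {
	return strings.Join([]string{r.Publisher, r.Offer, r.SKU, r.Version}, ":")
}
```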

Because SKUs do not map directly to series, we must list the SKUs for a publisher+offer, and then choose the best one. We only do this for Ubuntu for now. We have hard-coded the publisher/offer names for Ubuntu Server, Microsoft Windows Server 2012, and OpenLogic CentOS 7.1. For Ubuntu, we use cloud-init; for Windows and CentOS, we use the CustomScript virtual machine extension to execute the configuration rendered as a script.
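
One way to “choose the best” SKU is sketched below. It assumes the series has already been translated to a release version (e.g. trusty to “14.04”) and that the best SKU is the one with the highest point release for that version. This ranking rule is an assumption made for illustration; the notes do not spell out the rule Juju applies, and a real implementation would likely also need to filter out SKU variants it does not want.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bestSKU returns the SKU matching the given release version (e.g. "14.04")
// with the highest point release. SKUs are expected to look like
// "14.04.3-LTS" or "15.10". (Hypothetical helper; simplified ranking.)
func bestSKU(skus []string, version string) (string, error) {
	best, bestPoint := "", -1
	for _, sku := range skus {
		// Drop any suffix such as "-LTS" before comparing versions.
		base := strings.SplitN(sku, "-", 2)[0]
		if base != version && !strings.HasPrefix(base, version+".") {
			continue
		}
		point := 0
		if rest := strings.TrimPrefix(base, version); rest != "" {
			n, err := strconv.Atoi(strings.TrimPrefix(rest, "."))
			if err != nil {
				continue // not a plain numeric point release; skip it
			}
			point = n
		}
		if point > bestPoint {
			best, bestPoint = sku, point
		}
	}
	if best == "" {
		return "", fmt.Errorf("no SKU found for version %s", version)
	}
	return best, nil
}
```

The list of SKUs would come from querying the registry for a publisher and offer; here it is simply passed in as a slice so the selection logic can be read on its own.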

We currently query the image registry each time we create a machine, but this will be changed with the introduction of structured image metadata in state. We will change to having an Azure-specific data source that lists images in the registry; this will be periodically polled for updates, and fed into state. This data will then be presented to the machine provisioner, so it does not have to make the additional network query each time.

Instances

Instances naturally represent Virtual Machines in Azure, but there are additional resources for each instance. Each VM is given a single NIC with a static private IP and a dynamic public IP; later this will change with the introduction of extended networking support. Each VM may have zero or more network security rules associated with it.

Due to several restrictions, there are some peculiarities in the listing and deletion of instances that require some explanation. To prevent leaking resources, the provider must continue to report an instance until all of its associated resources are deleted: VM, NIC, public IP, etc. The most obvious approach would be to delete the VM last, but unfortunately this is not possible.

A VM must have at least one NIC attached, and it is not possible to delete a NIC while it is attached to a VM. At least one NIC must therefore be deleted after the VM, and so we may as well delete all of them after it. When we delete an instance, we first delete the VM and then the remaining resources, leaving the NICs until last. We tag each NIC with the name (instance ID) of the machine it was created for, so that the presence of a NIC indicates the presence of an instance even when there is no corresponding Virtual Machine.
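
The listing and deletion rules above can be sketched with a small in-memory model. Everything here (the cloud and nic types, the method names) is hypothetical and stands in for real Azure API calls; it only demonstrates that an instance remains listed until its tagged NICs are gone.

```go
package main

import "sort"

// nic is an in-memory stand-in for an Azure network interface.
type nic struct {
	name       string
	instanceID string // tag: the machine this NIC was created for
}

// cloud is an in-memory stand-in for one environment's resource group.
type cloud struct {
	vms  map[string]bool // keyed by VM name (instance ID)
	nics []nic
}

// instanceIDs lists every instance that still owns a VM or a tagged NIC.
func (c *cloud) instanceIDs() []string {
	seen := make(map[string]bool)
	for id := range c.vms {
		seen[id] = true
	}
	for _, n := range c.nics {
		seen[n.instanceID] = true
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// deleteVM removes only the VM; its NICs stay behind.
func (c *cloud) deleteVM(id string) { delete(c.vms, id) }

// deleteNICs removes every NIC tagged with the given instance ID.
func (c *cloud) deleteNICs(id string) {
	kept := c.nics[:0]
	for _, n := range c.nics {
		if n.instanceID != id {
			kept = append(kept, n)
		}
	}
	c.nics = kept
}

// deleteInstance deletes the VM first and the NICs last.
func (c *cloud) deleteInstance(id string) {
	c.deleteVM(id)
	// Other resources (public IPs, etc.) would be deleted here.
	c.deleteNICs(id)
}
```

If deletion is interrupted after deleteVM, instanceIDs still reports the instance because of the NIC tag, so a later call to deleteInstance can finish the cleanup.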


Migrated from the GitHub wiki