LXD profile updates permitted by charms specification

rick_h · 26 July 2018 12:23

Summary

Some software, in order to operate properly, needs to be able to tweak the LXD profile used in deployment. There is an array of reasons, from special kernel access for performance to allowing storage mounting that might pose a security risk in general but is expected behavior for a storage-based charm.

Using during development

In the edge snap only

Sample charm we use for testing:
https://github.com/juju/juju/tree/develop/testcharms/charm-repo/quantal/lxd-profile

Functionality available as of 29-Oct-2018:

You can upload a charm which includes lxd-profile.yaml to the charm store.
You can juju deploy a charm utilizing an lxd-profile with the lxd provider, and to lxd containers.
lxd-profile charms will be validated during deployment; --force is currently available to bypass validation.
lxd-profile details available in juju show-machine.
Upgrading a charm with an lxd-profile will update the profile applied to the unit’s machine. This includes removing a profile if the charm no longer has an lxd-profile and adding a profile if the charm now has one.
Subordinate charms will apply an LXD profile, the same as non-subordinate charms.

Examples of Profile Adjustments

OpenStack deployment: https://github.com/openstack-charmers/openstack-on-lxd/blob/master/lxd-profile.yaml
conjure-up spells: https://github.com/conjure-up/spells/blob/master/canonical-kubernetes/steps/00_process-providertype/lxd-profile.yaml
- [Doc describing what this profile is doing]

Proposal

Charms

Charms gain the ability to specify an lxd-profile.yaml file in the root of their directory layout. This YAML file will be layered on top of the existing Juju LXD profile, as we’re only looking to provide additional functionality and to make sure that things set by Juju are respected in the cases of security and more. LXD profiles are meant to be layered in this way, so we should leverage the tooling there to help keep the layers on a container clean.

This means that if a charm is upgraded, the profile needs to be updated, and we need to investigate if/how we can update the container with the correct profile. This is also true if another application is applied to the same container. We need to watch out for potential restarts of a container with a production workload that might be affected by a profile update.

The YAML file is expected to contain only the keys needed. Juju will provide a unique name for the profile based on the charm, so that it’s meaningful and can be cleaned up upon removal of the application in question.

The name of the profile should also include the version of the charm, so that future charm updates may permit the application of updated profiles and applications of mixed versions can function properly. In this way an application might have a few different profiles in use at once.

If a charm is upgraded, once no unit references the previous profile it should be cleaned up and removed in the same vein.
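
The naming idea above can be sketched roughly as follows. This is only an illustration: the `juju-<model>-<application>-<revision>` format, the function names, and the use of the charm revision as the "version" are assumptions for the example, not something this spec fixes.

```python
def profile_name(model, application, revision):
    """Build a profile name that embeds the charm revision.

    The name format here is a hypothetical scheme used for illustration.
    """
    return f"juju-{model}-{application}-{revision}"


def profiles_in_use(model, application, unit_revisions):
    """Return the distinct profile names needed by an application's units.

    unit_revisions maps a unit name to the charm revision it is running,
    so units on mixed revisions yield more than one profile.
    """
    names = {profile_name(model, application, rev) for rev in unit_revisions.values()}
    return sorted(names)
```

With units on two different revisions mid-upgrade, `profiles_in_use` returns two names, which is the "few different profiles at once" situation described above.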

Discerning a profile in use

The command juju show-machine shows the details of the profiles currently in use in its YAML/JSON output. In this way we can see what tweaks are being permitted, and auditing of the container is easily accessible.

Hulk Smash

We have to be aware that it is possible to “hulk smash” charms onto the same container, and that their profiles might conflict. For the scope of this current work, we should document that Hulk Smashing is not advised and that, since the profiles are layered on each other, conflicts will be resolved by LXD when it merges them.
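
To make the layering concrete, here is a minimal sketch of a last-one-wins merge. It assumes that later profiles override earlier ones key by key for config and device by device (by name) for devices; `merge_profiles` is a hypothetical helper for illustration and is not LXD's or Juju's code.

```python
def merge_profiles(*profiles):
    """Merge parsed profiles left to right; later profiles override earlier ones."""
    merged = {"config": {}, "devices": {}}
    for profile in profiles:
        # Config is overridden one key at a time.
        merged["config"].update(profile.get("config", {}))
        # A device with the same name replaces the earlier definition wholesale.
        merged["devices"].update(profile.get("devices", {}))
    return merged
```

Under this assumption, two hulk-smashed charms that set the same config key would silently end up with whichever profile is applied last, which is why the practice is discouraged.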

Subordinates

Subordinates will also have the ability to provide a profile adjustment; however, their adjustments are applied to a related host. This might make checking that a profile is in use more interesting, as it needs to be performed when the related application is made available in the model and not specifically when the subordinate is deployed, since subordinates are unitless at the outset.

Removal of applications/upgrades

If an application is removed, then the lxd-profile it used should be removed from the host machine so that we do not keep an ever-expanding list of profiles that may or may not be important to the current host machine.

Allowed keys for LXD profile

Juju will be opinionated on what keys you can set in your lxd-profile.yaml. If there is a key that is not allowed, Juju will prevent the charm from deploying, the upgrade from occurring, etc.

Allowed keys

config (all keys, except those in the namespaces below)
- boot
- limits
- migration
devices of type
- unix-char
- unix-block
- gpu
- usb

Escaping the allow-list

If you wish to set a profile config value that is not allowed, you’ll need to deploy with the --force argument on the deploy command. When a deploy is --forced, the allow-list will be ignored and a WARNING will be logged that we’re skipping allow-list checks.
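
As a rough sketch of the allow-list check and the --force escape hatch described above: the function names, the logger name, and the choice to list the skipped violations in the warning are all assumptions for illustration; only the excluded config namespaces and allowed device types come from this spec.

```python
import logging

logger = logging.getLogger("lxd_profile_check")  # hypothetical logger name

# Config namespaces excluded by the spec; every other config key is accepted.
DENIED_CONFIG_NAMESPACES = ("boot", "limits", "migration")
# Device types the spec allows.
ALLOWED_DEVICE_TYPES = {"unix-char", "unix-block", "gpu", "usb"}


def profile_violations(profile):
    """Return a list of human-readable allow-list violations for a parsed profile."""
    problems = []
    for key in profile.get("config", {}):
        namespace = key.split(".", 1)[0]
        if namespace in DENIED_CONFIG_NAMESPACES:
            problems.append(f"config key {key!r} is not allowed")
    for name, device in profile.get("devices", {}).items():
        dev_type = device.get("type")
        if dev_type not in ALLOWED_DEVICE_TYPES:
            problems.append(f"device {name!r} has disallowed type {dev_type!r}")
    return problems


def check_profile(profile, force=False):
    """Raise ValueError on violations unless force is set, in which case only warn."""
    problems = profile_violations(profile)
    if problems and force:
        logger.warning("skipping allow-list checks (--force): %s", "; ".join(problems))
        return
    if problems:
        raise ValueError("; ".join(problems))
```

A profile containing only `security.nesting` and a `unix-char` device would pass, while one with `boot.autostart` or a `nic` device would be rejected unless `force=True`.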

<implementation note @joe>
This should be simple if we use a consistent prefix for the profile names.
The Profile type returned by LXD includes a UsedBy slice. At an appropriate point (post remove/upgrade of units?) just delete each name-spaced profile without any containers listed.
</implementation note>
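
A minimal sketch of the cleanup idea in the note above, assuming each profile is available as a dict with `name` and `used_by` fields (the latter standing in for the UsedBy slice). The dict shape, the function name, and the example prefix are illustrative assumptions.

```python
def removable_profiles(profiles, prefix):
    """Return names of profiles under `prefix` that no container references."""
    return [
        p["name"]
        for p in profiles
        # Only touch profiles in our namespace, and only when nothing uses them.
        if p["name"].startswith(prefix) and not p.get("used_by")
    ]
```

Profiles outside the prefix, such as `default`, are never returned, so a consistent prefix keeps the cleanup from touching profiles that Juju did not create.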

Edge Cases

If a profile were to go missing from disk, Juju should be prepared to recreate it so that the model continues to operate correctly.

Out of scope

There is a separate feature for Juju that needs to be done to enable passing through of raw devices to containers. This feature would enable things like passing raw network and GPU devices for performance purposes. While LXD profiles are used to help set this up this feature does not directly enable binding devices into containers through the Juju language at this time. This is considered a more basic building block for just enabling customizing the profile in generic methods and leaves the work around devices to future efforts.

james-page · 27 July 2018 10:31

With regard to the profile keys provided by the charm, I think Juju needs to do some whitelisting of keys for devices. For example, without whitelisting it might be possible to pass through a device from the host with no checks/controls; consuming eth0 might break the host the container is running on.

If you look at the OpenStack on LXD profile:

name: juju-default
config:
  boot.autostart: "true"
  security.nesting: "true"
  security.privileged: "true"
  linux.kernel_modules: openvswitch,nbd,ip_tables,ip6_tables
devices:
 eth0:
   mtu: "9000"
   name: eth0
   nictype: bridged
   parent: lxdbr0
   type: nic
 eth1:
   mtu: "9000"
   name: eth1
   nictype: bridged
   parent: lxdbr0
   type: nic
 kvm:
   path: /dev/kvm
   type: unix-char
 mem:
   path: /dev/mem
   type: unix-char
 root:
   path: /
   type: disk
   pool: default
 tun:
   path: /dev/net/tun
   type: unix-char

Devices of type ‘nic’ or ‘disk’ should never be consumed from a charm profiled template; however, the ‘unix-char’ devices are bind-mount type things and are probably OK.

config keys should probably just be directly consumed.

So the lxd-profile.yaml for the neutron-gateway charm would probably look like:

config:
  security.nesting: "true"
  security.privileged: "true"
  linux.kernel_modules: openvswitch,ip_tables,ip6_tables
devices:
 tun:
   path: /dev/net/tun
   type: unix-char

rick_h · 27 July 2018 10:47

@james-page thanks for the feedback.

I’m curious how much of the whitelist management to put into the charm tooling (things like proof) versus Juju itself, seeing that changing that list at all will require a new release/upgrade of Juju. Is the requirement around things like the nic type 100%, or are there cases where we’d want exceptions to the rule?

james-page · 27 July 2018 12:17

I guess that depends on whether we want to enforce, or advise!

rick_h · 30 July 2018 15:48

In speaking with the OpenStack team today, the view was that a whitelist is definitely preferred, but that a “you’re on your own” escape hatch might be worthwhile.

Our goal from here is to gather the most complete case of what should be whitelisted.

rick_h · 2 August 2018 14:20

I’ve updated the spec with the notes on whitelisting anything in config and a couple of devices that @james-page mentioned. I’m nervous that @kwmonroe isn’t going to be happy, as they use the disk device type in the k8s setup and so they’ll still need something to back door.

I think overriding the whitelist with --force is a solid approach, except that there’s no mechanism to use that via a bundle, which is intentional. If you’re --forcing stuff, then it’s not really something we want in repeatable/production setups, to be honest.

stgraber · 3 August 2018 03:27

From discussions in Montreal, I seem to remember us also wanting to filter the keys within the config section.

Looking at our namespaces, I’d suggest at least banning the following for now:

boot (irrelevant for Juju)
limits (should be done through Juju, not backdoored through the profile)
migration (irrelevant for Juju)

There are other keys that may or may not be acceptable and may therefore warrant a review warning of some kind:

Anything in the raw namespace
security.protection.* (as those may prevent container deletion for example)

For devices, you may consider also allowing the following:

usb
gpu

As they are effectively convenience around unix-char and so should be safe to allow.

jameinel · 14 August 2018 11:58

I think this sounds like a reasonable starting point. I think charms can certainly get themselves into bad configurations if they pass through actual hardware devices, because that is machine specific. GPU feels like it potentially falls into this issue, but I’m willing to let people explore the space.

rick_h · 30 August 2018 13:33

Just realized we need to think through this when Juju releases this feature and upgrades happen to models running charms that are prepared for this feature.

For example, as we work on this feature and the k8s charms add support for these profiles, we’ll end up in a world where there are models running charms that support the feature but are not yet on a Juju that does. Once they upgrade Juju, we’ll see these charms with the supported file and need to find out how we proceed. Do we attempt to bring the current deployment into the expected state, or do we wait for charm activity, such as a charm upgrade, before we begin to interact with the profiles?

hmlanigan · 31 August 2018 15:21

For discussion, starting a list of keys which may be denied in the config:

user.*: currently you cannot make user changes via the juju model-config cloud-init-user-data either.
environment.http_proxy: will this interfere with the juju proxy settings, which can already be modified via the juju model config? Looking for a list of what other items can be specified with environment.

thumper · 2 September 2018 22:23

I’d like to request that we use “allow” and “deny” as the basis for the names for the lists.

james-page · 13 September 2018 21:17

Any update on this feature? We’re planning a new feature for the OpenStack charms that would make use of this to allow deployment in LXD containers.

rick_h · 14 September 2018 00:04

This feature is under active development. The team’s been working on getting dependencies into shape and the first bits of allowing a charm to have the profile. In the coming weeks we expect to be reaching out to stakeholders to start to update charms and provide feedback.

fnordahl · 25 September 2018 09:42

We are developing a new charm that will make use of the pre-existing neutron-openvswitch subordinate for plumbing into an OpenStack overlay network on behalf of a payload installed in a container.

On-going development and testing shows this working by manually adding the following to either the LXD profile in use or the individual container config for the principal charm and then restarting the container:

config:
  linux.kernel_modules: openvswitch

Would love to help test any in-flight work you may have for this feature which would enable us to configure this automatically on charm installation.

Question: Will the configuration be applied on a per-profile basis or per-container basis?

jameinel · 25 September 2018 10:19

I believe the intent is to create a profile for the application that is being deployed into the container. So that if you have >1 container on the same machine hosting the same application, they will share the profile. However, it wouldn’t be set as say the “default” profile, applying to all containers on the same machine.

rick_h · 16 October 2018 18:49

Should security.privileged be allowed by default? That seems like something that should be in the blacklist in security.

jameinel · 17 October 2018 05:04

This is very much a question of how much you want things that “just work without fiddling” versus how much you want to protect users from accidents.

There is a strong argument that they are actively trying to install the software, so they would rather it works than that we need to protect them from software that might have issues.

It is certainly the sort of thing where it would be nice to help educate the user “this charm needs more privileges than an average charm, are you sure”. We have that with trust and giving the underlying provider credentials.

A lot of container use is more about “just carve out a space so it doesn’t interfere” rather than a security “don’t let it get access to the other stuff”. But there is a co-mingling of expectations.

I could be convinced either way, but I would bias towards allowing security.privileged, because otherwise we’re pushing them to install the charm on the host machine, and there are still benefits to putting it into a container.

seffyroff · 24 October 2018 16:50

This is very relevant to my current project. I’m adding Ceph-osd charms to my LXD Juju deployments, and this seems like an elegant way to expose the host block devices to the OSD units for storage deployment. I’ll fork the ceph-osd charm repo and give this a try today.