New diff-bundle command

diff-bundle is a model command to show differences between a bundle and a model. This is really useful when you’re trying to see what changes might have been made in production over time that are different from the original bundle you started out with. You might also use this to snapshot updates to the bundle over time, especially if it’s in active development.

    juju diff-bundle <bundle>
        --overlay <overlay-file> 
        --map-machines <mapping or existing>
        --annotations

Some usage examples:

    juju diff-bundle localbundle.yaml
    juju diff-bundle canonical-kubernetes
    juju diff-bundle -m othermodel hadoop-spark
    juju diff-bundle mongodb-cluster --channel beta
    juju diff-bundle canonical-kubernetes --overlay local-config.yaml --overlay extra.yaml
    juju diff-bundle localbundle.yaml --map-machines 3=4

This will compare the bundle (combined with any overlays) to the model, using the machine mapping provided, and display differences, considering (as a first pass):

  • applications (missing or unexpected)
  • unit counts (will need special handling for subordinates, where the bundle unit count will be zero)
  • charm
  • config
  • series
  • constraints
  • annotations (if the --annotations flag is specified - these are likely to be noisy)
  • exposedness

In subsequent passes we’ll also diff placement, endpoint bindings and devices.
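As a rough illustration of the "missing or unexpected" application check, here is a minimal Python sketch. It is not the diff-bundle implementation: the function name and the plain-dict representation of the bundle and the model are assumptions made for this example.

```python
def diff_application_names(bundle_apps, model_apps):
    """Report applications that exist on only one side.

    Both arguments are dicts keyed by application name. This in-memory
    shape is a hypothetical stand-in for a parsed bundle and a model.
    """
    bundle_names = set(bundle_apps)
    model_names = set(model_apps)
    result = {}
    # In the bundle but not deployed: missing from the model.
    for name in sorted(bundle_names - model_names):
        result[name] = {"missing": "model"}
    # Deployed but not described by the bundle: missing from the bundle.
    for name in sorted(model_names - bundle_names):
        result[name] = {"missing": "bundle"}
    return result


bundle = {"etcd": {"num_units": 3}, "kubeapi-load-balancer": {"num_units": 1}}
model = {"etcd": {"num_units": 5}, "wordpress": {"num_units": 1}}
print(diff_application_names(bundle, model))
```

Applications present on both sides (etcd here) are deliberately left out of this result; their unit counts, charm and config would be compared in a separate step.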

Here’s an example of diff-bundle output:

applications:
  etcd:
    num_units:
      bundle: 3
      model:  5
  kubernetes-master:
    charm:
      bundle: cs:~containers/kubernetes-master-150
      model:  cs:~containers/kubernetes-master-144
    constraints:
      bundle: cores=2 mem=4G root-disk=16G
      model:  cores=2 mem=4G root-disk=8G
    options:
      channel:
        bundle: 1.11/stable
        model:  1.11/beta
      other_option:
        bundle: value
        model:
  kubeapi-load-balancer:
    missing: model
machines:
  "0":
    series:
      bundle: xenial
      model:  bionic
  "1":
    missing: bundle
relations:
  bundle-additions:
    - - kubernetes-master:kube-api-endpoint 
      - kubeapi-load-balancer:apiserver
    - - etcd:certificates
      - easyrsa:client
  model-additions:
    - - flannel:etcd 
      - etcd:db

Are you going to give examples for the output?

We should note that one of the big use cases is that they want to do 2 deployments at different sites, and then take a snapshot (juju export-bundle) and then compare what exists at both sites. So I think they do want to know “what is there that isn’t in the bundle” as well as “what is in the bundle that isn’t in the model”.
I believe it is also desirable to know what bits of configuration are different, etc.

It would be good to know if the output is intended to be:

  1. Some sort of diff/patch format
  2. An overlay (possibly 2 overlays for each direction), which you could potentially apply to get the same content.

I think (1) could be an option (create a full expanded bundle and then create a textual diff against it.)

Thanks John -

Yes, it’s definitely going to be a two-way comparison - what’s unexpectedly present and what’s missing from the model. And yes, also reporting differences in configuration values.

I’m not sure that generating a bundle overlay for each direction will be useful/readable, because it might be hard to understand what the differences are - everything looks like adding an application. You’d need to notice that not all of the config values are listed for the application to distinguish between “the model’s missing this application” and “the application’s present but configured differently”. To take this further, say there were differences in number of units, charm and configuration - it could be really hard to see that it wasn’t a missing application in that case. I’m going to mock up some examples to work out whether that concern holds water.

(Tim is also saying that bundle overlays can’t really represent application removal - you can have application: <nothing>, but saving that and applying it won’t remove the application from the model unless we make other changes. So maybe representing diffs as overlays isn’t the right thing.)

I hadn’t heard about the use case for comparing two deployments. Would doing export-bundle on one model and then running diff-bundle with that against the other model do what we want? We talked to Junien from IS a bit about this but he wasn’t particularly thinking about that use case - who do you think would be good to ask for some more guidance?

Oh, I forgot about the diff/patch suggestion - we’ve been considering that too. It’s got the obvious benefit that people know how to read them, and there’s the possibility of automatically applying a patch to a bundle file, if that’s something that users want.

One concern there is that a textual diff isn’t going to give enough context - you’ll see differences between config values but might not be able to see the application that the config belongs to. Unless we do something like a git diff where the function name is included in the location, maybe that would be enough?
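One way to picture the "location" idea is to attach the full key path to every differing value, so the application name always travels with the config key. The Python sketch below is only an illustration under that assumption; the function names and the dotted-path format are invented for this example and are not part of any proposed output.

```python
def flatten(tree, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key in sorted(tree):
        path = f"{prefix}.{key}" if prefix else key
        value = tree[key]
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def path_diff(bundle, model):
    """List differing leaves, each labelled with its full path."""
    bundle_leaves = dict(flatten(bundle))
    model_leaves = dict(flatten(model))
    lines = []
    for path in sorted(set(bundle_leaves) | set(model_leaves)):
        b = bundle_leaves.get(path)
        m = model_leaves.get(path)
        if b != m:
            lines.append(f"{path}: bundle={b!r} model={m!r}")
    return lines


bundle = {"applications": {"kubernetes-master": {"options": {"channel": "1.11/stable"}}}}
model = {"applications": {"kubernetes-master": {"options": {"channel": "1.11/beta"}}}}
for line in path_diff(bundle, model):
    print(line)
```

Note that this sketch cannot tell a key that is absent from a key whose value is empty; both show up as None.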

Another issue with a patch format is that the “source” bundle will potentially be composed of a base bundle file and one or more overlays, and the “target” bundle extracted from the model could well have sections in a different order from the source bundle. In both cases, the line numbers in a patch are not going to be meaningful at all, and different ordering between source and target will mean a patch won’t represent how the bundle and model differ on a semantic level.

There actually aren’t that many types of differences in the list in the main post, so I’m inclined to represent each kind in its own structure, with the differences grouped by application. I’ll do a mock-up of this along with the overlay version so there’s some basis for a decision.

That will probably make discussions with users more concrete too.

Mocking it up seems like a reasonable plan. I agree that "application: " doesn’t work to show removals. At best the Overlay semantic would mean that you would see the application created in one direction. (A => B adds application foo, means that B doesn’t have foo.)

Some real problems with diffs are:

  1. Ordering. Though we could just say “keys are in sorted order” which should give us a stable order. It means we would be rewriting the order vs the bundle that they handed to us, but it still would probably be better to use a stable ordering.
  2. Context as you mention. Though again, unified diff can just be put with all context, you don’t have to trim it down to just the 2-closest lines.
  3. Diffs can’t really be used, as-is. FWIW, though, we don’t have any way to give people an output that they could ‘just apply/deploy’ that would allow them to remove applications. (deploy never removes anything.)
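Points 1 and 2 can be sketched together: serialise both sides with sorted keys, then ask for a unified diff with enough context to cover the whole document. This is a minimal Python sketch, not a proposal for the real command. It uses JSON from the standard library instead of YAML purely to stay dependency-free, and the function names are made up for the example.

```python
import difflib
import json


def stable_text(tree):
    """Serialise a nested dict with sorted keys so ordering is stable."""
    return json.dumps(tree, indent=2, sort_keys=True).splitlines()


def full_context_diff(bundle, model):
    """Unified diff of the two serialisations, keeping all context lines."""
    a = stable_text(bundle)
    b = stable_text(model)
    context = max(len(a), len(b))  # enough context to show every line
    return list(
        difflib.unified_diff(
            a, b, fromfile="bundle", tofile="model", n=context, lineterm=""
        )
    )


bundle = {"applications": {"etcd": {"num_units": 3, "charm": "etcd"}}}
model = {"applications": {"etcd": {"charm": "etcd", "num_units": 5}}}
print("\n".join(full_context_diff(bundle, model)))
```

Because the keys are sorted before diffing, the different key order in the two inputs does not show up as a change; only num_units does, and the enclosing "etcd" line is kept as context.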

I believe the “compare A to B” was more from Bootstack than IS. But they definitely struggle with managing many similar-but-not-quite-exactly-the-same deployments.

I don’t know if we’d want 2 possible outputs. One being “tell a human how these things are different” and the other being “give a structured output that I can use to apply these changes somewhere else”. It’s hard to have one format do both things.

Ok, here’s what I have as example output. What do people think?

juju diff-bundle mybundle.yaml --overlay ov.yaml

// Missing application - no kubeapi-load-balancer
applications:
  kubeapi-load-balancer: missing from model

// extra application
applications:
  wordpress: unexpected in model


// mismatched unit counts
applications:
  etcd:
    bundle: {num_units: 3}
    model:  {num_units: 5}


// charm
applications:
  kubernetes-master:
    bundle: {charm: cs:~containers/kubernetes-master-150}
    model:  {charm: cs:~containers/kubernetes-master-144}

// config
applications:
  kubernetes-worker:
    bundle:
      options:
        channel: 1.11/stable
        other_option: value
    model:
      options:
        channel: 1.13/beta
        other_option:

// relation differences

relations:
  bundle-extra:
    - [kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver]
    - [etcd:certificates easyrsa:client]
  model-extra:
    - [flannel:etcd etcd:db]

// multiple differences

applications:
  etcd:
    bundle: {num_units: 3}
    model:  {num_units: 5}
  kubernetes-master:
    bundle:
      charm: cs:~containers/kubernetes-master-150
      constraints: cores=2 mem=4G root-disk=16G
      options:
        channel: 1.11/stable
        other_option: value
    model:
      charm: cs:~containers/kubernetes-master-144
      constraints: cores=2 mem=4G root-disk=8G
      options:
        channel: 1.11/beta
        other_option:
  kubeapi-load-balancer: missing from model
relations:
  bundle-extra:
    - [kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver]
    - [etcd:certificates easyrsa:client]
  model-extra:
    - [flannel:etcd etcd:db]

What do you think of trying to group the changes together side by side more like:

applications:
  etcd:
    num_units:
      bundle: 3
      model: 5


// charm
applications:
  kubernetes-master:
     charm:
      bundle: {charm: cs:~containers/kubernetes-master-150}
      model:  {charm: cs:~containers/kubernetes-master-144}

Yeah, I think I like that more. It’s more verbose for the combined diff, but it’s easier to compare the differences because you don’t need to bounce your eye back and forth between far-apart lines.

// multiple differences with bundle/model drilled down more.
applications:
  etcd:
    num_units:
      bundle: 3
      model:  5
  kubernetes-master:
    charm:
      bundle: cs:~containers/kubernetes-master-150
      model:  cs:~containers/kubernetes-master-144
    constraints:
      bundle: cores=2 mem=4G root-disk=16G
      model:  cores=2 mem=4G root-disk=8G
    options:
      channel:
        bundle: 1.11/stable
        model:  1.11/beta
      other_option:
        bundle: value
        model:
  kubeapi-load-balancer:
    model: missing
relations:
  bundle-extra:
    - [kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver]
    - [etcd:certificates easyrsa:client]
  model-extra:
    - [flannel:etcd etcd:db]
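The drilled-down layout above can be produced by a recursive comparison that only splits into bundle/model at the point where two values actually differ. The following Python sketch shows the idea under the assumption that both sides are available as nested dicts; the function name is hypothetical, and applications missing from one side entirely would need separate handling.

```python
def drill_down(bundle, model):
    """Return only the differing leaves, each as {"bundle": x, "model": y}."""
    diff = {}
    for key in sorted(set(bundle) | set(model)):
        b = bundle.get(key)
        m = model.get(key)
        if isinstance(b, dict) and isinstance(m, dict):
            # Both sides have a nested section: recurse and keep it only
            # if something inside differs.
            nested = drill_down(b, m)
            if nested:
                diff[key] = nested
        elif b != m:
            diff[key] = {"bundle": b, "model": m}
    return diff


bundle = {
    "etcd": {"num_units": 3},
    "kubernetes-master": {
        "constraints": "cores=2 mem=4G root-disk=16G",
        "options": {"channel": "1.11/stable", "other_option": "value"},
    },
}
model = {
    "etcd": {"num_units": 5},
    "kubernetes-master": {
        "constraints": "cores=2 mem=4G root-disk=16G",
        "options": {"channel": "1.11/beta"},
    },
}
print(drill_down(bundle, model))
```

Matching values (the constraints here) drop out of the result, and an option that is set on only one side appears with None on the other.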

The combined output is a lot more legible. I’d want to make sure it’s a real comparison (the big thing is config can have lots of individual items that are different.)

Hi,

is there any reason to use the “-extra” suffix?

Best,

Hey @freyes - my reasoning was that if it was just

bundle:
- [kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver]
- [etcd:certificates easyrsa:client]
model:
- [flannel:etcd etcd:db]

…it wouldn’t be obvious whether those relations were missing from the bundle/model or unexpectedly present. What do you think?

(That said, bundle-extra and model-extra are definitely clunky - I’d welcome other options.)
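Whatever the labels end up being, the two lists are just the two directions of a set difference. Here is a minimal Python sketch of that, assuming each relation is a pair of endpoint strings and that the order of the two endpoints does not matter; the function name and the keys of the result are illustrative only.

```python
def diff_relations(bundle_relations, model_relations):
    """Split relations into those only in the bundle and only in the model."""

    def normalise(relations):
        # Assumption: a relation is unordered, so sort each endpoint pair
        # before comparing.
        return {tuple(sorted(pair)) for pair in relations}

    in_bundle = normalise(bundle_relations)
    in_model = normalise(model_relations)
    return {
        "bundle-extra": sorted(list(pair) for pair in in_bundle - in_model),
        "model-extra": sorted(list(pair) for pair in in_model - in_bundle),
    }


bundle_relations = [
    ["kubernetes-master:kube-api-endpoint", "kubeapi-load-balancer:apiserver"],
    ["etcd:certificates", "easyrsa:client"],
    ["etcd:db", "kubernetes-master:etcd"],
]
model_relations = [
    ["kubernetes-master:etcd", "etcd:db"],  # same relation, endpoints swapped
    ["flannel:etcd", "etcd:db"],
]
print(diff_relations(bundle_relations, model_relations))
```

The relation that appears on both sides with its endpoints swapped is treated as identical and does not show up in either list.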

In the diff command, the first argument is the base you are comparing against. It seems diff-bundle doesn’t use that approach. If it did, everything would be reported in terms of “this is missing from the bundle” and “this is extra in the bundle”, and the model would be the source of truth. I believe this would make things easier to communicate to the user.

$ juju diff-bundle foo.yaml --overlay bar.yaml
....
relations:
  deleted:  // these are present in the model, but missing from the bundle, in a "diff -u" output they would be prefixed with a minus (-) sign
    - [kubernetes-master:kube-api-endpoint kubeapi-load-balancer:apiserver]
    - [etcd:certificates easyrsa:client]
  inserted: // relations that were inserted/added to the bundle haven't made it to the model yet,  in a "diff -u" output they would be prefixed with a plus (+) sign
    - [flannel:etcd etcd:db]

Here I’m using deleted and inserted in an attempt to use the same verbs used in diff’s documentation, but I don’t have a strong opinion about using them or not. I understand a user could get worried reading “deleted” (e.g. “were those relations deleted!?”)

https://www.gnu.org/software/diffutils/manual/diffutils.html#Comparison

We don’t need to specify the model because diff-bundle will work as a model command. By default it will operate against the current model although that can be changed using the --model option.

My intuitive feeling was that juju diff-bundle bundle.yaml would conceptually work like diff bundle.yaml model. So the bundle would be the starting point/source of truth, and differences would be represented as changes to the model. This is consistent with deploying a bundle - the bundle is adding applications into the model, so the model is missing them.

Ohhh, I think I just clicked why you would have the sense the other way - if someone’s working on the bundle, then they’ve added that relation to the bundle since they deployed it to the model. It kind of comes down to whether someone is editing the bundle or “editing” the model using juju relate or other commands.

This seems similar to a Necker cube illusion. I’d rather sidestep it and come up with a label that doesn’t assume one perspective, because it’s likely to confuse users taking the other perspective.

Does that make sense?

:) Yes, you are correct.

Which side is the source of truth will depend on what the user wants to know. I’ve heard from people interested in a way to detect whether someone has made changes to the environment on the side and left their bundle out of sync, while in a CI/CD environment the user would like to know what will be changed when they run juju deploy my-bundle.yaml. So now I’m wondering if “diff-bundle” is a good name, because what this will be printing is some kind of execution plan if the user decides to run juju deploy.

while in a CI/CD environment the user would like to know what will be changed when they run juju deploy my-bundle.yaml

One concern here is that bundle deploy has a dry-run mode in order to help provide details on what would be done currently. I agree it feels very diff-like but I also am hesitant to get into the “you can do it via x, or y, or …”

Yeah, I agree with @rick_h here. That question of “what will happen if I deploy (or redeploy) this bundle to this model?” is supposed to be answered by juju deploy bundle --dry-run.

diff-bundle is more general than deploy - as well as showing what’s missing from the model (or new in the bundle ;), it will show what’s in the model that isn’t in the bundle, as well as config changes.

I agree with both of you, but I don’t have a good suggestion this time.