Monolithic interfaces

simonrichardson · 15 April 2020 15:42

This is a draft, but I believe it’s worth sharing in this form.

I believe there is an emergent anti-pattern in Juju that we need to get to grips with that is making composability and modularity harder to do the more stuff we add.

It’s best served by showing a few examples:

Interfaces found in common/networkingcommon
ModelManagerBackend in apiserver/common

These interfaces are monolithic interfaces.

They replicate state, by providing an abstraction layer for unit testing. They’re easy to create and easy to add methods to them, but they have a flaw. The boundary that you’re trying to create from your package to the state package, is now between your package and anyone else that consumes that interface. Changing that monolithic interface as consequences that breaks the Interface segregation and Liskov substitution principles. The very fact that you now have to update the code (mocks, stubs are just the tip of the iceberg) in all places that rely on the shared interfaces means there is a code smell.

The solution is quite simple but rather dramatic. We should prevent any additional methods to these interfaces. Instead, we should create self container point of use interfaces. These interfaces should be isolated to the package themselves. To be totally ideological about it, we should probably go as far as to say, that using somebody else’s interface from within Juju should be frowned upon.

This might be obvious, but I sometimes think it’s worth spelling out.

timClicks · 15 April 2020 20:24

On the topic of interfaces, John Ousterhout’s book A Philosophy of Software Design has some really good insight. Keep them tight.

This talk’s good too:

achilleasa · 16 April 2020 10:22

While this is a great pattern to apply in general there are cases where we its OK to couple to an interface from another package and other cases where you need to do this to catch issues at compile-time.

An obvious example of the first case is imports from the Go stdlib; in other words, we shouldn’t go around repeating the definition for an io.Reader to avoid coupling to the io package.

An interesting example for the latter case is the implementation for the upgradesteps facade where the package itself defines the expected interfaces and it attempts to coerce a state.Unit to that interface at run-time. So, when we tweaked the arg list of one of the state.Unit methods we accidentally caused an interface mis-match which is really hard to catch at compile-time or via tests (you need a comprehensive e2e test for this).

Long story short: the code compiled, the tests were green, issue wasn’t caught during review and controller upgrades were broken! We had to resort to coupling to the implicit interface from state.Unit to get a compile-time error next time we accidentally break things.

While I do agree with your point, what I would personally want to see would be the state package using non-exported concrete types and defining interfaces for all its exported types (also applies to types returned from methods on other types). This would essentially form a contract which would allow us to:

cleanly mock calls to state from other packages without having to create a bazillion useless shims which really serve as adaptors from concrete instances to the interface used/mocked by the test.
ensure that package-level interfaces match the contract via compile-time checks.

simonrichardson · 16 April 2020 11:07

I should have been very explicit in my original text. I was originally only thinking about internally to juju/juju.

But as always there are exceptions; like a Filesystem interface, so you’re not reliant on os.File et al, for your code.

I just don’t agree with this. It goes against consuming interfaces and exposing concrete types. One thing that will burn you if you return interfaces is that you’ll end up in shim hell.

// state/machine.go
type machine struct {}

func (m *machine) IPAddresses() ([]IPAddress, error) { ... }

type Life interface {
    IsAlive() bool
}

type IPAddress interface {
    Address() string
    Life() Life
}

To realise this in another package, you’d have to create a new interface, one that only shows what you’re using, otherwise you might as well be just Java and force implements X on everything.

type Machine interface {
    IPAddresses() ([]IPAddress, error) 
}

// You make your own IPAddress because you don't use life.
type IPAddress interface {
    Address() string
}

You’ll now need a shim to make state/machine.go conform to your machine.go. Unfortunately go isn’t covariant (the biggest thing they missed in the language imo) and if you do this enough you see that shims are everywhere.

Alternatively you could do:

type Machine interface {
    IPAddresses() ([]IPAddress, error) 
}

type IPAddress = state.IPAddress

But you’re still exposing Life for when you don’t need to and if you’re doing that, you may as well just use state.Machine, which isn’t right either.

Dependencies are yours not someone else’s, so you should manage them.

This is probably another anti-pattern, which also needs thought aside from this one.

achilleasa · 16 April 2020 19:58

In my opinion, having to shim things is a horrible pattern which makes testing overly complicated and nearly impossible to use mocks in facade code. Case in point:

A state.Machine has a Units method that gives you back a []*state.Unit.
A state.Unit has an Application method that gives you back a *state.Application.
All these objects have internal state (a mongo doc) and cannot be instantiated outside the state package.

How would you go about testing this code in a facade without a DB fixture?

func listUnitApplications(m *state.Machine){
   units, _ := m.Units()
   for _, unit := range units {
      app := unit.Application()
      logger.Info("unit %s: app %s", unit.Name(), app.Name())
   }
}

Would you implement a multi-layer shim for all these types? E.g:

Machine shim with a Units method that returns a []Unit (local package interface)
Where the entries of the above slice are shimmed units with an application method that return a shim to the actual application

Instead of having to shim everything as you suggest, you could alternatively structure your code as follows:

Define your pure interfaces in a “contract” package (this is akin to how protobuf definitions and other IDL stuff works)

// state/model
type Life interface {
    IsAlive() bool
}

type IPAddress interface {
    Address() string
    Life() Life
}

The types in state happen to implement the above interfaces:

// state/machine.go
type machine struct {}

func (m *machine) IPAddresses() ([]model.IPAddress, error) {
   var list []model.IPAddress
   for _, addr := range m.ipAddresses {
      list = append(list, addr) // addr implements model.IPAddress
   }
   return list
}

Note that you don’t have to necessarily make your state types private but they are shown as such in this example since (in most cases) you cannot instantiate them outside of the state package as they typically provide a getter/setter API on top of an underlying mongo document.

You would then arrange for the facade code to work with interfaces from the model package and use gomock for setting up your test expectations. Yes, you would be using the full interface as input… but nothing really stops you from applying the interface segregation principle and constrain the interface when passing arguments to your methods which is what you would be testing via unit tests anyway!

For example, in the above example where you don’t care about the Life method, you need to think about what your code needs to do with the IP address list and structure it accordingly:

// Subset of model.Machine; satisfied by a state.Machine
type AddressProvider interface {
  IPAddresses() []model.IPAddress
}

type AddressFormatter interface {
  Address() string
}

// Apply interface segregation principle
func (f facade) frobnicateAddresses( provider AddressProvider ){
  for _, addr := range provider.IPAddresses() {
    f.frobnicateAddress(addr)
  }
}

// Apply interface segregation principle
func (facade) frobnicateAddress( addr AddressFormatter ) {
 logger.Info("the address is %s", addr.Address()
}

This approach makes unit testing for each function using gomock trivial.

So how do you connect the model instances with your facade?

The injection happens in a third package that wires everything together (e.g. via the over-inflated concept of a context or config object) thus enforcing that everything adheres to the required interfaces at compile-time.

Your facade would always accept the minimum viable interface for doing its work (e.g. a MachineProvider for looking up model.Machine instances).

If you apply this pattern to all your facades, their implementation would be completely isolated from the persistence layer and you could invest in using mocks to write several comprehensive unit tests plus a few integration tests (which would live elsewhere) to test the full path from the facade to the database.

simonrichardson · 17 April 2020 08:19

This is clearly a bad thing as seen via networkingcommon and modelbackend and modelmanager. I can go on tbh.

Contracts aka implements an interface aka Java is just forcing others to lock to something else. The whole point of go is duck typing!
You may align to somethings, but not all. Having God objects (seen in state and forced upon you by the prior stated interfaces are a clear example of this) now prevent better composition. You’re forcing others to now either use God implementations or use wrappers (providers as you call them) to give you compositional objects. You’ve not won anything here, you’ve just implemented shims, but called it another thing!

I could concede on this, but I’m very suspect on the idea of returning interfaces from everything. You’re forcing java implements semantics at the method level instead of the consumer level and I think that’s what’s backwards.

achilleasa · 18 April 2020 07:35

Yes, but on the other hand it one could argue that it is a better strategy to lock yourself to a defined interface rather to the implicit interface of a concrete type (i.e. the types in the state package) which you are not even allowed to instantiate outside the package (is there any non-state test that actually does that without writing to the DB?)

I don’t believe that it is actually feasible to completely avoid locking to something in general for anything bigger than a few packages.

What’s more, I could argue that my proposal does not necessarily prevent better composition given that you can still use interface segregation in your facade code to limit the size of the interface that you receive as input to your package code (where that interface is defined locally in your package and is a subset of the interface exposed by the contract). In other words, you still get the benefits of duck-typing and keep your package code saner and much easier to test (with the latter being the point I am trying to get across here).

Also, the concept of providers in my example is nothing more than a limited interface for obtaining the also limited interfaces used by your package that is enforced at compile time. In other words, if your state package had a method with the following signature:

// In state
func (Whatever) FindMachine(id string) (model.Machine, error){...}

This method would (if I am not mistaken; needs verification) satisfy the provider interface defined in your facade package:

// In facade package
type Machine interface { 
  // A subset of the model.Machine methods that we need here
  Name() string
}

type MachineProvider interface{
  FindMachine(string) (Machine, error)
}

So the actual code that sets up the facade could pass the state object as to the facade (expects a MachineProvider arg in its initializer list/struct whatever) in and the compiler would ensure that it adheres to the expected contract.

Also note that this particular approach would avoid the need for an interface adaptor (which I assume would be required when implementing your proposal) in the actual code itself (quite a bit of boilerplate if you repeat this process in every package) and still allow you to auto-generate it via gomock only for your tests.

This is turning into a quite an interesting conversation
Hope others will also chime in so we can get more constructive opinions/ideas to debate!

simonrichardson · 18 April 2020 12:30

I’d like to second this, this has been great and thought provoking @achilleasa

hpidcock · 19 April 2020 23:34

This is probably just a bunch of non-coherent rambling, but I hope it makes some sense.

IMO, exported interfaces is fine, but they should be immutable after creation. To allow interfaces to remain immutable, single function interfaces are desired or single purpose interfaces (multiple functions to provide the single purpose).

type MachineFinder interface {
  FindMachine(string) (Machine, error)
}

type Namer interface { 
  Name() string
}

type Destroyer interface {
  Destroy() error
}

But this approach means we should almost never return an interface (returning single purpose interfaces is fine), but rather concrete types.
Concrete types can always fulfill multiple interfaces without breaking dependency. There is only one implementation. So that means no monolithic interfaces but rather monolithic structs.

struct Machine type {
  // ....
}

func (m *Machine) Name() string {
  // ...
}

func (m *Machine) Destroy() error {
  // ...
}

Additionally single function interfaces have the benefit of easy testing when the interface is defined alongside a function type that implements the interface.

type NamerFunc func() string
func (f NamerFunc) Name() string {
  return f()
}

func ThingThatNeedsNamer(a Namer) { }

func TestThingThatNeedsNamer(t *testing.T) {
  ThingThatNeedsNamer(NamerFunc(func() string {
    return "a"
  }))
}

You should also only accept single purpose interfaces for dependency injection. This allows a struct to export either the full dependency or depend on it privately. Additionally, it allows the dependency to be optional, and leaves it up to the object newer/constructor.

struct ExportsNamer {
  Namer
}

struct DependsOnNamer {
  deps struct {
    Namer
  }
}

But what if I want to make different implementations of a monolithic struct? You don’t, the single monolithic struct should take single function interface dependencies to implement differences. AWS, GCloud, Azure, should all return the same struct, but synthesized with different dependencies.

simonrichardson · 20 April 2020 08:21

Agreed.

I believe this is exactly what I refer to and do in practice. (see Reload Spaces and CMR import and export migrations).

hpidcock · 20 April 2020 08:44

Hah yeah I skimmed through the wall of text.

niemeyer · 23 April 2020 11:10

Hello folks,

This is a great discussion with many excellent points, and valid concerns that we need to address. I hope you don’t mind if I jump in and share an outsider perspective.

I feel specially at home in this conversation because besides working on the origins of juju (arguably the code likely looks nothing like we had back then), I also went through several phases in my testing opinions, including strict TDD, wrote one of the first mocking frameworks for Python too many years ago (python-mocker), one of the first frameworks for Go which we still use, and more recently a functional testing tool (Spread). So not only this gives me some background to this conversation, but makes me recognize many of the points in the thread as my own feelings.

And to prove that I carefully went through the whole text , here are a couple of quotes from both of you which look like key points worth raising:

This is a good high-level summary of the positions so far. Paraphrasing it, Simon argues that lots of interfaces makes the code look artificial and sometimes hard to maintain, even more when such interfaces are large, while Achilleas points out that sticking to concrete types means we can’t replace them for testing. This is the heart of it, I think, and many points were made on both ends as good rationale.

So, Simon, thanks for raising this issue with the team, and working through it in such a nice conversation. The problem, from what I can gather here, seems real and we need to find an approach to improve the situation.

Achilleas, I think your defensive reaction to Simon’s key point (the artificial and over utilization of interfaces) comes from his initial recommended solution (let’s just not change the interfaces anymore), which seems impractical.

So the question is, what could we do if we did fix the issue Simon raises, while preserving the ability to have good testing?

Here is a short rant, and I’ll start by taking away some dogma which might or might not be there, so please bear with me for a moment.

We have absolutely no reason to be dogmatic about running real code in tests. It’s okay to import real implementations into the test, and it’s okay even to touch the database in tests. Good testing is testing that catches problems, keeps the code and the tests maintainable, and that keep the project healthy through difficult times, such as when small or large refactoring takes place.

The proper test is not the one that verifies that an arbitrary function was called, or that is completely isolated from the world, or even those that are fast (happy to cover this, but not now as it’s tangential), but those that verify that the intended behavior is being preserved. That’s what will keep things from going bad. That’s why testing that does not look underneath the implementation is most often superior: the mindset of the developer is in observing aspects of the code, or the API, that are promises that need to be preserved, instead of the order in which a few functions were called and with which arguments. The behavior, not the implementation.

Although not always, mocking is often a counter example of these points: it creates code that is extremely dependent on the shape of the implementation, and verifies almost nothing of the aspects of the code that the outside world really depends upon.

Alright… enough background.

So what is the actual suggested approach for juju, given that we cannot get rid of interfaces, nor can we just use the concrete types?

What I’ve been doing for some time now is embedding the idea of testing into the implementation itself. Interestingly, this is precisely marrying the points both of you are making, allowing testing while avoiding the artificial and non-idiomatic use of interfaces.

In practice, this takes different forms depending on the case at hand. For example, it could literally mean having a method on the implementation that takes a TestingGod or whatever name you pick and allows influencing the implementation at key points that are required for testing purposes. It might also be something lighter, such as having an alternative backend in the implementation which is more suitable for testing.

It’s hard to go into too much detail without looking at a more concrete case, but I hope the overall idea is clear enough to sound sensible. If you want to explore a particular case, I’m happy to go into detail. I just hope we don’t let Simon’s key points die due to thread length.

Have a good day all.

manadart · 24 April 2020 09:47

Another illustrative example might be of help here.

@simonrichardson and I discussed this at length the other day in the context of a patch he was writing, where the logic for reload-spaces was moved out of state, and into environ.

This was done to ameliorate a big issue that Juju has, which is the amount of business logic residing in state. I’ll come back to this further down.

The encapsulation of logic required to update spaces and subnets from the provider uses (correctly) its locally defined and specific contract with state (the approach Simon is advocating here).

This issue raised it’s head because this functionality is invoked by:

Users invoking reload-spaces (calling through to the spaces client API),
Users adding a model (calling through the model manager client API) and;
Bootstrap, to seed the provider subnets/spaces in a new model.

The spaces API receives its own concrete state reference, as does bootstrap, which is easy to indirect (notwithstanding having to use shims to translate collection<type> into collection<indirection>).

The second of these is the issue because ModelManagerAPI is constructed with an indirection for state, ModelManagerBackend, which is also returned by the call to model creation. Have a look at the interface.

Note where it is used.
Note its location (apiserver/common).

We now need to shim this shim to get to our reload-spaces contract with state.

It is clear that this interface is far too big and too general. We have no flexibility to use it as a dependency to other well defined modules with specific state contracts.

In this case I agree completely with Simon.

As we are doing with the newtworkingcommon indirections and shims, these should be gradually strangled out in favour of targeted contracts, which I will describe later.

@achilleasa also makes a good point about dogmatism and the impracticality of always using a locally defined indirection.

The point we should focus on here is scope and not location. If a struct exports a strict and well targeted API, a general interface I believe, is acceptable. Where this tries to serve myriad consumer use-cases, it is not, which leads us to:

I would agree with this except that our state types do not meet the requirement. The issue we have state is types like Machine, which can return Units, which can return Addresses and on and on. We can’t describe a contract to meet the needs of all consumers.

If State was really a repository, and if no state types could access anything but their own collections, I think they might well be candidates for broad-use interfaces, but this package has grown organically in what I believe is the wrong direction.

One solution I believe is feasible and possible to implement gradually is another that Simon and I have discussed.

Some will recall an illustrative patch that I discussed in Malta (if memory serves). This was an attempt to provide cross-collection transactionality outside of state, so that entities did not have to be aware of others and their collections. Although this patch remains incomplete, the pattern has seen adoption in a few places.

I now believe it is not the best solution. What I would like to see is API facades that look like this:

Single methods or small, closely related groups of methods should be their own (sub) APIs.
These use their own, well-purposed state contracts.
They instantiate their own ModelOperations that the contracts also accomodate.
They are ambedded in the parent facade API.
The are constructed with the parent, which receives a concrete *state.State that fulfills the contracts (possibly via shims, which I agree are ugly, but are constraint imposed by Go itself).

This has multiple benefits:

We don’t grow over-general interfaces.
We can test specific functionality with greater focus, no having to satisfy general dependencies unrelated to what we are testing.
We can do it gradually.

The last is key, because we can apply it over time to huge API facades like Application that currently spin up too much infrastructure to test through the stack.

Simon used this pattern in a recent change to the Spaces API, but due to having to work with all of the constraints above, it may not best illustrate the approach. I will rework the draft patch to use it, which should serve as a reference example.