Random "ERROR connection is shut down" error while running juju commands

Hello,

We have an EKS cluster (Kubernetes version: 1.28) on which we have bootstrapped a Juju controller (3.1.5) and deployed some models and charms.

We have some periodic GitHub Actions workflows that log in to the controller and perform a few updates, checks, and so on. The issue is that the juju commands sometimes fail with an "ERROR connection is shut down" error and a non-zero exit status. This also happens with read-only commands such as juju status --relations. The command appears to finish its work, and the failure seems to occur when the connection is being closed.
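One possible stopgap for a periodic workflow like this (not a fix for the underlying cause, and not something Juju itself provides) is to retry a juju command only when it fails with this specific message, and to fail immediately on any other error. The sketch below is a minimal Python wrapper under that assumption. It also assumes the "ERROR connection is shut down" line is written to stderr, as juju error lines normally are. The names run_with_retry and TRANSIENT_MARKER, the attempt count, and the delay are all hypothetical choices.

```python
import subprocess
import sys
import time

# Hypothetical marker: the message seen in the failing runs, matched as a
# substring of the command's stderr (assumes juju writes errors to stderr).
TRANSIENT_MARKER = "connection is shut down"


def run_with_retry(cmd, attempts=3, delay=5.0):
    """Run cmd, retrying only when it fails with TRANSIENT_MARKER in stderr.

    Returns the last subprocess.CompletedProcess. A success, or a failure
    without the marker, is returned immediately without retrying.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        # Give up on unrelated errors, or once the last attempt has failed.
        if TRANSIENT_MARKER not in result.stderr or attempt == attempts:
            return result
        print(f"attempt {attempt} failed with '{TRANSIENT_MARKER}', "
              f"retrying in {delay}s", file=sys.stderr)
        time.sleep(delay)
    return result
```

In a CI step this might be called as run_with_retry(["juju", "status", "--relations"]), then echoing the returned stdout and stderr and exiting with its returncode. Because the whole command is re-run, this is only reasonable for idempotent commands such as juju status. It would also hide the symptom rather than explain it, so it is no substitute for reporting the underlying issue.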

Here are some log snippets from juju status --relations --debug --verbose:

2024-07-03T11:41:27.4487854Z + juju status --relations --debug --verbose
2024-07-03T11:41:27.5338505Z 11:41:27 INFO  juju.cmd supercommand.go:56 running juju [3.1.8 810900f47952a1f3835576f57dce2f9d1aef23d0 gc go1.21.9]
2024-07-03T11:41:27.5342066Z 11:41:27 DEBUG juju.cmd supercommand.go:57   args: []string{"/snap/juju/26977/bin/juju", "status", "--relations", "--debug", "--verbose"}
2024-07-03T11:41:27.5345760Z 11:41:27 INFO  juju.juju api.go:86 connecting to API addresses: [10.100.228.195:17070]
2024-07-03T11:41:27.9827896Z 11:41:27 INFO  juju.kubernetes.klog klog.go:113 Use tokens from the TokenRequest API or manually created secret-based tokens instead of auto-generated secret-based tokens.%!(EXTRA []interface {}=[])
2024-07-03T11:41:28.9002118Z 11:41:28 DEBUG juju.api apiclient.go:645 starting proxier for connection
2024-07-03T11:41:28.9004838Z 11:41:28 DEBUG juju.api apiclient.go:649 tunnel proxy in use at localhost on port 38727
2024-07-03T11:41:29.0017354Z 11:41:29 DEBUG juju.api apiclient.go:825 looked up localhost -> [::1 127.0.0.1]
2024-07-03T11:41:29.5101652Z 11:41:29 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://localhost:38727/model/994a2b25-4ea2-40d0-8940-f51454152132/api"
2024-07-03T11:41:29.5104359Z 11:41:29 INFO  juju.api apiclient.go:707 connection established to "wss://localhost:38727/model/994a2b25-4ea2-40d0-8940-f51454152132/api"
2024-07-03T11:41:29.5556091Z 11:41:29 DEBUG juju.api apiclient.go:1172 successfully dialed "wss://localhost:38727/model/994a2b25-4ea2-40d0-8940-f51454152132/api"
2024-07-03T11:41:29.7061434Z 11:41:29 DEBUG juju.api monitor.go:35 RPC connection died
2024-07-03T11:41:30.0073827Z 11:41:30 DEBUG juju.rpc server.go:328 error closing codec: tls: failed to send closeNotify alert (but connection was closed anyway): write tcp 127.0.0.1:39278->127.0.0.1:38727: write: broken pipe
2024-07-03T11:41:30.0076187Z ERROR connection is shut down
2024-07-03T11:41:30.0077583Z 11:41:30 DEBUG cmd supercommand.go:528 error stack: 
2024-07-03T11:41:30.0079180Z github.com/juju/juju/rpc.init:14: connection is shut down
2024-07-03T11:41:30.0080738Z github.com/juju/juju/rpc.(*Conn).Call:178: 
2024-07-03T11:41:30.0082580Z github.com/juju/juju/cmd/juju/status.(*statusCommand).runStatus:315: 

Interestingly, this error does not occur at all in a local environment. We tried to replicate it locally on Ubuntu 20.04 and 22.04 VMs using juju 3.0, 3.1, 3.2 and 3.3 clients, and we also tried different GitHub Actions runner versions and juju client versions.

It may also be worth mentioning that, judging from the action history, this issue did not occur before the start of November 2023.

Does anyone have ideas about this type of error, why it occurs and how to fix it? Or, since the log says "but connection was closed anyway", is it safe to ignore? And if so, should the juju client really exit with a non-zero status code?

Kind regards,

Claudiu

This is best filed as a bug on Launchpad.