OpenStack controller has been bootstrapped, but the machine status remains pending

colinzhan · 20 September 2020 08:56

After successfully bootstrapping a controller for an OpenStack cloud (juju status returned OK), we ran ‘add-machine’, but the machine's status remained pending and no new instance was created in OpenStack.

We checked ‘machine-0.log’ on the controller instance and found some errors there:

2020-09-20 05:05:12 ERROR juju.worker.dependency engine.go:671 "broker-tracker" manifold worker returned unexpected error: no container types determined

2020-09-20 05:05:22 ERROR juju.core.raftlease store.go:265 timeout waiting for Command(ver: 1, op: claim, ns: singular-controller, model: dfa45c, lease: dfa45c42-cd34-47f2-86cd-bcd130d34ac4, holder: machine-0) to be processed

The timeout errors occurred repeatedly in the log.
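To put a number on how often each error repeats, a small script along these lines can tally ERROR lines per module. It is only a sketch: `count_errors` is a made-up helper, not part of Juju, and the line layout is assumed from the two log lines quoted above.

```python
import re
from collections import Counter

# Assumed layout, inferred from the two log lines quoted above:
#   <date> <time> <LEVEL> <module> <file>:<line> <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>\S+) "
    r"(?P<location>\S+:\d+) "
    r"(?P<message>.*)$"
)


def count_errors(lines):
    """Count ERROR lines per module (hypothetical helper, not part of Juju)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not fit the assumed layout are skipped silently.
        if match and match.group("level") == "ERROR":
            counts[match.group("module")] += 1
    return counts
```

Passing it an open file handle for machine-0.log and printing `count_errors(f).most_common()` lists the noisiest modules first.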

The controller's behaviour was also odd. When we ran juju from the client, the command usually hung. We then checked the ports on the controller and nothing was bound at that time, but after a while a listener on port 17070 on a specific IP appeared and the client command completed.

We repeated the whole process (bootstrap, add-machine, wait without progress, destroy the controller) many times, and the results were almost always the same. Oddly enough, in one of those attempts the corresponding machine was actually created.
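To measure how long the API port takes to come up, rather than checking by hand, something like the following could be run from the client. This is a rough sketch: the function names are made up for illustration, and the controller address has to be filled in by whoever runs it.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def wait_for_port(host, port, deadline=60.0, interval=1.0):
    """Poll until the port accepts connections.

    Returns the seconds waited, or None if the deadline passed first.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        if port_is_open(host, port):
            return time.monotonic() - start
        time.sleep(interval)
    return None
```

For example, `wait_for_port(controller_ip, 17070, deadline=300)` would report how many seconds passed before the listener appeared, where `controller_ip` is a placeholder for the controller's address.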

Any suggestions?

brad-marshall · 6 October 2020 22:09

Hi,

I’m seeing something similar, with the same error message about a timeout waiting for a command, but I also get a different error message when the machines try to download the agent binaries from the controller:

2020-10-06 17:08:40 WARNING juju.worker.httpserver log.go:181 http: TLS handshake error from x.y.z.253:60758: EOF

MAAS tells me the machines deployed OK, and the console just shows them timing out while trying to download the agent binaries.

FWIW I’m trying to deploy using Focal and Juju 2.8.4 when I’m getting these errors.

I’ve just filed Bug #1898802 “Jujud timing out when trying to download agent bin...” against juju regarding this error; hopefully something can be done.

I’m still investigating, but if you come up with anything I’d be interested. I’ll update here if I find anything.

Thanks,
Brad

brad-marshall · 7 October 2020 04:39

For me, this ended up being a mismatched MTU between the juju controller VM and the underlying host; it had nothing to do with juju or anything else.
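For anyone who wants to rule out the same cause, here is a minimal sketch for comparing MTUs. The helper names are made up for illustration, `read_mtu` assumes a Linux host with sysfs, and the header sizes are for plain IPv4 without options.

```python
from pathlib import Path

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def read_mtu(interface):
    """Read an interface's MTU from sysfs (Linux only)."""
    return int(Path(f"/sys/class/net/{interface}/mtu").read_text().strip())


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def find_mismatches(mtus):
    """Given {name: mtu}, return the names whose MTU differs from the smallest."""
    smallest = min(mtus.values())
    return sorted(name for name, mtu in mtus.items() if mtu != smallest)
```

Run `read_mtu` on the host and inside the VM, then feed both values to `find_mismatches`. The payload size from `max_ping_payload` can be used with a don't-fragment ping (on Linux, `ping -M do -s <payload> <address>`) to confirm whether full-size packets actually get through.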