Bootstrap fails before controller is ready

I’m bootstrapping a controller in an openstack cluster. The bootstrap basically works, but it bombs out towards the end, when it comes to confirming controller API accessibility.

15:23:04 INFO  cmd bootstrap.go:653 Bootstrap agent now started
15:23:05 DEBUG juju.provider.openstack provider.go:498 instance 5c9bb826-5ace-4d89-9b35-6ace2d3d5a96 has floating IP address: 172.16.2.180
15:23:05 INFO  juju.juju api.go:328 API endpoints changed from [] to [172.16.2.180:17070 10.99.2.199:17070]
15:23:05 INFO  cmd controller.go:93 Contacting Juju controller at 172.16.2.180 to verify accessibility...
15:23:05 INFO  juju.juju api.go:78 connecting to API addresses: [172.16.2.180:17070 10.99.2.199:17070]
15:23:09 INFO  cmd controller.go:135 Still waiting for API to become available: unable to connect to API: dial tcp 172.16.2.180:17070: connect: connection refused
15:23:09 INFO  juju.juju api.go:78 connecting to API addresses: [172.16.2.180:17070 10.99.2.199:17070]
15:23:13 ERROR juju.cmd.juju.commands bootstrap.go:881 unable to contact api server after 2 attempts: unable to connect to API: dial tcp 10.99.2.199:17070: connect: no route to host

Ignore the non-routing 10.99.2.199 address, that will never work.

The connection refused from 172.16.2.180 is because the API service is not started yet. If I bootstrap with --keep-broken, port 17070/tcp becomes open about 10 seconds after the bootstrap has failed.

The config bootstrap-retry-delay apparently only changes the delay for the SSH part of the bootstrap, not the controller reachability test.

Did I miss something?

Usually juju retries more than 2 times. What time did you start the bootstrap? Perhaps juju is timing out and has a bad error message when timeout hits here. The default should be 18 minutes. You can change it with the bootstrap-timeout config key.

Thanks for the suggestion. Actually the number of retries seems to vary but the process is always within the timeout limit. I ran it again and timed it (see below), under 4 minutes but it only retried once this time apparently.

If I use ^Z at the point I see the message ‘Bootstrap agent now started’, wait a few seconds for 17070/tcp to open and then resume the bootstrap, I get a model. However there are further reliability issues if I try to deploy to that model - it ends up with units randomly stuck in “allocating”, somehow like having two IP addresses for the same controller is causing problems. If I bootstrap from an instance inside the openstack project, everything works normally.

09:27:40 INFO  cmd bootstrap.go:481 Running machine configuration script...
09:29:53 INFO  cmd bootstrap.go:653 Bootstrap agent now started
09:29:55 DEBUG juju.provider.openstack provider.go:498 instance 60f078f3-0843-46b8-ae60-d14ffe939990 has floating IP address: 172.16.2.180
09:29:55 INFO  juju.juju api.go:328 API endpoints changed from [] to [172.16.2.180:17070 10.99.3.108:17070]
09:29:55 INFO  cmd controller.go:93 Contacting Juju controller at 172.16.2.180 to verify accessibility...
09:29:55 INFO  juju.juju api.go:78 connecting to API addresses: [172.16.2.180:17070 10.99.3.108:17070]
09:29:59 ERROR juju.cmd.juju.commands bootstrap.go:881 unable to contact api server after 1 attempts: unable to connect to API: dial tcp 10.99.3.108:17070: connect: no route to host
09:29:59 DEBUG juju.cmd.juju.commands bootstrap.go:882 (error details: [{/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/juju/common/controller.go:143: unable to contact api server after 1 attempts} {/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/juju/common/controller.go:49: } {/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/modelcmd/modelcommand.go:420: } {/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/modelcmd/modelcommand.go:439: } {/build/snapcraft-juju-35d6cf/parts/juju/src/cmd/modelcmd/base.go:270: } {/build/snapcraft-juju-35d6cf/parts/juju/src/juju/api.go:83: } {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:223: } {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:669: } {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:1015: } {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:1119: unable to connect to API} {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:1144: } {/build/snapcraft-juju-35d6cf/parts/juju/src/api/apiclient.go:724: } {dial tcp 10.99.3.108:17070: connect: no route to host}])
09:29:59 DEBUG juju.cmd.juju.commands bootstrap.go:1632 cleaning up after failed bootstrap
09:29:59 INFO  juju.provider.common destroy.go:21 destroying model "controller"
09:29:59 INFO  juju.provider.common destroy.go:32 destroying instances
09:30:02 DEBUG juju.provider.openstack provider.go:1697 terminating instances [60f078f3-0843-46b8-ae60-d14ffe939990]
09:30:03 DEBUG juju.provider.openstack firewaller.go:192 deleting security group "juju-b6d22d33-379d-4f3e-8e19-fd69ee25a025-6fed0980-7ac1-469a-8515-bd522bb31269-0"
09:30:05 INFO  juju.provider.common destroy.go:56 destroying storage
09:30:05 DEBUG juju.provider.openstack cinder.go:119 volume URL: https://172.16.1.6:8776/v3/a101733b8a224c39a3168a72c9097e5c
09:30:06 DEBUG juju.provider.openstack firewaller.go:192 deleting security group "juju-b6d22d33-379d-4f3e-8e19-fd69ee25a025-6fed0980-7ac1-469a-8515-bd522bb31269"
09:30:08 INFO  cmd supercommand.go:544 command finished

real    3m32.732s
user    0m9.002s
sys     0m0.784s