hey juju people
I have a question about Juju and VRFs. I currently have a cloud-init script which configures a VRF for the controller subnet. The VRF setup itself works fine and as expected in my lab, but when it is applied by cloud-init I run into a somewhat expected failure. Let me first provide the context.
I have this setup with my VRF:
'
' +-------------+ +-----------------+
' | | | |
' | | | |
' | | | |
' | | | |
' | controller | | node 0 |
' | | | +----------- |
' | | | |ip exec vrf jujud
' | | | | | |
' | | | +-+----------+ |
' | | | |vrf: mgmt |
' +-----+-------+ +-+----+---------++
' | +--+-+ |
' | | |
' | | |
' | vlan 32 | |
' +-------v--------------------------------------------------------------->-----------+-----------+
' +-----------------------------------------------------------------------------------+-----------+
' |
' vlan 33 |
' +----------------------------------------------------------------------------------->-----------+
' +-----------------------------------------------------------------------------------------------+
'
'
'
' ++
' ++
I’ve tested this in my lab after juju add-machine: it works fine and even manages to discover and map spaces correctly. The problem I have is doing this setup in a cloud-init script.
Currently my script does these steps during machine add (juju add-machine ssh:ubuntu@${HOST}):
1. Manipulate the netplan config, adding a VRF
2. Edit the jujud service file
3. Edit the sshd service file (both to run on the VRF)
4. netplan apply
5. systemctl daemon-reload and restart the jujud/sshd units <- this is where I hit the problem
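To make the steps a bit more concrete, here is a rough sketch of the kind of config they produce. It is illustrative only, not copied from my script: the VRF name mgmt is from the diagram, but the routing table number (100), the interface name (eth0), the /sbin/ip path and the sshd command line are assumptions, and the helper names netplan_vrf and vrf_dropin are made up for this example. It only prints the fragments and does not touch the system.

```shell
#!/bin/sh
# Sketch only: prints example config, changes nothing on the host.
set -eu

# Print a netplan stanza that defines a VRF.
# $1 = VRF name, $2 = routing table id (assumed), $3 = member interface (assumed)
netplan_vrf() {
    cat <<EOF
network:
  version: 2
  vrfs:
    $1:
      table: $2
      interfaces: [$3]
EOF
}

# Print a systemd drop-in that starts a service inside a VRF.
# $1 = VRF name, remaining args = the service's original command line.
# The empty ExecStart= clears the packaged command before setting the new one.
vrf_dropin() {
    vrf="$1"
    shift
    printf '[Service]\n'
    printf 'ExecStart=\n'
    printf 'ExecStart=/sbin/ip vrf exec %s %s\n' "$vrf" "$*"
}

# Example output for the mgmt VRF; values are placeholders.
netplan_vrf mgmt 100 eth0
vrf_dropin mgmt /usr/sbin/sshd -D
```

The same drop-in shape would apply to the jujud unit, with its own command line in place of the sshd one.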
As you might already suspect at this point, step 5 causes an error:
remove with:
ssh-keygen -f "/home/ubuntu/.ssh/known_hosts" -R "10.10.32.24"
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
UpdateHostkeys is disabled because the host key is not trusted.
client_loop: send disconnect: Connection reset by peer
ERROR provisioning failed, removing machine 8: subprocess encountered error code 255
ERROR error cleaning up machine: <nil>
ERROR subprocess encountered error code 255
Is there some way to prevent this with manual machines? This error is obviously expected, as the controller thinks the manual machine add failed.
Thanks for the help, Peter