This is now the second time I’ve run into this weird issue. The first time was on a disposable environment (with a more recent version of Juju), so I just recreated it from scratch. This time it’s a production environment, so I’m not as lucky.
I have 3 EC2 instances created by Juju as machines, hosting 6 LXD containers (2 per server).
When connecting over TCP to some AWS services (such as Elasticsearch or ElastiCache), the connection is established successfully but I never receive any data back from the servers. It started happening after a reboot; everything was working perfectly before that. I do not have this issue with any other machine on the network or online. One of the 3 machines hasn’t been rebooted yet and is not having the issue, which makes me wonder whether a recent update introduced the problem.
Everything works fine if the connection is established directly from the machine rather than from inside the containers.
Juju version: 2.4.3
Ubuntu version on all machines and containers: 18.04
The only weird thing I’ve noticed is that they all have 30+ veth interfaces, but I don’t see how that could be related, since all 3 machines have them. Could I safely delete all the veth interfaces? Would LXD recreate the ones it needs?
The connection is set to never time out in the Redis configuration. When I run time strace -f -e network netcat {{redacted-hostname}} 6379 <<< INFO, or when I connect using telnet, I get no data in reply, but the connection stays open without error for 10+ minutes. The strace output is exactly the same on the working and non-working servers, except that on the working server I actually get a response afterwards.
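For anyone wanting to reproduce the "connects fine, but no reply" symptom without waiting 10 minutes on netcat, here is a minimal Python sketch. The function name probe_reply and the 5-second timeout are my own choices for illustration, not anything from Juju or Redis. It opens a TCP connection, sends the same INFO line and reports whether anything comes back before the timeout.

```python
import socket


def probe_reply(host, port, payload=b"INFO\r\n", timeout=5.0):
    """Connect, send payload, and return the first bytes received.

    Returns None if the connection is established but no data arrives
    within `timeout` seconds (the symptom described above). Returns b""
    if the peer closes the connection without sending anything.
    Connection errors (refused, unreachable) are raised as usual.
    """
    # The timeout applies to the connect and to the later recv call.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
```

Running probe_reply("{{redacted-hostname}}", 6379) from the host and then from inside a container should make the difference visible quickly: a bytes reply in one place and None in the other.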
I found them! They are in the default profile for some reason. I wiped all the eth* devices from the default profile, but that didn’t fix my issue.
We had to change the MTU of the fan-252 device to 8951 on the host machines; otherwise the issue persisted. On the working machine the MTU was already set to 8951, while on both non-working hosts it was set to 1450.
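To spot this kind of mismatch across hosts, a small Python sketch like the following can list interface MTUs and flag the fan devices that differ from the expected value. It assumes a Linux host where each interface exposes its MTU under /sys/class/net/NAME/mtu. The function names, the "fan-" prefix filter and the net_root parameter are hypothetical helpers of mine, not part of Juju or LXD.

```python
from pathlib import Path


def read_mtus(net_root="/sys/class/net"):
    """Return {interface name: MTU} for every interface under net_root."""
    mtus = {}
    for dev in sorted(Path(net_root).iterdir()):
        mtu_file = dev / "mtu"
        # Skip entries that are not interfaces (no readable mtu file).
        if mtu_file.is_file():
            mtus[dev.name] = int(mtu_file.read_text().strip())
    return mtus


def mismatched(mtus, expected, prefix="fan-"):
    """Return the interfaces starting with prefix whose MTU != expected."""
    return {
        name: mtu
        for name, mtu in mtus.items()
        if name.startswith(prefix) and mtu != expected
    }
```

Calling mismatched(read_mtus(), expected=8951) on each host should return an empty dict on the working machine and the fan-252 entry with 1450 on the hosts that still have the problem. This only reports the mismatch; it does not change or persist anything.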
Now we’re looking for a way to enforce this setting, and we’re not sure where to look.