I have a model that’s been trying to destroy itself for a few days now. It hangs on “model not empty” because one application is in the state “waiting for machine”.
Is there a way to force-destroy this model?
```yaml
model:
  name: cot-dev2
  type: iaas
  controller: vmware-main
  cloud: vmware1
  region: ILABT
  version: 2.3.3
  model-status:
    current: destroying
    message: 'attempt 1 to destroy model failed (will retry): model not empty, found
      1 application (model not empty)'
    since: 19 Sep 2018 14:18:20+02:00
  meter-status:
    color: amber
    message: user verification pending
  sla: unsupported
machines: {}
applications:
  nginx-api-gateway:
    charm: local:xenial/nginx-api-gateway-0
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: nginx-api-gateway
    charm-rev: 0
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 18 Feb 2018 21:56:24+01:00
    endpoint-bindings:
      upstream: ""
      website: ""
controller:
  timestamp: 10:50:08+02:00
```
For quite some time we have thought about having a destroy-model --force for those situations where things get stuck when they shouldn’t.
One of the underlying bugs here, I think, is that application destruction shouldn’t be blocked by the “waiting for machine” state in this situation (and probably a few others).
This seems a lot like lp:1654928, which claims to be fixed. So maybe this is a different issue.
I believe we’ve talked about having a “remove-application --force” of some sort. That said, I think we can just fix this one. If an application has a unit that has no machine available for it, we should just allow that unit to be removed and the application killed.
The reason we want to remove some of these models is exactly because of this “stuck” application. We hoped removing the model would fix it. So remove-application --force would actually help us even more, since it means we don’t have to remove the model anymore.
Is there a workaround I can use to remove these applications and models currently, while waiting for the fix? I’d like to see if that reduces the network traffic.
All the “stuck” applications have had or provided cross-model relations. Below is the YAML status of another model that fails to destroy; you can still see some remnants of the cross-model relations in it.
juju status --format yaml
```yaml
model:
  name: providenceplus
  type: iaas
  controller: vmware-main
  cloud: vmware1
  region: ILABT
  version: 2.3.8
  model-status:
    current: destroying
    message: 'attempt 39 to destroy model failed (will retry): model not empty, found
      2 applications (model not empty)'
    since: 20 Sep 2018 10:31:00+02:00
  meter-status:
    color: amber
    message: user verification pending
  sla: unsupported
machines: {}
applications:
  kafka-rest:
    charm: local:xenial/kafka-rest-confluent-k8s-3
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: kafka-rest-confluent-k8s
    charm-rev: 3
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 20 Jun 2018 09:34:30+02:00
    relations:
      kubernetes:
      - leggo
    endpoint-bindings:
      kafka: ""
      kubernetes: ""
      upstream: ""
  kafka-rest-k8s:
    charm: local:xenial/kafka-rest-confluent-k8s-0
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: kafka-rest-confluent-k8s
    charm-rev: 0
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: waiting for machine
      since: 05 Jun 2018 15:51:19+02:00
    relations:
      kubernetes:
      - deve
    endpoint-bindings:
      kafka: ""
      kubernetes: ""
      upstream: ""
application-endpoints:
  deve:
    url: vmware-main:sborny/sborny-tutorial.deve
    endpoints:
      kubernetes-deployer:
        interface: kubernetes-deployer
        role: provider
    life: dying
    application-status:
      current: error
      message: 'cannot get discharge from "https://10.10.139.74:17070/offeraccess":
        cannot acquire discharge: cannot http POST to "https://10.10.139.74:17070/offeraccess/discharge":
        Post https://10.10.139.74:17070/offeraccess/discharge: net/http: TLS handshake
        timeout'
      since: 10 Aug 2018 07:05:59+02:00
    relations:
      kubernetes-deployer:
      - kafka-rest-k8s
  leggo:
    url: vmware-main:sborny/sborny-tutorial.leggo
    endpoints:
      kubernetes-deployer:
        interface: kubernetes-deployer
        role: provider
    life: dying
    application-status:
      current: active
      message: Ready
      since: 11 Sep 2018 12:44:30+02:00
    relations:
      kubernetes-deployer:
      - kafka-rest
controller:
  timestamp: 10:54:24+02:00
```
I have a slew of models in this state across my 3 controllers, and I’m experiencing this in 20+ of my JAAS models. @uros-jovanovic is looking into my JAAS models; possibly he has some input here.
Did anybody find a resolution to this? I’m getting the same thing for models with cross-model relations: exactly the same status output as @merlijn-sebrechts.
Whether due to a stale cross-model relation, a unit currently in a hook error state, a cloud API error, or a number of other reasons, removing applications can become stuck when the “do everything properly” workflow is not possible. This cycle we are going to address the issue by adding the following (see the sketch after this list):
remove-application --force
remove-unit --force
destroy-model --continue-on-error
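For illustration, roughly how those might be invoked once they land. The application, unit, and model names below are just examples taken from the status output above, and the destroy-model flag name was still being settled at this point (a later reply notes it ended up shipping as ‘--force’):

```sh
# Force-remove the application that is blocking the teardown
# (planned for the 2.6 cycle):
juju remove-application --force kafka-rest

# Or force-remove a single wedged unit instead (hypothetical unit name):
juju remove-unit --force kafka-rest/0

# Let the model teardown press on past individual failures
# (flag name as proposed in this post; it ultimately shipped as '--force'):
juju destroy-model providenceplus --continue-on-error
```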
Unfortunately the fix is still in progress, so there’s nothing easy that can be done right now to solve the issue.
How will this work with existing models that can’t upgrade to a newer version (because they’re stuck destroying or in an error state)? Will we still be able to use these commands to remove those models?
Will destroy-model --continue-on-error be available in 2.6? I’m trying 2.6 beta1 now and don’t have it available.
I have a similar situation to the above, where I ran juju destroy-model. The application was removed. Both juju status and juju list-models show 0 machines in the model, but running juju destroy-model tries forever to remove a machine and an application.
@aisrael,
We are providing a ‘--force’ option on many commands, including ‘destroy-model’, ‘remove-application’, ‘remove-relation’, and ‘remove-unit’, specifically to unstick stuck removals and destructions in 2.6.
It will help your case. We are testing scenarios very similar to what you are describing: a model destruction is stuck on removing an application, and running ‘remove-application --force’ in a separate window allows the model destruction to proceed and succeed. Obviously, with ‘--force’ on ‘destroy-model’ itself, you will not need a separate command in a different window.
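Concretely, a sketch of the two workflows (using the model and application names from the status output earlier in the thread, purely as examples):

```sh
# Option 1: while 'juju destroy-model providenceplus' sits in "destroying",
# force-remove the offending applications from a second terminal:
juju remove-application -m providenceplus --force kafka-rest
juju remove-application -m providenceplus --force kafka-rest-k8s

# Option 2: with '--force' on destroy-model itself (2.6+), one step:
juju destroy-model --force providenceplus
```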
I am not too sure where you are getting ‘continue-on-error’ from… Maybe it was discussed or planned at some stage? We have decided to settle on ‘--force’ as it is more intuitive and consistent with our existing terminology.
The continue-on-error was mentioned above by @wallyworld in reference to destroying a model.
The situation I’ve found is that the application is removed, but the model is still stuck in destroying. It’s as if some internal state is out of sync: juju status on the model shows no applications deployed, but destroy-model reports that it’s still trying to remove a machine and an application.
At that point, there’s no visible application for me to force the removal of, which is why I asked about forcing the model destruction as well.