I’ve deployed CDK as a bundle to a MAAS cloud. The deployment went well, but after grabbing the kubeconfig and starting to interact with the cluster, I noticed all is not well. I initially had trouble accessing the dashboard, and have since found that several core services are crashlooping. (Edited with the actual issue in post #2.)
I feel like I’m missing something trivial here, but the mire of K8s hoops I’ve jumped through to reach this point is making it hard to trace back to the core problem. Can someone please offer any pointers?
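For reference, this is roughly how I’m checking on the failing pods. The heapster deployment name below is an assumption based on the version in its logs, so check the real name in your cluster first:

```
# List the kube-system pods to spot anything stuck in CrashLoopBackOff
kubectl get pods -n kube-system

# Pull the log of the previously crashed container; the deployment name
# heapster-v1.6.0-beta.1 is an assumption -- verify with `kubectl -n kube-system get deploy`
kubectl -n kube-system logs deploy/heapster-v1.6.0-beta.1 --previous
```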
EDIT: so the heapster deployment is crashlooping:
I1226 10:41:15.694885 1 heapster.go:79] /heapster --source=kubernetes.summary_api:https://kubernetes.default?kubeletPort=10250&kubeletHttps=true --sink=influxdb:http://monitoring-influxdb:8086
I1226 10:41:15.694985 1 heapster.go:80] Heapster version v1.6.0-beta.1
I1226 10:41:15.695452 1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I1226 10:41:15.695488 1 configs.go:62] Using kubelet port 10250
I1226 10:41:15.796324 1 influxdb.go:312] created influxdb sink with options: host:monitoring-influxdb:8086 user:root db:k8s
I1226 10:41:15.796380 1 heapster.go:203] Starting with InfluxDB Sink
I1226 10:41:15.796396 1 heapster.go:203] Starting with Metric Sink
I1226 10:41:16.298315 1 heapster.go:113] Starting heapster on port 8082
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1724540]
goroutine 108 [running]:
k8s.io/heapster/metrics/sources/summary.(*summaryMetricsSource).decodeEphemeralStorageStatsForContainer(0xc4202e2f00, 0xc420769490, 0xc42069d4f0, 0xc42069d540)
/go/src/k8s.io/heapster/metrics/sources/summary/summary.go:276 +0x90
k8s.io/heapster/metrics/sources/summary.(*summaryMetricsSource).decodeContainerStats(0xc4202e2f00, 0xc420b52930, 0xc420af9a68, 0x0, 0x298ce20)
/go/src/k8s.io/heapster/metrics/sources/summary/summary.go:238 +0x537
k8s.io/heapster/metrics/sources/summary.(*summaryMetricsSource).decodePodStats(0xc4202e2f00, 0xc420b526c0, 0xc420b526f0, 0xc420af9bd8)
/go/src/k8s.io/heapster/metrics/sources/summary/summary.go:212 +0x91e
k8s.io/heapster/metrics/sources/summary.(*summaryMetricsSource).decodeSummary(0xc4202e2f00, 0xc4201d3500, 0x0)
/go/src/k8s.io/heapster/metrics/sources/summary/summary.go:132 +0x351
k8s.io/heapster/metrics/sources/summary.(*summaryMetricsSource).ScrapeMetrics(0xc4202e2f00, 0xed3b551bc, 0xc400000000, 0x296aa60, 0xed3b551f8, 0xc400000000, 0x296aa60, 0x296c008, 0x0, 0xc42000e913)
/go/src/k8s.io/heapster/metrics/sources/summary/summary.go:102 +0x116
k8s.io/heapster/metrics/sources.scrape(0x27bdb00, 0xc4202e2f00, 0xed3b551bc, 0xc400000000, 0x296aa60, 0xed3b551f8, 0x0, 0x296aa60, 0x0, 0x0, ...)
/go/src/k8s.io/heapster/metrics/sources/manager.go:176 +0x11f
k8s.io/heapster/metrics/sources.(*sourceManager).ScrapeMetrics.func1(0xc420272990, 0x27bdb00, 0xc4202e2f00, 0xc420268960, 0xed3b551bc, 0x0, 0x296aa60, 0xed3b551f8, 0x0, 0x296aa60, ...)
/go/src/k8s.io/heapster/metrics/sources/manager.go:99 +0x155
created by k8s.io/heapster/metrics/sources.(*sourceManager).ScrapeMetrics
/go/src/k8s.io/heapster/metrics/sources/manager.go:120 +0x387
I don’t see any log entries that hint at what’s going on.
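Other places I’ve been looking, in case it helps (the label selector and unit name here are examples/assumptions rather than values confirmed from the charm):

```
# Pod events sometimes say more than the container log does;
# k8s-app=heapster is the usual upstream label, but check your manifests
kubectl -n kube-system describe pod -l k8s-app=heapster

# Charm/agent logs on a worker, in case the problem is on the Juju side
juju debug-log --include kubernetes-worker/0 --replay | tail -n 100
```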
The juju status output looks like this:
seffyroff@Ame:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
base maas maas 2.5-rc1 unsupported 02:58:53-08:00
App Version Status Scale Charm Store Rev OS Notes
ceph-fs 13.2.1+dfsg1 active 1 ceph-fs jujucharms 36 ubuntu
ceph-mon 13.2.1+dfsg1 active 3 ceph-mon jujucharms 354 ubuntu
ceph-osd 13.2.1+dfsg1 active 4 ceph-osd jujucharms 380 ubuntu
easyrsa 3.0.1 active 1 easyrsa jujucharms 199 ubuntu
etcd 3.2.10 active 3 etcd jujucharms 352 ubuntu
flannel 0.10.0 active 5 flannel jujucharms 360 ubuntu
kubeapi-load-balancer 1.14.0 active 1 kubeapi-load-balancer jujucharms 538 ubuntu exposed
kubernetes-master 1.13.1 active 2 kubernetes-master jujucharms 542 ubuntu
kubernetes-worker 1.13.1 active 3 kubernetes-worker jujucharms 414 ubuntu exposed
Unit Workload Agent Machine Public address Ports Message
ceph-fs/0* active idle 2/lxd/1 10.0.10.219 Unit is ready (1 MDS)
ceph-mon/0 active idle 0/lxd/0 10.0.10.221 Unit is ready and clustered
ceph-mon/1* active idle 1/lxd/0 10.0.10.220 Unit is ready and clustered
ceph-mon/2 active idle 2/lxd/0 10.0.10.214 Unit is ready and clustered
ceph-osd/0 active idle 2 10.0.10.217 Unit is ready (1 OSD)
ceph-osd/1* active idle 1 10.0.10.222 Unit is ready (1 OSD)
ceph-osd/2 active idle 0 10.0.10.218 Unit is ready (1 OSD)
ceph-osd/3 active idle 3 10.0.10.211 Unit is ready (1 OSD)
easyrsa/0* active idle 0/lxd/1 10.0.10.223 Certificate Authority connected.
etcd/0 active idle 0/lxd/2 10.0.10.215 2379/tcp Healthy with 3 known peers
etcd/1 active idle 1/lxd/1 10.0.10.216 2379/tcp Healthy with 3 known peers
etcd/2* active idle 2/lxd/2 10.0.10.213 2379/tcp Healthy with 3 known peers
kubeapi-load-balancer/0* active idle 3/lxd/0 10.0.10.212 443/tcp Loadbalancer ready.
kubernetes-master/0* active idle 2 10.0.10.217 6443/tcp Kubernetes master running.
flannel/1 active idle 10.0.10.217 Flannel subnet 10.1.64.1/24
kubernetes-master/1 active idle 0 10.0.10.218 6443/tcp Kubernetes master running.
flannel/4 active idle 10.0.10.218 Flannel subnet 10.1.26.1/24
kubernetes-worker/0 active idle 1 10.0.10.222 80/tcp,443/tcp Kubernetes worker running.
flannel/0* active idle 10.0.10.222 Flannel subnet 10.1.85.1/24
kubernetes-worker/1* active idle 0 10.0.10.218 80/tcp,443/tcp Kubernetes worker running.
flannel/3 active idle 10.0.10.218 Flannel subnet 10.1.26.1/24
kubernetes-worker/2 active idle 2 10.0.10.217 80/tcp,443/tcp Kubernetes worker running.
flannel/2 active idle 10.0.10.217 Flannel subnet 10.1.64.1/24
Machine State DNS Inst id Series AZ Message
0 started 10.0.10.218 m3yhap bionic default Deployed
0/lxd/0 started 10.0.10.221 juju-c0ce59-0-lxd-0 bionic default Container started
0/lxd/1 started 10.0.10.223 juju-c0ce59-0-lxd-1 bionic default Container started
0/lxd/2 started 10.0.10.215 juju-c0ce59-0-lxd-2 bionic default Container started
1 started 10.0.10.222 ywywnt bionic default Deployed
1/lxd/0 started 10.0.10.220 juju-c0ce59-1-lxd-0 bionic default Container started
1/lxd/1 started 10.0.10.216 juju-c0ce59-1-lxd-1 bionic default Container started
2 started 10.0.10.217 n4wbbg bionic default Deployed
2/lxd/0 started 10.0.10.214 juju-c0ce59-2-lxd-0 bionic default Container started
2/lxd/1 started 10.0.10.219 juju-c0ce59-2-lxd-1 bionic default Container started
2/lxd/2 started 10.0.10.213 juju-c0ce59-2-lxd-2 bionic default Container started
3 started 10.0.10.211 g43qk3 bionic default Deployed
3/lxd/0 started 10.0.10.212 juju-c0ce59-3-lxd-0 bionic default Container started
Taking all other variables out of the equation, I deployed kubernetes-core from the charm bundle directly to a MAAS controller, using default placement and config, and Heapster still segfaults on deployment. I imagine something has changed somewhere upstream in the stack to cause this, but I welcome being proven wrong on that. Reading the upstream docs, it appears that Heapster itself is deprecated and no longer supported going forward?
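For the record, the clean-room test was roughly the following; the model name is just what I happened to pick, and fetching the kubeconfig via juju scp is the documented CDK approach:

```
# Fresh model on the same MAAS cloud, default constraints and placement
juju add-model k8s-core-test maas
juju deploy kubernetes-core

# Once the bundle settles, grab the kubeconfig and check the addon pods
juju scp kubernetes-master/0:config ~/.kube/config
kubectl -n kube-system get pods | grep -i heapster
```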
Hmm, wonder if @tvansteenburgh or @kos.tsakalozos have any insight into this.
Hi @seffyroff. You’re right, Heapster is no longer supported upstream, having been replaced by metrics-server (which we do ship in CDK). But the default Kubernetes dashboard still relies on Heapster. The dashboard will switch to metrics-server eventually (hopefully soon), but until then, if you want the dashboard, you need Heapster. All that to say: if you don’t need or want the dashboard, you can run
juju config kubernetes-master enable-dashboard-addons=false
and that will get rid of the dashboard and heapster pods. If you want to keep the dashboard, you could try changing the heapster pod to use the older 1.5.4 container image. From looking at the source, it doesn’t appear to have the bug you’re hitting.
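Something along these lines should do it. The deployment and container names below are assumptions (they may differ in your cluster), so check them first, and bear in mind the charm manages the addon manifests, so a manual change like this may not stick forever:

```
# Find the actual heapster deployment name
kubectl -n kube-system get deployments | grep -i heapster

# Pin it to the older image; deployment/container names here are assumptions
kubectl -n kube-system set image deployment/heapster-v1.6.0-beta.1 \
  heapster=k8s.gcr.io/heapster-amd64:v1.5.4
```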
Thanks for the clarification; I appreciate you giving up your time to reply.