Hi, I want to add GPUs to my Juju deployments of Charmed Kubernetes.
Where can I find the full list of supported hardware, along with the specs and any recommendations people have?
Thanks!
Folding@Home FTW
Pretty much any NVIDIA GPU will work. We test on Tesla K80s and V100s, but GTX and RTX cards will work too if this is for a home cluster.
I got a GTX 1650. I put it in, and now I think the next step is to get passthrough working from the KVM host to the VMs? My pool of resources for Juju when I spin up these clusters comes from a 64-core, 64 GB RAM PowerEdge that I have running as a KVM host with MAAS. I opened the chassis on the PowerEdge today and slotted the card in. The lights went on, but I got a Plug and Play error on boot that I ignored. Hopefully that doesn’t mean the card is not working.
How do I make sure this is working on the host? Do I SSH in and run lspci, etc.?
Also, do I have to use the “Charmed” deployment, or can I use “K8s Core” just as successfully?
Update… ssh’d in
ubuntu@loved-prawn:~$ lspci | grep NV
43:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)
Does that mean the card is working on the host? How do I test it further?
ubuntu@loved-prawn:~$ sudo lspci -v -s 43:00.0
43:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Device 86b9
    Flags: fast devsel, IRQ 42
    Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at de000000 (64-bit, prefetchable) [size=32M]
    I/O ports at fc80 [size=128]
    Expansion ROM at fa000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
    Capabilities: [bb0] #15
    Kernel modules: nvidiafb, nouveau
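One way to test further: `lspci -v` normally prints a "Kernel driver in use:" line when a driver has claimed the device, and the output above only shows "Kernel modules:", which may mean nothing is bound to the card yet. The sketch below is a small, hypothetical helper (the name `driver_in_use` is not from any tool) that reads `lspci -k` style output and reports the bound driver, assuming the usual pciutils output format.

```shell
#!/bin/bash
# driver_in_use: read `lspci -k` style output on stdin and print the name of
# the bound driver, or "none" if no "Kernel driver in use:" line is present.
# (Hypothetical helper; assumes the standard pciutils output format.)
driver_in_use() {
  awk -F': ' '
    /Kernel driver in use:/ { print $2; found = 1 }
    END { if (!found) print "none" }
  '
}

# Demo using two lines taken from the output in this thread.
sample='43:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)
    Kernel modules: nvidiafb, nouveau'
printf '%s\n' "$sample" | driver_in_use
```

On the host itself this would be run as `lspci -nnk -s 43:00.0 | driver_in_use`. For passthrough you would expect to see `vfio-pci` there once the kernel options have taken effect, rather than `nouveau` or `nvidiafb`.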
More updates: I had to enable “SR_IOV Global Enable” in the PowerEdge BIOS under integrations; apparently that is how you get VT-d going on these older Dell systems (fingers crossed). Also, I’m currently trying to enable passthrough with:
dw@maas-effect:~/src/homelab/maas_tags$ cat gtx1650_tag.sh
#!/bin/bash
# Create a MAAS tag that matches Intel (vmx) machines with an NVIDIA VGA
# device and applies kernel options for vfio-pci GPU passthrough.

# The kernel command line must be a single space-separated line, so the
# options are kept in an array and joined below. 10de:1f82 is the
# vendor:device ID of the card (10de = NVIDIA, 1f82 from lspci).
kernel_opts=(
  console=tty0 console=ttyS0,115200n8r nomodeset
  modprobe.blacklist=nouveau,nvidiafb,snd_hda_intel
  nouveau.blacklist=1
  video=vesafb:off,efifb:off
  intel_iommu=on
  rd.driver.pre=pci-stub rd.driver.pre=vfio-pci
  pci-stub.ids=10de:1f82 vfio-pci.ids=10de:1f82
  vfio_iommu_type1.allow_unsafe_interrupts=1
  vfio-pci.disable_vga=1
)

maas dw tags create name=gtx-1650 \
  comment="Enable passthrough for gtx1650 GPUs on Intel" \
  definition='
    //node[@id="cpu:0"]/capabilities/capability/@id = "vmx"
    and //node[@id="display"]/vendor[contains(.,"NVIDIA")]
    and //node[@id="display"]/description[contains(.,"VGA")]
    and //node[@id="display"]/product[contains(.,"NVIDIA")]' \
  kernel_opts="${kernel_opts[*]}"
This did not work…
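To narrow down where it fails, it may help to first check whether the tag's kernel options actually reached the booted machine. The sketch below is a hypothetical helper (`missing_opts` is a made-up name) that compares a kernel command line against the options you expect and prints any that are absent. It assumes the options appear space-separated, as they do in `/proc/cmdline`.

```shell
#!/bin/bash
# missing_opts CMDLINE OPT...: print each OPT that does not appear as a whole
# word in CMDLINE, one per line. Prints nothing when all are present.
# (Hypothetical helper for illustration.)
missing_opts() {
  local cmdline=" $1 "
  shift
  local opt
  for opt in "$@"; do
    case "$cmdline" in
      *" $opt "*) ;;                 # option found, nothing to report
      *) printf '%s\n' "$opt" ;;     # option missing
    esac
  done
}

# Example with a hard-coded command line; on the deployed machine, pass
# "$(cat /proc/cmdline)" as the first argument instead.
missing_opts "console=tty0 nomodeset intel_iommu=on" \
  intel_iommu=on vfio-pci.ids=10de:1f82
```

If nothing from the tag shows up in `/proc/cmdline`, the tag probably never matched the machine (or the machine was not redeployed after tagging). If the options are there, the next things to look at would be whether `/sys/kernel/iommu_groups` contains any entries and what `dmesg | grep -i -e DMAR -e IOMMU` reports.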