Best Graphics Cards for Charmed Kubernetes with GPUs

db0west · 20 March 2020 14:31

Hi, I want to add GPUs to my Juju deployments of Charmed Kubernetes.

Where can I find the full list of supported hardware, along with specs and any recommendations people have?

Thanks!

Folding@Home FTW

tvansteenburgh · 20 March 2020 21:21

Pretty much any NVIDIA GPU will work. We test on Tesla K80s and V100s, but GTX and RTX cards will work too, if this is for a home cluster.

db0west · 30 August 2020 03:48

I got a GTX 1650. I put it in, and now I think the next step is to get passthrough working from the KVM host to the VMs? My pool of resources for Juju when I spin up these clusters comes from a 64-core, 64 GB RAM PowerEdge that I have running as a KVM host with MAAS. I opened the chassis on the PowerEdge today and slotted the card in. The lights went on, but I got a Plug N Play error on boot that I ignored. Hopefully that doesn’t mean the card is not working.

How do I make sure this is working on the host? Do I ssh in and run lspci, etc.?

Also, do I have to use the “Charmed” deployment, or can I use “K8s Core” just as successfully?

Update… ssh’d in

ubuntu@loved-prawn:~$ lspci | grep NV
43:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1)

Does that mean the card is working on the host? How do I test it further?

ubuntu@loved-prawn:~$ sudo lspci -v -s 43:00.0
43:00.0 VGA compatible controller: NVIDIA Corporation Device 1f82 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 86b9
        Flags: fast devsel, IRQ 42
        Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at de000000 (64-bit, prefetchable) [size=32M]
        I/O ports at fc80 [size=128]
        Expansion ROM at fa000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Capabilities: [bb0] #15
        Kernel modules: nvidiafb, nouveau
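
One way to dig a bit further (a minimal sketch, not an official check): `lspci -k` prints a `Kernel driver in use:` line when a driver is actually bound to a device. The output above only lists `Kernel modules:`, which suggests nothing is bound to the card yet. The helper name `bound_driver` is made up for illustration; it just reads lspci-style text on stdin.

```shell
#!/bin/bash
# Hypothetical helper: print the driver bound to a device, given
# `lspci -k` (or `lspci -v`) output on stdin. Prints "none" when
# no "Kernel driver in use:" line is present.
bound_driver() {
    awk -F': ' '
        /Kernel driver in use/ { print $2; found = 1 }
        END { if (!found) print "none" }
    '
}

# Example use on the host (address taken from the lspci output above):
#   sudo lspci -k -s 43:00.0 | bound_driver
```

For passthrough you would eventually hope to see `vfio-pci` here rather than `nouveau` or `none`.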

More updates. I had to enable “SR-IOV Global Enable” in the PowerEdge BIOS under integrations; apparently that is how you get VT-d going on these older Dell systems (fingers crossed). Also, I’m currently trying to enable passthrough with:

dw@maas-effect:~/src/homelab/maas_tags$ cat gtx1650_tag.sh 
#!/bin/bash
maas dw tags create name=gtx-1650 \
     comment="Enable passthrough for GTX 1650 GPUs on Intel" \
     definition='
         //node[@id="cpu:0"]/capabilities/capability/@id = "vmx" 
        and //node[@id="display"]/vendor[contains(.,"NVIDIA")] 
        and //node[@id="display"]/description[contains(.,"VGA")] 
        and //node[@id="display"]/product[contains(.,"NVIDIA")]' \
     kernel_opts="console=tty0 console=ttyS0,115200n8r nomodeset 
          modprobe.blacklist=nouveau,nvidiafb,snd_hda_intel 
          nouveau.blacklist=1 video=vesafb:off,efifb:off 
          intel_iommu=on rd.driver.pre=pci-stub 
          rd.driver.pre=vfio-pci pci-stub.ids=10de:1f82
          vfio-pci.ids=10de:1f82 
          vfio_iommu_type1.allow_unsafe_interrupts=1
          vfio-pci.disable_vga=1"
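
To see whether options like these actually reached a deployed node, one rough check is to look for them in the running kernel's command line. This is only a sketch under that assumption; `has_kernel_opt` is a made-up helper name, and the optional second argument exists only so the function can be tried against a sample file instead of `/proc/cmdline`.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the exact option $1 appears on the
# kernel command line. $2 is the file to read (defaults to /proc/cmdline).
has_kernel_opt() {
    # Put each option on its own line, then look for an exact, literal match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example use on the deployed machine:
#   has_kernel_opt intel_iommu=on && echo "intel_iommu=on is set"
#   has_kernel_opt vfio-pci.ids=10de:1f82 && echo "vfio-pci ids are set"
```

If the options are present and the IOMMU is active, `/sys/kernel/iommu_groups/` on that machine should also contain numbered group directories.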

db0west · 4 September 2020 19:33

This did not work…