TrueChartsClone/charts/system/nvidia-device-plugin/docs/installation.md

108 lines
2.9 KiB
Markdown
Raw Permalink Normal View History

Add frontmatters to charts except "stable" (#20720) **Description** <!-- Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. --> ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [ ] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [ ] ⚖️ My code follows the style guidelines of this project - [ ] 👀 I have performed a self-review of my own code - [ ] #️⃣ I have commented my code, particularly in hard-to-understand areas - [ ] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [ ] ⬆️ I increased versions for any altered app according to semantic versioning - [ ] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._
2024-04-13 21:42:42 +00:00
---
title: Nvidia Device Plugin Setup
---
chore(docs): fix few more (#20721) **Description** <!-- Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. --> ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [ ] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [ ] ⚖️ My code follows the style guidelines of this project - [ ] 👀 I have performed a self-review of my own code - [ ] #️⃣ I have commented my code, particularly in hard-to-understand areas - [ ] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [ ] ⬆️ I increased versions for any altered app according to semantic versioning - [ ] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._
2024-04-13 22:03:13 +00:00
## Talos Linux Setup
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
## Enable NVIDIA kernel modules
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
Before installing the device plugin, some initial steps need to be taken per
[Talos Documentation][1]. Please make sure you have installed the correct system
extensions through a combination of patches + the correct [factory image][2] for your
use case.
example gpu-worker-patch.yaml
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```yaml
machine:
kernel:
modules:
- name: nvidia
- name: nvidia_uvm
- name: nvidia_drm
- name: nvidia_modeset
sysctls:
net.core.bpf_jit_harden: 1
```
### Quick Sanity Check
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
If running these commands does not produce similar output, you haven't set up base
system completely:
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
talosctl read /proc/modules
nvidia_uvm 1482752 - - Live 0xffffffffc3b4e000 (PO)
nvidia_drm 73728 - - Live 0xffffffffc3b3b000 (PO)
nvidia_modeset 1290240 - - Live 0xffffffffc39dc000 (PO)
nvidia 56602624 - - Live 0xffffffffc03e0000 (PO)
talosctl get extensions
NODE NAMESPACE TYPE ID VERSION NAME VERSION
192.168.2.104 runtime ExtensionStatus 0 1 nonfree-kmod-nvidia 535.129.03-v1.6.7
192.168.2.104 runtime ExtensionStatus 2 1 nvidia-container-toolkit 535.129.03-v1.13.5
192.168.2.104 runtime ExtensionStatus 4 1 schematic a22f54cdf137d9d058e9a399adecf4bab2f3cc74b15b5bee00005811433e06b0
192.168.2.104 runtime ExtensionStatus modules.dep 1 modules.dep 6.1.82-talos
```
## Create NVIDIA runtime class:
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
You will need to add this runtime class to pods you wish to add GPU resources to.
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
cat <<EOF | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nvidia
handler: nvidia
EOF
```
### Adding runtimeClass to pods with common
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```yaml
workload:
main:
podSpec:
runtimeClassName: "nvidia"
containers: ...
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
## Create nvidia-device-plugin namespace & enable privileged podsecurity
_Note: This is only required if you want multiple GPU resources per physical GPU. If you are happy with 1 to 1 GPU to POD mapping, you can just create namespace, it won't need privileges. You will need to turn off a setting below._
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
kubectl create namespace nvidia-device-plugin
kubectl label namespace nvidia-device-plugin pod-security.kubernetes.io/enforce=privileged
```
## Install nvidia-device-plugin from kubeapps
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
There are notes in values.yaml, but the following defines how many resources are made per GPU:
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```yaml
resources:
- name: nvidia.com/gpu
replicas: 5
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
_Note: If you do not want multigpu mapping, set replicas to 1 and change the following line to false._
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```yaml
gfd:
enabled: true
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```
### Enable GPU in values.yaml
feat(nvidia-device-plugin): add nvidia-device-plugin (#20132) **Description** Hello, This adds the nvidia-device-plugin preconfigured for 5 vcpu per pgpu. ⚒️ Fixes # <!--(issue)--> **⚙️ Type of change** - [X] ⚙️ Feature/App addition - [ ] 🪛 Bugfix - [ ] ⚠️ Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] 🔃 Refactor of current code **🧪 How Has This Been Tested?** <!-- Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration --> **📃 Notes:** <!-- Please enter any other relevant information here --> **✔️ Checklist:** - [X] ⚖️ My code follows the style guidelines of this project - [X] 👀 I have performed a self-review of my own code - [X] #️⃣ I have commented my code, particularly in hard-to-understand areas - [X] 📄 I have made corresponding changes to the documentation - [ ] ⚠️ My changes generate no new warnings - [ ] 🧪 I have added tests to this description that prove my fix is effective or that my feature works - [X] ⬆️ I increased versions for any altered app according to semantic versioning - [X] I made sure the title starts with `feat(chart-name):`, `fix(chart-name):` or `chore(chart-name):` **➕ App addition** If this PR is an app addition please make sure you have done the following. - [ ] 🖼️ I have added an icon in the Chart's root directory called `icon.png` --- _Please don't blindly check all the boxes. Read them and only check those that apply. Those checkboxes are there for the reviewer to see what is this all about and the status of this PR with a quick glance._ --------- Signed-off-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Signed-off-by: Kjeld Schouten <info@kjeldschouten.nl> Co-authored-by: bitpushr <91350598+bitpushr@users.noreply.github.com> Co-authored-by: Kjeld Schouten <info@kjeldschouten.nl>
2024-04-09 07:51:26 +00:00
```yaml
resources:
limits:
nvidia.com/gpu: 1
```
[1]: https://www.talos.dev/v1.6/talos-guides/configuration/nvidia-gpu-proprietary/
[2]: https://factory.talos.dev