TrueChartsClone/charts/stable/nvidia-gpu-exporter/values.yaml


feat(nvidia-gpu-exporter): add nvidia-gpu-exporter (#17289)
Prometheus exporter for Nvidia GPUs that uses the nvidia-smi binary to gather metrics. Tested on a local Kubernetes cluster with an Nvidia Quadro M2000 card.
2024-02-21 16:34:43 +00:00
image:
  repository: utkuozdemir/nvidia_gpu_exporter
  pullPolicy: IfNotPresent
  tag: 1.2.0@sha256:cc407f77ab017101ce233a0185875ebc75d2a0911381741b20ad91f695e488c7
securityContext:
  container:
    privileged: true
    readOnlyRootFilesystem: false
    runAsUser: 0
    runAsGroup: 0
service:
  main:
    ports:
      main:
        protocol: http
        port: 9835
workload:
  main:
    type: DaemonSet
    podSpec:
      containers:
        main:
          args:
            - --web.listen-address
            - :{{ .Values.service.main.ports.main.port }}
            - --web.telemetry-path
            - "{{ .Values.metricsEndpoint }}"
            - --nvidia-smi-command
            - nvidia-smi
            - --log.level
            - "{{ .Values.logs.general.level }}"
            - --log.format
            - "{{ .Values.logs.general.format }}"
          probes:
            liveness:
              path: "{{ .Values.metricsEndpoint }}"
              port: main
            readiness:
              path: "{{ .Values.metricsEndpoint }}"
              port: main
            startup:
              type: tcp
              port: main
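The `{{ .Values.* }}` placeholders in `args` are resolved by Helm's template engine against other values in this same file. A minimal sketch of that substitution, using plain Python string replacement as a stand-in for the real template engine (the placeholder table below simply restates this file's defaults):

```python
# Minimal sketch: emulate how the {{ .Values.* }} placeholders in `args`
# resolve against values defined elsewhere in this file. Helm's template
# engine does far more; this only illustrates the end result.
values = {
    ".Values.service.main.ports.main.port": "9835",
    ".Values.metricsEndpoint": "/metrics",
    ".Values.logs.general.level": "info",
    ".Values.logs.general.format": "logfmt",
}

args = [
    "--web.listen-address", ":{{ .Values.service.main.ports.main.port }}",
    "--web.telemetry-path", "{{ .Values.metricsEndpoint }}",
    "--nvidia-smi-command", "nvidia-smi",
    "--log.level", "{{ .Values.logs.general.level }}",
    "--log.format", "{{ .Values.logs.general.format }}",
]

def render(arg: str) -> str:
    # Replace each known placeholder with its configured value.
    for path, val in values.items():
        arg = arg.replace("{{ " + path + " }}", val)
    return arg

rendered = [render(a) for a in args]
print(rendered)
```

So with the defaults above, the container ends up running the exporter with `--web.listen-address :9835 --web.telemetry-path /metrics`, which is why the probes and the ServiceMonitor below all point at the same port and path.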
persistence:
  nvidiactl:
    enabled: true
    type: hostPath
    hostPath: /dev/nvidiactl
    mountPath: /dev/nvidiactl
    readOnly: true
  nvidia0:
    enabled: true
    type: hostPath
    hostPath: /dev/nvidia0
    mountPath: /dev/nvidia0
    readOnly: true
  nvidiasmi:
    enabled: true
    type: hostPath
    hostPath: /usr/bin/nvidia-smi
    mountPath: /usr/bin/nvidia-smi
    readOnly: true
  libnvidiamlso:
    enabled: true
    type: hostPath
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
    mountPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so
    readOnly: true
  libnvidiamlso1:
    enabled: true
    type: hostPath
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
    mountPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
    readOnly: true
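Each GPU device node gets its own read-only hostPath entry. On a node with more than one GPU, a second device could be exposed the same way; the `nvidia1` entry below is an assumed extension for illustration, not part of the chart's defaults:

```yaml
# Hypothetical extra persistence entry for a second GPU.
# /dev/nvidia1 exists only on nodes with two or more Nvidia GPUs.
nvidia1:
  enabled: true
  type: hostPath
  hostPath: /dev/nvidia1
  mountPath: /dev/nvidia1
  readOnly: true
```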
metrics:
  main:
    enabled: true
    type: "servicemonitor"
    endpoints:
      - port: main
        path: "{{ .Values.metricsEndpoint }}"
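With `type: "servicemonitor"`, the chart renders a Prometheus Operator ServiceMonitor that scrapes the service port defined above. Roughly, the generated object looks like the fragment below (field names follow the Prometheus Operator CRD; the `metadata.name` is illustrative, as the chart derives the real name from the release):

```yaml
# Rough shape of the generated object (metadata.name is illustrative).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nvidia-gpu-exporter
spec:
  endpoints:
    - port: main
      path: /metrics
```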
portal:
  open:
    enabled: false
metricsEndpoint: "/metrics"
logs:
  general:
    level: info
    format: logfmt