
Registering the DCGMI repository, installing, and running (Ubuntu 22.04.3)

by ccclog 2023. 10. 18.

https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html

 

From the linked page (Getting Started, NVIDIA DCGM Documentation): on HGX systems (A100/A800 and H100/H800), you will need to install the NVIDIA Switch Configuration and Query (NSCQ) library for DCGM to enumerate the NVSwitches and provide telemetry for switches. The page points to the HGX Software Guide for further details.

## OS version
>>root@user:~# cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
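
The repository URLs used in the installation step contain the directory name `ubuntu2204`, which matches the `ID` and `VERSION_ID` fields above. As a minimal sketch, that string can be derived from `/etc/os-release` instead of being typed by hand. The helper name `repo_distro` is made up for this example and is not part of any NVIDIA tool.

```shell
# repo_distro is a hypothetical helper name, not part of any NVIDIA tool.
# It prints ID followed by VERSION_ID with the dots removed.
repo_distro() (
    # The function body is a subshell, so the sourced variables stay local.
    . "${1:-/etc/os-release}"
    printf '%s%s\n' "$ID" "$(printf '%s' "$VERSION_ID" | tr -d '.')"
)

# With the os-release contents shown above this prints: ubuntu2204
if [ -r /etc/os-release ]; then
    repo_distro
fi
```

The result can then be substituted into the `repos/<distro>/x86_64` part of the download URLs when working on a different Ubuntu release.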


## Installation

>>root@user:/# apt-get install libnvidia-nscq-535 

>>root@user:/# wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb

>>root@user:/# dpkg -i cuda-keyring_1.0-1_all.deb

>>root@user:/# add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
>>root@user:/# apt-get update
>>root@user:/# apt-get install -y datacenter-gpu-manager
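
The post goes straight from installation to using `dcgmi`. The linked Getting Started guide also describes enabling the DCGM host engine service, so the following is a hedged sketch of that step. It assumes the package installed a systemd unit named `nvidia-dcgm`; the function name `start_dcgm` is hypothetical.

```shell
# start_dcgm is a hypothetical helper; the unit name nvidia-dcgm is an
# assumption taken from the linked Getting Started guide.
start_dcgm() {
    if ! command -v systemctl >/dev/null 2>&1 || ! command -v dcgmi >/dev/null 2>&1; then
        echo "systemctl or dcgmi not found" >&2
        return 1
    fi
    # Enable and start the service, check that it is active,
    # then list the GPUs that DCGM can see.
    systemctl --now enable nvidia-dcgm &&
        systemctl is-active nvidia-dcgm &&
        dcgmi discovery -l
}

start_dcgm || echo "DCGM was not started" >&2
```

If `dcgmi discovery -l` lists the expected GPUs, the later `dcgmi` commands in this post should be able to reach the host engine.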

## Checking the DCGMI version
>>root@user:~# dcgmi -v
Version : 3.2.6
Build ID : 29
Build Date : 2023-09-27
Build Type : Release
Commit ID : 
Branch Name : rel_dcgm_3_2
CPU Arch : x86_64

## Running DCGMI (run level 3, takes 15 minutes)
>>root@user:~# dcgmi diag -v -r 3
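
Because a level 3 run takes a while, it can help to keep the verbose report in a file. The wrapper below is only a sketch: `run_diag` and the log file naming are made up for illustration, and it calls the same `dcgmi diag -v -r` command shown above.

```shell
# run_diag is a hypothetical wrapper around the dcgmi diag command above.
# Usage: run_diag [run_level] [log_file]
run_diag() {
    local level="${1:-3}"
    local log="${2:-dcgmi_diag_r${level}_$(date +%Y%m%d_%H%M%S).log}"
    if ! command -v dcgmi >/dev/null 2>&1; then
        echo "dcgmi not found" >&2
        return 1
    fi
    # Show the report on screen and keep a copy in the log file.
    # Note: the pipeline's exit status is that of tee, not of dcgmi.
    dcgmi diag -v -r "$level" 2>&1 | tee "$log"
}

run_diag 3 || echo "diagnostic did not run" >&2
```

Passing a lower run level as the first argument gives a shorter run when a quick check is enough.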


## Configuring a DCGMI group

>>root@user:~# dcgmi group -c "egg"          # create a group named "egg"
>>root@user:~# dcgmi group -l                # confirm the group was created and look up its group ID
>>root@user:~# dcgmi group -g 2 -a 0,1,2,3   # add GPUs 0, 1, 2, 3 to group ID 2
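
Once the group exists and its ID is known from `dcgmi group -l`, adding GPUs and checking the result can be wrapped in a small function. This is a sketch under assumptions: `add_gpus_to_group` is a hypothetical name, and the group ID `2` and GPU list `0,1,2,3` are the values used in this post, which may differ on another system.

```shell
# add_gpus_to_group is a hypothetical helper.
# Usage: add_gpus_to_group <group_id> <comma-separated GPU ids>
add_gpus_to_group() {
    local group_id="$1" gpu_ids="$2"
    if ! command -v dcgmi >/dev/null 2>&1; then
        echo "dcgmi not found" >&2
        return 1
    fi
    # Add the GPUs, then print the group's info to confirm membership.
    dcgmi group -g "$group_id" -a "$gpu_ids" &&
        dcgmi group -g "$group_id" -i
}

add_gpus_to_group 2 0,1,2,3 || echo "group was not updated" >&2
```

A group that is no longer needed can be removed with `dcgmi group -d <groupId>`.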
