Traditional Ops - GPU Reinstallation

2022-04-13

https://help.aliyun.com/document_detail/163825.htm

Uninstall the GPU driver.

/usr/bin/nvidia-uninstall
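The uninstall command above fails if no driver was ever installed through the NVIDIA .run installer. Below is a minimal sketch that runs it only when it exists. It assumes the installed `nvidia-uninstall` accepts the NVIDIA installer's `--silent` option. `NVIDIA_UNINSTALL` and `uninstall_driver` are hypothetical names used for illustration.

```shell
#!/bin/sh
# Hypothetical override for the uninstaller path; defaults to the path used above.
NVIDIA_UNINSTALL="${NVIDIA_UNINSTALL:-/usr/bin/nvidia-uninstall}"

uninstall_driver() {
    if [ -x "$NVIDIA_UNINSTALL" ]; then
        # --silent: uninstall without interactive prompts
        "$NVIDIA_UNINSTALL" --silent
    else
        # Nothing to remove; report it and continue
        echo "no NVIDIA uninstaller at $NVIDIA_UNINSTALL, skipping" >&2
        return 0
    fi
}
```

Call `uninstall_driver` as root in place of the bare command. A missing uninstaller is treated as "already clean" rather than as an error.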

Uninstall the CUDA and cuDNN libraries.

/usr/local/cuda/bin/cuda-uninstaller
rm -rf /usr/local/cuda*
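The two commands above can be combined into one guarded step. The sketch runs `cuda-uninstaller` only if it is present, and it may still prompt interactively. It then removes the `cuda` symlink and every versioned `cuda-X.Y` directory. `CUDA_PREFIX` and `uninstall_cuda` are hypothetical names, and the prefix defaults to `/usr/local` as above.

```shell
#!/bin/sh
# Hypothetical override for the install prefix; the steps above use /usr/local.
CUDA_PREFIX="${CUDA_PREFIX:-/usr/local}"

uninstall_cuda() {
    uninstaller="$CUDA_PREFIX/cuda/bin/cuda-uninstaller"
    # Run NVIDIA's uninstaller only if this CUDA install shipped one
    if [ -x "$uninstaller" ]; then
        "$uninstaller"
    fi
    # Remove the cuda symlink and any cuda-X.Y directories (cuDNN files live inside them)
    for dir in "$CUDA_PREFIX"/cuda*; do
        # Skip the literal pattern when the glob matched nothing
        [ -e "$dir" ] || [ -L "$dir" ] || continue
        rm -rf "$dir"
    done
}
```

`rm -rf` on the `cuda` symlink removes only the link. The versioned directory it points to is removed by its own loop iteration.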

Reboot the server.

reboot
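After the reboot, you can check that the old driver is really gone before installing the new one. This sketch looks for any loaded kernel module whose name starts with `nvidia` in `lsmod` output. `check_driver_unloaded` is a hypothetical helper name.

```shell
#!/bin/sh
check_driver_unloaded() {
    # lsmod lists one loaded module per line, name first
    if lsmod | grep -q '^nvidia'; then
        echo "nvidia kernel module still loaded; old driver not fully removed" >&2
        return 1
    fi
    echo "no nvidia kernel module loaded"
    return 0
}
```

A non-zero return suggests the uninstall or the reboot did not take effect.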

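Install the GPU driver. The script below also installs CUDA and cuDNN at the versions set in its variables.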

#!/bin/sh

#Please input version to install
IS_INSTALL_RDMA="FALSE"
IS_INSTALL_AIACC_TRAIN="FALSE"
IS_INSTALL_AIACC_INFERENCE="FALSE"
DRIVER_VERSION="460.91.03"
CUDA_VERSION="10.2.89"
CUDNN_VERSION="7.6.5"
IS_INSTALL_RAPIDS="FALSE"

INSTALL_DIR="/root/auto_install"
# Remove leftovers from earlier runs (everything matching /root/auto_install*)
rm -rf "${INSTALL_DIR}"*
#using .deb to install driver and cuda on ubuntu OS
#using .run to install driver and cuda on ubuntu OS
auto_install_script="auto_install.sh"

# Ask the ECS instance metadata service for the download source address, then build the script URL
script_download_url=$(curl -s http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo "$script_download_url"

# Download the installer (10 tries, 10 s timeout) and run it with the versions set above
mkdir "$INSTALL_DIR" && cd "$INSTALL_DIR"
wget -t 10 --timeout=10 "$script_download_url" && sh "${INSTALL_DIR}/${auto_install_script}" "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION" "$IS_INSTALL_AIACC_TRAIN" "$IS_INSTALL_AIACC_INFERENCE" "$IS_INSTALL_RDMA" "$IS_INSTALL_RAPIDS"
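If the metadata request fails, the script above builds a URL that starts with a bare `/opsx/...` path, and `wget` then fails in a less obvious way. This sketch of a fail-fast URL builder keeps the same path layout as the script. `build_script_url` is a hypothetical helper name.

```shell
#!/bin/sh
# $1: raw output of the metadata service, $2: installer script name
build_script_url() {
    # Keep only the first line, as the original script does with head -1
    base=$(printf '%s\n' "$1" | head -1)
    if [ -z "$base" ]; then
        echo "empty source-address from metadata service" >&2
        return 1
    fi
    printf '%s/opsx/ecs/linux/binary/script/%s\n' "$base" "$2"
}

# Usage on an ECS instance:
# script_download_url=$(build_script_url "$(curl -s http://100.100.100.200/latest/meta-data/source-address)" "$auto_install_script") || exit 1
```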

Verify the installation.

nvidia-smi
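Instead of reading the `nvidia-smi` table by eye, you can compare the reported driver version with the one requested in the install script. This sketch uses `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one version per GPU. `EXPECTED_DRIVER` and `check_driver_version` are hypothetical names, with the default copied from `DRIVER_VERSION` above.

```shell
#!/bin/sh
# Hypothetical variable mirroring DRIVER_VERSION from the install script
EXPECTED_DRIVER="${EXPECTED_DRIVER:-460.91.03}"

check_driver_version() {
    # One line per GPU, no CSV header; all GPUs share a driver, so read the first
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -1)
    if [ "$actual" = "$EXPECTED_DRIVER" ]; then
        echo "driver OK: $actual"
    else
        echo "driver mismatch: expected $EXPECTED_DRIVER, got ${actual:-none}" >&2
        return 1
    fi
}
```

If `nvidia-smi` is missing or fails, `actual` is empty and the check reports a mismatch instead of passing silently.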