诸神缄默不语 – personal CSDN blog post directory
This post covers:
- How to set the GPU card number when running Python deep learning code (with approaches for both PyTorch and TensorFlow). This mainly targets the single-GPU case; multi-GPU scenarios may be added later.
Common scenario: on a multi-GPU machine, code defaults to GPU 0, but sometimes you need GPU 1, 2, etc., so the card must be selected manually.
- How to check current CUDA usage from the Linux command line
- Under construction: GPU memory optimization
Table of contents
- 1. Setting the GPU card number in deep learning
- 1. CUDA_VISIBLE_DEVICES
- 2. Moving tensors directly to a device in PyTorch
- 2. Checking current CUDA usage from the Linux command line
- 3. GPU memory optimization
- References used while writing this post
1. Setting the GPU card number in deep learning
1. CUDA_VISIBLE_DEVICES
Once this variable is set, the running code can only see the specified GPU index(es). For example, if only the GPU with physical index 1 is exposed, then cuda:0 inside the code maps logical device 0 directly onto the real card 1.
You can then simply write: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- Set it in code (note: this must run before any deep learning code initializes the GPU):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
- Set it on the command line, prefixed to the run command:
CUDA_VISIBLE_DEVICES=1 python run.py
For multiple cards: CUDA_VISIBLE_DEVICES=0,3,7 python run.py
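A minimal sketch of the in-code approach, using only the standard library. The key point is ordering: the assignment must happen before the framework first touches CUDA, or it is silently ignored.

```python
import os

# Must run BEFORE PyTorch/TensorFlow initializes CUDA; setting it after
# the first CUDA call has no effect.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

# From this point on, the process only "sees" physical GPU 1, which the
# framework will report as device 0 (i.e. cuda:0 in PyTorch).
# import torch  # the framework import would come only after the line above
print(os.environ['CUDA_VISIBLE_DEVICES'])
```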
2. Moving tensors directly to a device in PyTorch
device="cuda:1"
(the number is the GPU index)
For inputs, the usual approach is simply to call to(device) on every tensor.
Inside a model, tensors that are already registered (parameters and buffers) are moved automatically when you call to(device) on the model instance; tensors that are not registered (e.g. helper tensors created inside forward() or other methods to support extra operations) must be placed on the device explicitly.
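A minimal sketch of this registered-vs-unregistered distinction (ToyModel and its members are hypothetical names for illustration); it falls back to CPU when no GPU is available:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        # Registered: moved automatically by model.to(device)
        self.linear = nn.Linear(4, 4)
        self.register_buffer('scale', torch.ones(4))

    def forward(self, x):
        # NOT registered: created fresh on every call, so it must be
        # placed on the right device explicitly
        helper = torch.arange(4, dtype=x.dtype, device=x.device)
        return self.linear(x) * self.scale + helper

device = torch.device("cuda:1" if torch.cuda.device_count() > 1 else
                      "cuda" if torch.cuda.is_available() else "cpu")
model = ToyModel().to(device)     # moves the linear weights and the buffer
x = torch.randn(2, 4).to(device)  # inputs are moved tensor by tensor
y = model(x)
print(y.device)
```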
2. Checking current CUDA usage from the Linux command line
- nvidia-smi: shows current GPU status
Sample output:
Mon Jul 24 12:17:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 64% 76C P2 332W / 350W | 5349MiB / 24576MiB | 69% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:25:00.0 Off | N/A |
| 76% 66C P2 309W / 350W | 4775MiB / 24576MiB | 99% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
omit
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3323021 C python 5346MiB |
| 1 N/A N/A 3360508 C python 4772MiB |
omit
+-----------------------------------------------------------------------------+
- peci1/nvidia-htop: A tool for enriching the output of nvidia-smi. Compared with nvidia-smi, it shows more information, such as the owning user and launch command of each process.
Install: pip install nvidia-htop
Run: nvidia-htop.py
(The -l flag lifts the length limit on the command column, printing as much as fits on one line, though not necessarily the complete command; -c colors the current GPU usage red/yellow/green.)
Sample output:
Mon Jul 24 12:22:34 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 62% 77C P2 338W / 350W | 5349MiB / 24576MiB | 99% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:25:00.0 Off | N/A |
| 70% 62C P2 312W / 350W | 4775MiB / 24576MiB | 99% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
omit
+-------------------------------------------------------------------------------+
| GPU PID USER GPU MEM %CPU %MEM TIME COMMAND |
| 0 3323021 omit 5346MiB 111 0.5 04:16:09 python -u omit |
| 1 3360508 omit 4772MiB 115 0.1 03:30:58 python -u omit |
omit
+-------------------------------------------------------------------------------+
- XuehaiPan/nvitop: An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. On a machine where one card had died and nvidia-smi no longer worked, this tool still ran fine.
The output is actually colored, but the server has too many cards to fit in one screenshot, so only the text is copied here: - Show the status of all devices once:
nvitop -1
Thu Aug 03 15:07:47 2023
╒═════════════════════════════════════════════════════════════════════════════╕
│ NVITOP 1.2.0       Driver Version: 520.61.05      CUDA Driver Version: 11.8 │
├───────────────────────────────┬──────────────────────┬──────────────────────┤
│ GPU  Name        Persistence-M│ Bus-Id        Disp.A │ Volatile Uncorr. ECC │
│ Fan  Temp  Perf  Pwr:Usage/Cap│         Memory-Usage │ GPU-Util  Compute M. │
╞═══════════════════════════════╪══════════════════════╪══════════════════════╡
│   0  GeForce RTX 3090     On  │ 00000000:01:00.0 Off │                  N/A │  MEM: 17.4%  UTL: 0%
│ 37%  50C   P2    117W / 350W  │  4278MiB / 24.00GiB  │      0%      Default │
├───────────────────────────────┼──────────────────────┼──────────────────────┤
│   1  GeForce RTX 3090     On  │ 00000000:25:00.0 Off │                  N/A │  MEM: 47.9%  UTL: 0%
│ 41%  45C   P2    106W / 350W  │ 11760MiB / 24.00GiB  │      0%      Default │
╘═══════════════════════════════╧══════════════════════╧══════════════════════╛
[ CPU: 13.7%  UPTIME: 9.0 days ]  ( Load Average: 34.97 59.87 81.07 )
[ MEM: 25.2%  USED: 185.4GiB ]
[ SWP: 7.3% ]
╒═════════════════════════════════════════════════════════════════════════════╕
│ Processes:                                                  wanghuijuan@zju │
│ GPU     PID      USER   GPU-MEM  %SM   %CPU  %MEM      TIME  COMMAND        │
╞═════════════════════════════════════════════════════════════════════════════╡
│   0  4083851  C user1   4274MiB    0  102.2   0.0  20:47:34  Zombie Process │
│   1   900475  C user2  11756MiB    1  103.7   2.4  46:16:48  python run.py  │
╘═════════════════════════════════════════════════════════════════════════════╛
3. GPU memory optimization
- Delete unneeded objects and run the garbage collector:
import gc
del obj  # obj: any tensor, model, etc. you no longer need
gc.collect()
- In PyTorch, if no gradients need to be accumulated at all (typically at test time), wrap the computation in with torch.no_grad(): (run the loop body normally inside it). This effectively frees the memory that gradients would otherwise occupy, which can be a considerable share of the total.
If you only need to stop gradients for specific tensors, set those tensors' requires_grad attribute to False. (This is already the default for unregistered tensors.)
(Note that model.eval() alone does NOT achieve this.) - Release cached GPU memory held by PyTorch:
torch.cuda.empty_cache()
(Official docs: torch.cuda.empty_cache — PyTorch 1.11.0 documentation; see also 【pytorch】torch.cuda.empty_cache()==>释放缓存分配器当前持有的且未占用的缓存显存_马鹏森的博客-CSDN博客_empty_cache) - Not yet reviewed:
- Official notes: CUDA semantics – Memory management
- 科普帖:深度学习中GPU和显存分析 – 知乎
- Transformer性能优化:运算和显存 – 知乎
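The three techniques above can be combined in a small evaluation loop; this is a minimal sketch (the model and shapes are made up for illustration), runnable on CPU, with empty_cache() guarded so it only runs when CUDA is present:

```python
import gc
import torch

model = torch.nn.Linear(8, 2)
model.eval()                   # eval() alone does NOT disable gradients

with torch.no_grad():          # no gradient graph is built inside this block
    for _ in range(3):
        x = torch.randn(4, 8)
        y = model(x)
        assert not y.requires_grad  # outputs carry no autograd history

del x, y                       # drop the Python references...
gc.collect()                   # ...and let the garbage collector reclaim them

if torch.cuda.is_available():  # only meaningful when a GPU is present
    torch.cuda.empty_cache()   # return unused cached blocks to the driver
```

Note that the parameters themselves still have requires_grad=True; no_grad() only suppresses graph construction during the forward passes inside the block.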
References used while writing this post
- 使用CUDA_VISIBLE_DEVICES设置显卡_华科附小第一名的博客-CSDN博客_cuda_visible_devices