mirror of https://github.com/beclab/Olares
synced 2026-05-24 09:18:23 +00:00
resolve comments
parent d8db9c458c
commit 5a434b5b50
2 changed files with 20 additions and 21 deletions
@@ -27,23 +27,23 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app
 Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.

+### Time Slicing

+In this mode, the GPU rotates across applications in time slices.
+- At any instant, only one application uses the GPU’s compute and VRAM.
+- Other applications’ VRAM contents are temporarily swapped out to system memory.
+- Applications not assigned an exclusive GPU or dedicated VRAM are placed in the time-slicing queue by default.
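
The rotation described in the Time Slicing section can be pictured with a small simulation. This is only a sketch under assumptions: the docs say the GPU "rotates across applications in time slices" but do not specify the scheduling policy, so plain round-robin is assumed here, and `time_slice_schedule` and the app names are hypothetical, not part of Olares.

```python
from collections import deque


def time_slice_schedule(apps, num_slices):
    """Simulate round-robin time slicing (an assumed policy, not Olares code).

    Returns one (active_app, swapped_out_apps) pair per time slice:
    the active app holds the GPU's compute and VRAM, while the others
    are treated as swapped out to system memory.
    """
    queue = deque(apps)
    timeline = []
    for _ in range(num_slices):
        if not queue:
            break  # nothing is queued, so the GPU stays idle
        active = queue[0]
        swapped_out = list(queue)[1:]
        timeline.append((active, swapped_out))
        queue.rotate(-1)  # the active app moves to the back of the queue
    return timeline


if __name__ == "__main__":
    for active, swapped in time_slice_schedule(["app-a", "app-b", "app-c"], 4):
        print(f"active={active} swapped_out={swapped}")
```

With three queued apps and four slices, the fourth slice returns to the first app, which is the behavior the "only one application at any instant" bullet implies.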

 ### App Exclusive

 In this mode, the GPU’s full compute capacity and VRAM are allocated to a single application to ensure the maximized performance.

 ### Memory Slicing

-In this mode, GPU VRAM is allocated to multiple applications by specified quotas:
+In this mode, GPU VRAM is allocated to multiple applications by specified VRAM quotas:

-- Applications with a quota can run concurrently on the GPU.
-- The sum of all quotas must not exceed the GPU’s physical VRAM.
+- Applications with assigned VRAM can run concurrently on the GPU.
+- The sum of all assigned VRAMs must not exceed the GPU’s physical VRAM.
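
The Memory Slicing constraint (assigned VRAM must sum to no more than the GPU's physical VRAM) amounts to a simple check. The sketch below is illustrative only: `validate_vram_quotas`, the GiB unit, and the example numbers are assumptions for the sake of the example and do not reflect how Olares enforces the limit.

```python
def validate_vram_quotas(quotas_gib, physical_vram_gib):
    """Check hypothetical per-app VRAM quotas against physical VRAM.

    quotas_gib maps an app name to its assigned VRAM in GiB.
    Returns the unassigned VRAM in GiB, or raises ValueError if any
    quota is non-positive or the quotas together exceed the GPU.
    """
    for app, quota in quotas_gib.items():
        if quota <= 0:
            raise ValueError(f"{app}: quota must be positive, got {quota}")
    total = sum(quotas_gib.values())
    if total > physical_vram_gib:
        raise ValueError(
            f"assigned VRAM ({total} GiB) exceeds physical VRAM "
            f"({physical_vram_gib} GiB)"
        )
    return physical_vram_gib - total


if __name__ == "__main__":
    # Two apps sharing an assumed 24 GiB GPU.
    print(validate_vram_quotas({"app-a": 8, "app-b": 12}, 24))
```

Returning the remaining VRAM rather than a boolean makes it easy to see how much is left for an additional application before the limit is reached.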

-### Time Slicing

-In this mode, any number of applications can be bound to the same GPU:

-- At any instant, only one application fully occupies the GPU’s compute and VRAM.
-- VRAM contents of other applications are temporarily swapped out to system memory.

 ## View GPU status

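@@ -28,22 +28,21 @@ Olares supports only **NVIDIA GPUs** and requires a **Turing or newer** architecture (T
 Olares provides three allocation modes that you can choose from to suit your scenario.

+### Time Slicing Mode

+In this mode, multiple applications share the same GPU by rotating in time slices:
+- At any moment, only one application occupies the GPU's compute and VRAM.
+- Meanwhile, the VRAM contents of other applications are temporarily swapped out to system memory.
+- Applications that have not been assigned an exclusive GPU or dedicated VRAM join the time-slicing queue by default.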
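
 ### App Exclusive Mode

-In this mode, the compute and VRAM of the entire GPU are allocated to a single application to ensure the best performance.
+In this mode, the compute and VRAM of a single GPU are allocated to one application to ensure the best performance.

 ### Memory Slicing Mode

-In this mode, GPU VRAM can be allocated to multiple applications by specified quotas.
-- Applications that receive a quota can use the GPU in parallel.
-- The sum of the allocated quotas must not exceed the total physical VRAM.
+In this mode, GPU VRAM can be allocated to multiple applications in specified VRAM amounts.
+- All applications that receive VRAM can use the GPU at the same time.
+- The sum of the allocated VRAM must not exceed the total physical VRAM.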
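
-### Time Slicing Mode

-In this mode, any number of applications can be bound to the same GPU:
-- At any moment, only one application fully occupies the GPU's compute and VRAM.
-- Meanwhile, the VRAM contents of other applications are temporarily swapped out to system memory.

 ## View GPU status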