add note on multi-GPU and improve accuracy

cal-weng 2025-10-29 17:16:55 +08:00
parent d25bde12c3
commit 10ce9b44fc
2 changed files with 24 additions and 13 deletions


@ -33,7 +33,8 @@ Olares supports three GPU allocation modes. Choosing the right mode helps optimi
In this mode, a GPU can be bound to multiple applications and rotates execution in time slices.
* At any instant, only one application uses all available compute and VRAM of the GPU.
* Other apps enter a wait queue; their CUDA and VRAM content will be swapped to the system memory.
* Other apps enter a wait queue; their VRAM contents (such as the CUDA context) may be temporarily swapped out to system memory.
* By default, GPUs run in time-slicing mode. Applications not assigned an exclusive GPU or dedicated VRAM will join the time-slicing queue when a time-slicing GPU is available.
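
The rotation described in the bullets above can be pictured with a toy round-robin queue. This is only an illustrative sketch: the `TimeSlicedGPU` class, the app names, and the slice count are hypothetical, and the actual Olares scheduler is not part of this change.

```python
from collections import deque


class TimeSlicedGPU:
    """Toy model (hypothetical): one app holds the GPU per slice, the rest wait."""

    def __init__(self):
        self.queue = deque()

    def bind(self, app):
        # A newly bound app joins the end of the wait queue.
        self.queue.append(app)

    def run(self, slices):
        """Return the app that held the whole GPU in each time slice."""
        history = []
        for _ in range(slices):
            if not self.queue:
                break
            app = self.queue.popleft()  # this app gets all compute and VRAM
            history.append(app)
            self.queue.append(app)      # then it returns to the back of the queue
        return history


gpu = TimeSlicedGPU()
for name in ("app-a", "app-b", "app-c"):
    gpu.bind(name)

schedule = gpu.run(5)
print(schedule)
```

The sketch models only the ordering of turns; swapping VRAM contents to system memory is not simulated.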
### App Exclusive
@ -49,8 +50,10 @@ In this mode, VRAM of the GPU is partitioned into fixed quotas for multiple desi
* The sum of quotas must not exceed physical VRAM of the bound GPU. Oversubscription is not supported.
* Apps with quota assigned can run concurrently, each limited to its own quota.
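
The no-oversubscription rule above amounts to a simple sum check. The following is a minimal sketch under assumed names: `validate_quotas`, the app names, and the 24 GB figure are hypothetical and are not taken from Olares itself.

```python
def validate_quotas(physical_vram_gb, quotas_gb):
    """Return the unallocated VRAM in GB, or raise if the quotas are invalid.

    physical_vram_gb: total VRAM of the bound GPU (assumed value).
    quotas_gb: mapping of app name to its fixed VRAM quota in GB.
    """
    for app, quota in quotas_gb.items():
        if quota <= 0:
            raise ValueError(f"quota for {app} must be positive")
    total = sum(quotas_gb.values())
    if total > physical_vram_gb:
        # Oversubscription is not supported, so reject the whole allocation.
        raise ValueError(
            f"quotas total {total} GB but the GPU has only {physical_vram_gb} GB"
        )
    return physical_vram_gb - total


free_gb = validate_quotas(24, {"app-a": 8, "app-b": 12})
print(free_gb)
```

With these example numbers, 8 GB + 12 GB fits inside 24 GB; raising either quota by more than 4 GB would be rejected.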
:::tip Multi-GPU aggregation
You can bind multiple GPUs to one application within the same cluster to gain bigger VRAM. In such scenarios, only **App Exclusive** or **Memory Slicing** modes are supported.
:::tip Multi-GPU allocation
- All three allocation modes support assigning multiple GPUs to the same application. Olares only assigns the GPUs to the application's container and does not fuse their VRAM or compute in any way. Whether the application actually uses multiple GPUs depends on the application or framework itself.
- In multi-node environments, GPUs on different nodes cannot be assigned to the same application at the same time.
:::
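
Because multi-GPU use depends on the application itself, it can help to check how many GPUs the application's container actually sees. The sketch below assumes that the `nvidia-smi` tool is available inside the container, which this change does not state; the helper names `parse_gpu_list` and `visible_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_list(output):
    """Keep the lines of `nvidia-smi -L` output that describe a GPU."""
    return [line.strip() for line in output.splitlines() if line.startswith("GPU ")]


def visible_gpus():
    """Return one description per visible GPU, or an empty list if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # assumption: no NVIDIA tooling in this environment
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
        )
    except OSError:
        return []
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


gpus = visible_gpus()
print(f"{len(gpus)} GPU(s) visible")
```

Seeing several GPUs here only means they were assigned; the application or framework still has to distribute its own work across them.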
## View GPU status
@ -76,9 +79,9 @@ On the **GPU details** page, select your desired mode from the **GPU mode** drop
![Time slicing](/images/manual/olares/gpu-time-slicing.png#bordered)
:::tip Note
No manual pinning is required if you only have one GPU in your cluster.
:::
:::tip Note
No manual binding is required if you only have one GPU in your cluster.
:::
* **App Exclusive**
1. Select this mode from the GPU mode dropdown.
@ -93,5 +96,10 @@ No manual pinning is required if you only have one GPU in your cluster.
4. Repeat for other applications and click **Confirm**.
![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)
:::tip Unbinding GPU allocation
After binding GPUs to an application, you can release the GPU resources by unbinding the application under the corresponding GPU mode.
:::
## Learn more
- [Monitor GPU usage in Olares](../resources-usage.md)


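@ -32,7 +32,8 @@ Olares provides three allocation modes that you can choose from based on your scenario.
In this mode, a single GPU is allocated to multiple applications in time slices.
- At any moment, only one application occupies all compute and available VRAM.
- Other applications enter a wait queue, and their CUDA and VRAM contents are swapped out to system memory.
- Other applications enter a wait queue, and their VRAM contents (such as the CUDA context) may be temporarily swapped out to system memory.
- GPUs are in time-slicing mode by default. Applications that have not been assigned an exclusive GPU or dedicated VRAM join the time-slicing queue by default (when a time-slicing GPU is available).
### App Exclusive mode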
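@ -48,8 +49,9 @@ Olares provides three allocation modes that you can choose from based on your scenario.
- The sum of all quotas must not exceed the physical VRAM of the corresponding GPU. (Oversubscription is not currently supported.)
- Applications with a quota can run concurrently, and each can use only its own quota.
:::tip Multi-GPU aggregation
Within the same cluster, you can bind multiple GPUs to one application to gain more VRAM and compute; in aggregation scenarios, only the App Exclusive or Memory Slicing mode is supported.
:::tip Multi-GPU allocation
- All three modes support assigning multiple GPUs to the same application. Olares only assigns the GPUs to the container where the application runs and does not fuse VRAM or compute; whether multiple GPUs can be used depends on the application or framework itself.
- In multi-node environments, multiple GPUs cannot be assigned across nodes to the same application at the same time.
:::
## View GPU status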
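@ -76,15 +78,16 @@ Olares provides three allocation modes that you can choose from based on your scenario.
2. In the **Select exclusive app** dropdown, select the target application.
3. Click **Confirm**.
![App exclusive](/images/zh/manual/olares/gpu-app-exclusive.png#bordered)
- **Memory Slicing**
* **Memory Slicing**
1. Select this mode from the dropdown menu.
2. In the **Allocate VRAM** window, click **+ Add application**.
3. Select the target application and specify the amount of VRAM to allocate to it (in GB).
4. To allocate VRAM to other applications, repeat the steps above, and then click **Confirm**.
![Memory slicing](/images/zh/manual/olares/gpu-memory-slicing.png#bordered)
::: tip Note
The allocated VRAM must be less than the GPU's total VRAM.
:::
:::tip Unbinding
After binding an application, you can release GPU resources by performing an unbind operation under the corresponding GPU mode.
:::
## Learn more
- [Monitor GPU usage in Olares](../resources-usage.md)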