add multiple cards for one app support and update GPU modes description

2026-05-24 09:18:23 +00:00 · 2025-10-27 15:30:45 +08:00 · 2025-10-27 15:30:45 +08:00 · d25bde12c3
commit d25bde12c3
parent 5a434b5b50
4 changed files with 58 additions and 42 deletions
--- a/docs/manual/olares/settings/gpu-resource.md
+++ b/docs/manual/olares/settings/gpu-resource.md
@@ -12,12 +12,13 @@ Olares allows you to harness the full power of your GPUs to accelerate demanding
 This guide helps you understand and configure GPU allocation modes to maximize hardware performance.

 ::: tip GPU support
-Olares supports **only Nvidia GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell). 
+Olares supports **only NVIDIA GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).

 - Quick check: GTX/RTX **16 series and newer** consumer cards are supported.
 - Other models: Cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
 - Unknown model: Run `lspci | grep -i nvidia` to query the GPU architecture code and determine compatibility.  
-:::
+:::

 :::warning AI Performance
 Even if your GPU architecture is supported, **low VRAM capacity may cause AI applications to fail**. Ensure your GPU has enough memory for your workloads.
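The "Unknown model" check in the tip above can be scripted. The sketch below is a hypothetical helper, not part of Olares. It assumes `lspci` prints NVIDIA's chip codename (such as `TU116` or `AD102`) from an up-to-date pci.ids database, and the names `classify`, `CHIP_RE`, and `ARCH_BY_PREFIX` are illustrative only. Codename prefixes outside the mapping are reported as unknown, so verify those against the compatible GPU table.

```python
from __future__ import annotations

import re

# Hypothetical mapping from NVIDIA codename prefixes (as printed by lspci)
# to (architecture, supported). Only the four architectures named in this
# guide are marked as supported; anything unlisted is treated as unknown.
ARCH_BY_PREFIX = {
    "GK": ("Kepler", False),
    "GM": ("Maxwell", False),
    "GP": ("Pascal", False),
    "GV": ("Volta", False),
    "TU": ("Turing", True),
    "GA": ("Ampere", True),
    "AD": ("Ada Lovelace", True),
    "GB": ("Blackwell", True),
}

# Matches "NVIDIA Corporation <CODE>", where CODE looks like TU116 or AD102,
# and captures the two-letter architecture prefix.
CHIP_RE = re.compile(r"NVIDIA Corporation\s+([A-Z]{2})\d{3}")


def classify(lspci_line: str) -> tuple[str, bool | None]:
    """Return (architecture, supported) for one line of lspci output.

    supported is None when no codename is printed (for example, an old
    pci.ids database shows only "Device 2b85") or the prefix is unlisted.
    """
    match = CHIP_RE.search(lspci_line)
    if match is None:
        return ("unknown", None)
    return ARCH_BY_PREFIX.get(match.group(1), ("unknown", None))
```

To try it, you could pass each line of `lspci | grep -i nvidia` output to `classify`. A `None` result means the line carried no recognizable codename, so fall back to the compatible GPU table.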
@@ -27,23 +28,30 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app

 Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.

-### App Exclusive
-
-In this mode, the GPU’s full compute capacity and VRAM are allocated to a single application to ensure the maximized performance.
-
-### Memory Slicing
-
-In this mode, GPU VRAM is allocated to multiple applications by specified VRAM quotas:
-
- Applications with assigned VRAM can run concurrently on the GPU.
- The sum of all assigned VRAMs must not exceed the GPU’s physical VRAM.
-
 ### Time Slicing

-In this mode, any number of applications can be bound to the same GPU:
+In this mode, multiple applications can be bound to the same GPU, which rotates execution among them in time slices.

- At any instant, only one application fully occupies the GPU’s compute and VRAM.
- VRAM contents of other applications are temporarily swapped out to system memory.
+* At any instant, only one application uses all of the GPU's available compute and VRAM.
+* The other applications wait in a queue, and their CUDA context and VRAM contents are swapped out to system memory.
+
+### App Exclusive
+
+In this mode, the entire GPU is allocated to a single application.
+
+* While running, the application can use all of the bound GPU's compute and VRAM.
+* With no cross-application contention or scheduling overhead, this mode delivers the best performance.
+
+### Memory Slicing
+
+In this mode, the GPU's VRAM is partitioned into fixed quotas that are assigned to designated applications.
+
+* You must set a quota for each application manually.
+* The sum of all quotas must not exceed the bound GPU's physical VRAM; oversubscription is not supported.
+* Applications with an assigned quota can run concurrently, each limited to its own quota.
+
+:::tip Multi-GPU aggregation
+Within the same cluster, you can bind multiple GPUs to a single application to get more VRAM and compute. In this case, only the **App Exclusive** and **Memory Slicing** modes are supported.
+:::
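To make the Time Slicing behavior concrete, here is a toy round-robin model of one GPU shared by several applications. It is an illustration under stated assumptions, not the Olares scheduler: the fixed slice length, the FIFO wait queue, and the names `App`, `work_ms`, and `run_time_sliced` are hypothetical, and the swap of VRAM contents to system memory is only noted in a comment, not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class App:
    name: str
    work_ms: int  # hypothetical amount of GPU work the app still needs


def run_time_sliced(apps: list[App], slice_ms: int) -> list[str]:
    """Simulate round-robin time slicing on a single GPU.

    Returns the order in which apps held the GPU. Only the app at the head
    of the queue owns the GPU; the rest wait (in the real system their GPU
    state would be swapped out to system memory while they wait).
    """
    if slice_ms <= 0:
        raise ValueError("slice_ms must be positive")
    queue = deque(apps)
    timeline: list[str] = []
    while queue:
        app = queue.popleft()      # this app now has the whole GPU
        timeline.append(app.name)
        app.work_ms -= slice_ms    # it runs for one full slice
        if app.work_ms > 0:
            queue.append(app)      # unfinished: back to the end of the queue
    return timeline
```

With apps needing 25, 10, and 15 ms of work and a 10 ms slice, the GPU is held in the order A, B, C, A, C, A: each app gets the whole GPU during its turn while the others wait.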

 ## View GPU status

@@ -52,8 +60,10 @@ To view your GPU status:
 1. Navigate to **Settings** > **GPU**. The GPU list shows each GPU’s model, associated node, total VRAM, and current GPU mode.
 2. Click on a specific GPU to visit its details.

+![GPU overview](/images/manual/olares/gpu-overview.png#bordered)
+
 ::: tip Note
-If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page. If you have multiple GPUs, you will see a list first.
+If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page.
 :::

 ## Configure GPU mode
@@ -69,23 +79,19 @@ On the **GPU details** page, select your desired mode from the **GPU mode** drop
 :::tip Note
 No manual pinning is required if you only have one GPU in your cluster.
 :::
-  
+
 * **App Exclusive**
  1. Select this mode from the GPU mode dropdown.
  2. In the **Select exclusive app** dropdown, choose your target application.
  3. Click **Confirm**.
-    ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)
+     ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)

-  * **Memory Slicing**
-      1. Select this mode from the dropdown.
-      2. In the **Allocate VRAM** section, click **Add an application**. 
-      3. Select your target application and assign it a specific amount of VRAM (in GB).
-      4. Repeat for other applications and click **Confirm**.
-         ![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)
-     
-    ::: tip Note
-    You can't assign a VRAM that's larger than the total VRAM.
-    :::
+* **Memory Slicing**
+  1. Select this mode from the dropdown.
+  2. In the **Allocate VRAM** section, click **Add an application**.
+  3. Select your target application and assign it a specific amount of VRAM in GB.
+  4. Repeat for other applications and click **Confirm**.
+     ![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)

 ## Learn more
 - [Monitor GPU usage in Olares](../resources-usage.md)
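The Memory Slicing steps above rely on two rules from the mode descriptions: for each bound GPU, the assigned quotas must not add up to more than its physical VRAM, and binding several GPUs to one application is allowed only in App Exclusive or Memory Slicing mode. The minimal sketch below checks both rules; the `GpuMode` enum and the `validate_quotas` and `validate_multi_gpu` functions are hypothetical names for illustration, not an Olares API.

```python
from __future__ import annotations

from enum import Enum


class GpuMode(Enum):
    TIME_SLICING = "Time Slicing"
    APP_EXCLUSIVE = "App Exclusive"
    MEMORY_SLICING = "Memory Slicing"


def validate_quotas(total_vram_gb: float, quotas_gb: dict[str, float]) -> None:
    """Reject a Memory Slicing plan for one GPU whose quotas exceed its VRAM.

    Oversubscription is not supported, so the sum of all quotas must stay
    within the GPU's physical VRAM.
    """
    used = sum(quotas_gb.values())
    if used > total_vram_gb:
        raise ValueError(
            f"quotas total {used} GB but the GPU has only {total_vram_gb} GB"
        )


def validate_multi_gpu(mode: GpuMode, gpu_count: int) -> None:
    """Allow binding several GPUs to one app only outside Time Slicing."""
    if gpu_count > 1 and mode is GpuMode.TIME_SLICING:
        raise ValueError("Time Slicing does not support multi-GPU aggregation")
```

For example, a 24 GB GPU accepts two 12 GB quotas but rejects 16 GB plus 12 GB, since 28 GB would oversubscribe it.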
--- a/docs/public/images/manual/olares/gpu-overview.png
+++ b/docs/public/images/manual/olares/gpu-overview.png
--- a/docs/public/images/zh/manual/olares/gpu-overview.png
+++ b/docs/public/images/zh/manual/olares/gpu-overview.png
--- a/docs/zh/manual/olares/settings/gpu-resource.md
+++ b/docs/zh/manual/olares/settings/gpu-resource.md
@@ -28,27 +28,37 @@ Olares supports only **NVIDIA GPUs** and requires **Turing or newer** architecture (T

 Olares provides three allocation modes, which you can choose flexibly based on your scenario.

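-### App Exclusive mode
-
-In this mode, the compute and VRAM of a single GPU are allocated to one application to guarantee the best performance.
-
-### Memory Slicing mode
-
-In this mode, GPU VRAM can be allocated to multiple applications in specified amounts.
- All applications that are allocated VRAM can use the GPU simultaneously.
- The sum of the allocated VRAM must not exceed the total physical VRAM.
-
 ### Time Slicing mode

-In this mode, any number of applications can be bound to the same GPU:
- At any moment, only one application fully occupies the GPU's compute and VRAM.
- Meanwhile, the VRAM contents of the other applications are temporarily swapped out to system memory.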
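+In this mode, a single GPU is allocated to multiple applications in time slices.
+- At any moment, only one application occupies all of the GPU's compute and available VRAM.
+- The other applications enter a wait queue, and their CUDA and VRAM contents are swapped out to system memory.
+
+### App Exclusive mode
+
+In this mode, each GPU's compute and VRAM are allocated to a single application.
+
+- While running, the application can use all of the GPU's compute and VRAM.
+- Applications running in this mode get the best performance.
+
+### Memory Slicing mode
+
+In this mode, each GPU's VRAM is divided into fixed quotas that are allocated to multiple designated applications.
+
+- A quota must be set manually for each application.
+- The sum of all quotas must not exceed the physical VRAM of the corresponding GPU (oversubscription is not currently supported).
+- Applications with a quota can run in parallel, and each can use only its own quota.
+
+:::tip Multi-GPU aggregation
+Within the same cluster, you can bind multiple GPUs to the same application to obtain more VRAM and compute. In aggregation scenarios, only App Exclusive or Memory Slicing mode is supported.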
+:::

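 ## View GPU status

 1. Go to **Settings > GPU**. The GPU list shows each GPU's model, node, total VRAM, and current allocation mode.
 2. Click a GPU to open its details page.

+![GPU overview](/images/zh/manual/olares/gpu-overview.png#bordered)
+
 ::: tip Note
 If your Olares cluster has only one GPU, opening the GPU page takes you directly to its details page; if it has multiple GPUs, the GPU list is shown first.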
 :::