diff --git a/docs/manual/olares/settings/gpu-resource.md b/docs/manual/olares/settings/gpu-resource.md
index 3076eaaeb..81e32e748 100644
--- a/docs/manual/olares/settings/gpu-resource.md
+++ b/docs/manual/olares/settings/gpu-resource.md
@@ -12,12 +12,13 @@ Olares allows you to harness the full power of your GPUs to accelerate demanding
 This guide helps you understand and configure GPU allocation modes to maximize hardware performance.
 
 ::: tip GPU support
-Olares supports **only Nvidia GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).
+Olares supports **only Nvidia GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).
 
 - Quick check: GTX/RTX **16 series and newer** consumer cards are supported.
+- For other models, cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
 - Other models: Cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
 - Unknown model: Run `lspci | grep -i nvidia` to query the GPU architecture code and determine compatibility.
-:::
+ :::
 
 :::warning AI Performance
 Even if your GPU architecture is supported, **low VRAM capacity may cause AI applications to fail**. Ensure your GPU has enough memory for your workloads.
@@ -27,23 +28,30 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app
 
 Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.
 
-### App Exclusive
-
-In this mode, the GPU’s full compute capacity and VRAM are allocated to a single application to ensure the maximized performance.
-
-### Memory Slicing
-
-In this mode, GPU VRAM is allocated to multiple applications by specified VRAM quotas:
-
-- Applications with assigned VRAM can run concurrently on the GPU.
-- The sum of all assigned VRAMs must not exceed the GPU’s physical VRAM.
-
 ### Time Slicing
 
-In this mode, any number of applications can be bound to the same GPU:
+In this mode, a GPU can be bound to multiple applications and rotates execution in time slices.
 
-- At any instant, only one application fully occupies the GPU’s compute and VRAM.
-- VRAM contents of other applications are temporarily swapped out to system memory.
+* At any instant, only one application uses all available compute and VRAM of the GPU.
+* Other apps enter a wait queue; their CUDA and VRAM content will be swapped to the system memory.
+
+### App Exclusive
+
+In this mode, the entire GPU is allocated to a single application.
+
+* During execution, the app can use all compute and VRAM of the bound GPU.
+* No cross-app contention or scheduling overhead so that best performance is guaranteed.
+
+### Memory Slicing
+In this mode, VRAM of the GPU is partitioned into fixed quotas for multiple designated applications.
+
+* Users need to manually set a quota for each app.
+* The sum of quotas must not exceed physical VRAM of the bound GPU. Oversubscription is not supported.
+* Apps with quota assigned can run concurrently, each limited to its own quota.
+
+:::tip Multi-GPU aggregation
+You can bind multiple GPUs to one application within the same cluster to gain bigger VRAM. In such scenarios, only **App Exclusive** or **Memory Slicing** modes are supported.
+:::
 
 ## View GPU status
 
@@ -52,8 +60,10 @@ To view your GPU status:
 1. Navigate to **Settings** > **GPU**. The GPU list shows each GPU’s model, associated node, total VRAM, and current GPU mode.
 2. Click on a specific GPU to visit its details.
 
+![GPU overview](/images/manual/olares/gpu-overview.png#bordered)
+
 ::: tip Note
-If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page. If you have multiple GPUs, you will see a list first.
+If your Olares only has one GPU, navigating to the GPU section will take you directly to the GPU details page.
 :::
 
 ## Configure GPU mode
@@ -69,23 +79,19 @@ On the **GPU details** page, select your desired mode from the **GPU mode** drop
 :::tip Note
 No manual pinning is required if you only have one GPU in your cluster.
 :::
-
+
 * **App Exclusive**
 1. Select this mode from the GPU mode dropdown.
 2. In the **Select exclusive app** dropbox, choose your target application.
 3. Click **Confirm**.
- ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)
+ ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)
 
- * **Memory Slicing**
- 1. Select this mode from the dropdown.
- 2. In the **Allocate VRAM** section, click **Add an application**.
- 3. Select your target application and assign it a specific amount of VRAM (in GB).
- 4. Repeat for other applications and click **Confirm**.
- ![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)
-
- ::: tip Note
- You can't assign a VRAM that's larger than the total VRAM.
- :::
+* **Memory Slicing**
+ 1. Select this mode from the dropdown.
+ 2. In the **Allocate VRAM** section, click **Add an application**.
+ 3. Select your target application and assign it a specific amount of VRAM in GB.
+ 4. Repeat for other applications and click **Confirm**.
+ ![VRAM slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)
 
 ## Learn more
 - [Monitor GPU usage in Olares](../resources-usage.md)
\ No newline at end of file
diff --git a/docs/public/images/manual/olares/gpu-overview.png b/docs/public/images/manual/olares/gpu-overview.png
new file mode 100644
index 000000000..35fcd0822
Binary files /dev/null and b/docs/public/images/manual/olares/gpu-overview.png differ
diff --git a/docs/public/images/zh/manual/olares/gpu-overview.png b/docs/public/images/zh/manual/olares/gpu-overview.png
new file mode 100644
index 000000000..b3058d94a
Binary files /dev/null and b/docs/public/images/zh/manual/olares/gpu-overview.png differ
diff --git a/docs/zh/manual/olares/settings/gpu-resource.md b/docs/zh/manual/olares/settings/gpu-resource.md
index 010d9df45..ee2a2d948 100644
--- a/docs/zh/manual/olares/settings/gpu-resource.md
+++ b/docs/zh/manual/olares/settings/gpu-resource.md
@@ -28,27 +28,37 @@ Olares 仅支持 **NVIDIA 显卡**,且要求架构为 **Turing 或更新**(T
 
 Olares 提供三种分配方式,可按场景灵活选择。
 
-### 应用独占模式
-
-在此模式下,单张 GPU 的算力和显存将分配给一个应用,以保证最佳性能。
-
-### 显存分片模式
-
-在此模式下,GPU 显存可按指定显存分配给多个应用。
-- 所有获得显存的应用可同时使用 GPU。
-- 所分配显存之和不得超过总物理显存。
-
 ### 时间分片模式
 
-在此模式下,任意数量应用可绑定至同一 GPU:
-- 任一时刻仅有一个应用完全占用 GPU 算力和显存。
-- 此时其他应用的显存内容会暂时换出至系统内存。
+在此模式下,单张显卡按时间分片分配给多个应用。
+- 任一时刻仅一个应用占用全部算力与可用显存。
+- 其余应用进入等待队列,其 CUDA 及显存内容被换出至系统内存。
+
+### 应用独占模式
+
+在此模式下,每张显卡的计算能力和显存将分配至单个应用。
+
+- 应用在运行时可使用显卡全部的算力和显存。
+- 在这个模式下运行的应用会获得最佳性能。
+
+### 显存分片模式
+在此模式下,每张显卡的显存被划分为固定配额,分配给多个指定应用。
+
+- 需为每个应用手动设定配额。
+- 各配额之和不得超过对应显卡的物理显存。(暂不支持超订阅)
+- 获配额的应用可并行运行,且仅能使用自身配额。
+
+:::tip 多显卡合并
+在同一集群中,可将多张显卡绑定至同一应用以获取更大显存和算力;合并场景下仅支持应用独占或显存分片模式。
+:::
 
 ## 查看显卡状态
 
 1. 进入 **设置 > GPU**。GPU 列表显示每个显卡的型号、所在节点、总显存及当前分配模式。
 2. 点击单个显卡以进入其详情页。
 
+![GPU 概览](/images/zh/manual/olares/gpu-overview.png#bordered)
+
 ::: tip 注意
 如果你的 Olares 集群中只有一块 GPU,进入 GPU 页面将直接跳转至详情页;若有多块 GPU,则会显示 GPU 列表。
 :::