Mirror of https://github.com/beclab/Olares, synced 2026-05-24 09:18:23 +00:00
add multiple cards for one app support and update GPU modes description
parent 5a434b5b50
commit d25bde12c3
4 changed files with 58 additions and 42 deletions
@@ -12,12 +12,13 @@ Olares allows you to harness the full power of your GPUs to accelerate demanding
This guide helps you understand and configure GPU allocation modes to maximize hardware performance.

::: tip GPU support
Olares supports **only NVIDIA GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).
Olares supports **only NVIDIA GPUs** of **Turing architecture or later** (Turing, Ampere, Ada Lovelace, and Blackwell).

- Quick check: GTX/RTX **16 series and newer** consumer cards are supported.
- For other models, cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
- Other models: Cross-check with the [compatible GPU table](https://github.com/NVIDIA/open-gpu-kernel-modules?tab=readme-ov-file#compatible-gpus).
- Unknown model: Run `lspci | grep -i nvidia` to look up the GPU's architecture code and determine compatibility.
:::
:::
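
The check for an unknown model can also be scripted. The sketch below is only an illustration: it filters `lspci` output the way `grep -i nvidia` does and guesses the architecture from the chip code prefix. The prefix table (`TU`, `GA`, `AD`, `GB`), the helper names, and the assumed `NVIDIA Corporation <chip code>` line layout are assumptions, not part of Olares. When no chip code is recognized, fall back to the compatible GPU table.

```python
import re
import subprocess
from typing import List, Optional

# Assumption: chip code prefixes for the architectures named in the tip above.
SUPPORTED_PREFIXES = {
    "TU": "Turing",
    "GA": "Ampere",
    "AD": "Ada Lovelace",
    "GB": "Blackwell",
}


def nvidia_lines(lspci_output: str) -> List[str]:
    """Keep only the lines that mention NVIDIA, like `grep -i nvidia`."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def guess_architecture(line: str) -> Optional[str]:
    """Return the architecture name for a recognized chip code, else None."""
    match = re.search(r"NVIDIA Corporation ([A-Z]{2})\d{3}", line)
    if match is None:
        return None
    return SUPPORTED_PREFIXES.get(match.group(1))


def scan_local_gpus() -> List[str]:
    """Run lspci on this machine and describe each NVIDIA device found."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    report = []
    for line in nvidia_lines(result.stdout):
        arch = guess_architecture(line) or "unrecognized, check the GPU table"
        report.append(f"{arch}: {line}")
    return report
```

Calling `scan_local_gpus()` on the Olares host returns one entry per NVIDIA device. `guess_architecture` returns `None` for older chip codes such as `GP104`, which signals that the card needs a manual check.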

:::warning AI Performance
Even if your GPU architecture is supported, **low VRAM capacity may cause AI applications to fail**. Ensure your GPU has enough memory for your workloads.

@@ -27,23 +28,30 @@ Even if your GPU architecture is supported, **low VRAM capacity may cause AI app

Olares supports three GPU allocation modes. Choosing the right mode helps optimize performance based on your needs.

### App Exclusive

In this mode, the GPU’s full compute capacity and VRAM are allocated to a single application to maximize performance.

### Memory Slicing

In this mode, GPU VRAM is divided among multiple applications according to specified VRAM quotas:

- Applications with assigned VRAM can run concurrently on the GPU.
- The sum of all assigned VRAM quotas must not exceed the GPU’s physical VRAM.

### Time Slicing

In this mode, any number of applications can be bound to the same GPU:
In this mode, a GPU can be bound to multiple applications and rotates execution among them in time slices.

- At any instant, only one application fully occupies the GPU’s compute and VRAM.
- VRAM contents of other applications are temporarily swapped out to system memory.
* At any instant, only one application uses all available compute and VRAM of the GPU.
* Other apps enter a wait queue; their CUDA and VRAM contents are swapped out to system memory.
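
To make the Time Slicing behavior concrete, here is a toy simulation. It is a sketch under stated assumptions, not how Olares schedules GPUs: the strict round-robin order, the fixed number of slices, and the function name `time_slice_schedule` are all hypothetical. It only illustrates that one app holds the GPU per slice while the others wait in a queue.

```python
from collections import deque
from typing import List, Tuple


def time_slice_schedule(apps: List[str], slices: int) -> List[Tuple[str, List[str]]]:
    """Return, for each time slice, the active app and the apps left waiting.

    Assumption: apps take turns in simple round-robin order.
    """
    queue = deque(apps)
    timeline = []
    for _ in range(slices):
        if not queue:
            break
        active = queue[0]            # the only app using compute and VRAM
        waiting = list(queue)[1:]    # these apps are swapped out to system memory
        timeline.append((active, waiting))
        queue.rotate(-1)             # the active app moves to the back of the queue
    return timeline


for active, waiting in time_slice_schedule(["chat", "image-gen", "transcribe"], 4):
    print(f"running: {active:<12} waiting: {waiting}")
```

The app names are placeholders. With three apps and four slices, the first app runs again in the fourth slice, which is the rotation the bullets above describe.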

### App Exclusive

In this mode, the entire GPU is allocated to a single application.

* During execution, the app can use all compute and VRAM of the bound GPU.
* There is no cross-app contention or scheduling overhead, so the app gets the best possible performance.

### Memory Slicing
In this mode, the GPU's VRAM is partitioned into fixed quotas for multiple designated applications.

* Users need to manually set a quota for each app.
* The sum of quotas must not exceed the physical VRAM of the bound GPU. Oversubscription is not supported.
* Apps with an assigned quota can run concurrently, each limited to its own quota.
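
The quota rule for Memory Slicing amounts to a simple sum check. The following sketch shows that arithmetic only. The function name `validate_quotas`, the use of GB as the unit, and the app names are illustrative assumptions, and Olares performs its own validation in the UI.

```python
from typing import Dict


def validate_quotas(total_vram_gb: float, quotas_gb: Dict[str, float]) -> float:
    """Check per-app VRAM quotas against one GPU's physical VRAM.

    Returns the VRAM (in GB) left unallocated. Raises ValueError if a quota
    is not positive or if the quotas oversubscribe the GPU.
    """
    for app, quota in quotas_gb.items():
        if quota <= 0:
            raise ValueError(f"quota for {app!r} must be positive, got {quota}")
    requested = sum(quotas_gb.values())
    if requested > total_vram_gb:
        raise ValueError(
            f"quotas total {requested} GB, but the GPU has only {total_vram_gb} GB"
        )
    return total_vram_gb - requested


# Hypothetical example: a 24 GB card shared by two apps.
remaining = validate_quotas(24, {"chat": 8, "image-gen": 12})
print(f"unallocated VRAM: {remaining} GB")
```

In this example, quotas of 8 GB and 12 GB on a 24 GB card leave 4 GB unallocated. Raising the second quota to 20 GB would raise `ValueError`, since oversubscription is not supported.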

:::tip Multi-GPU aggregation
You can bind multiple GPUs to one application within the same cluster to gain more VRAM. In such scenarios, only the **App Exclusive** and **Memory Slicing** modes are supported.
:::
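
The multi-GPU restriction can be expressed as a small check. This is a hedged sketch: the GPU identifiers and the function name `can_bind` are hypothetical, and it encodes only the rule stated in the tip, namely that an app bound to more than one GPU cannot use Time Slicing.

```python
from typing import Dict

# Modes the tip lists as supported when one app is bound to several GPUs.
MULTI_GPU_MODES = {"App Exclusive", "Memory Slicing"}
KNOWN_MODES = MULTI_GPU_MODES | {"Time Slicing"}


def can_bind(gpu_modes: Dict[str, str]) -> bool:
    """Decide whether one app may be bound to the given GPUs.

    gpu_modes maps each GPU (hypothetical IDs) to its current mode.
    """
    if not gpu_modes:
        raise ValueError("at least one GPU is required")
    unknown = set(gpu_modes.values()) - KNOWN_MODES
    if unknown:
        raise ValueError(f"unknown GPU mode(s): {sorted(unknown)}")
    if len(gpu_modes) == 1:
        return True  # a single GPU may use any of the three modes
    return all(mode in MULTI_GPU_MODES for mode in gpu_modes.values())
```

For instance, `can_bind({"gpu0": "App Exclusive", "gpu1": "Memory Slicing"})` returns `True`, while any binding of two or more GPUs that includes a Time Slicing GPU returns `False`.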

## View GPU status

@@ -52,8 +60,10 @@ To view your GPU status:
1. Navigate to **Settings** > **GPU**. The GPU list shows each GPU’s model, associated node, total VRAM, and current GPU mode.
2. Click a GPU to view its details.

![GPU overview](/images/manual/olares/gpu-overview.png#bordered)

::: tip Note
If your Olares has only one GPU, navigating to the GPU section takes you directly to the GPU details page. If you have multiple GPUs, you will see a list first.
If your Olares has only one GPU, navigating to the GPU section takes you directly to the GPU details page.
:::

## Configure GPU mode

@@ -69,23 +79,19 @@ On the **GPU details** page, select your desired mode from the **GPU mode** drop
:::tip Note
No manual pinning is required if you only have one GPU in your cluster.
:::


* **App Exclusive**
    1. Select this mode from the GPU mode dropdown.
    2. In the **Select exclusive app** dropdown, choose your target application.
    3. Click **Confirm**.
    ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)
    ![App exclusive](/images/manual/olares/gpu-app-exclusive.png#bordered)

* **Memory Slicing**
    1. Select this mode from the dropdown.
    2. In the **Allocate VRAM** section, click **Add an application**.
    3. Select your target application and assign it a specific amount of VRAM (in GB).
    4. Repeat for other applications and click **Confirm**.
    ![Memory slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)

::: tip Note
You can't assign more VRAM than the GPU's total VRAM.
:::
* **Memory Slicing**
    1. Select this mode from the dropdown.
    2. In the **Allocate VRAM** section, click **Add an application**.
    3. Select your target application and assign it a specific amount of VRAM in GB.
    4. Repeat for other applications and click **Confirm**.
    ![Memory slicing](/images/manual/olares/gpu-memory-slicing.png#bordered)

## Learn more

- [Monitor GPU usage in Olares](../resources-usage.md)
BIN docs/public/images/manual/olares/gpu-overview.png (Normal file)
Binary file not shown. After: Size: 61 KiB
BIN docs/public/images/zh/manual/olares/gpu-overview.png (Normal file)
Binary file not shown. After: Size: 57 KiB
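@@ -28,27 +28,37 @@ Olares supports only **NVIDIA GPUs**, and requires the architecture to be **Turing or newer** (T

Olares provides three allocation modes that you can choose from to suit your scenario.

### App Exclusive mode

In this mode, the compute power and VRAM of a single GPU are allocated to one application to ensure the best performance.

### Memory Slicing mode

In this mode, GPU VRAM can be allocated to multiple applications in specified amounts.
- All applications that have been allocated VRAM can use the GPU at the same time.
- The sum of the allocated VRAM must not exceed the total physical VRAM.

### Time Slicing mode

In this mode, any number of applications can be bound to the same GPU:
- At any moment, only one application fully occupies the GPU's compute power and VRAM.
- Meanwhile, the VRAM contents of the other applications are temporarily swapped out to system memory.
In this mode, a single GPU is allocated to multiple applications in time slices.
- At any moment, only one application occupies all of the compute power and available VRAM.
- The remaining applications enter a wait queue, and their CUDA and VRAM contents are swapped out to system memory.

### App Exclusive mode

In this mode, the compute power and VRAM of each GPU are allocated to a single application.

- While running, the application can use all of the GPU's compute power and VRAM.
- Applications running in this mode get the best performance.

### Memory Slicing mode
In this mode, the VRAM of each GPU is divided into fixed quotas that are allocated to multiple designated applications.

- A quota must be set manually for each application.
- The sum of the quotas must not exceed the physical VRAM of the corresponding GPU. (Oversubscription is not currently supported.)
- Applications with a quota can run in parallel, and each can use only its own quota.

:::tip Multi-GPU aggregation
Within the same cluster, you can bind multiple GPUs to one application to obtain more VRAM and compute power. In aggregation scenarios, only the App Exclusive and Memory Slicing modes are supported.
:::

## View GPU status

1. Go to **Settings > GPU**. The GPU list shows each GPU's model, node, total VRAM, and current allocation mode.
2. Click a GPU to open its details page.

![GPU overview](/images/zh/manual/olares/gpu-overview.png#bordered)

::: tip Note
If your Olares cluster has only one GPU, opening the GPU page takes you directly to the details page. If there are multiple GPUs, the GPU list is displayed.
:::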