update reid docs with beginner-friendly arg descriptions

- Expand reid-specific training args with detailed explanations of PK
  sampling, loss weights, and when to adjust each parameter
- Mark all reid args as "ReID only" in shared macro tables
- Add "Camera ID is optional" section to dataset guide with custom
  regex example for PID-only filenames
- Update FAQ to reflect optional camid
rick 2026-04-03 23:19:18 -05:00
parent a58e41f02b
commit 127445e137
4 changed files with 56 additions and 32 deletions


@@ -70,6 +70,15 @@ cam_0indexed: false # set true if camera IDs start at 0
Unlike detection or classification datasets, ReID datasets require a `gallery` field that specifies the gallery set used during evaluation. The evaluation protocol compares each query image against all gallery images to compute mAP and Rank-1 metrics.
!!! tip "Camera ID is optional"
Camera ID (camid) is only needed for the standard Market-1501 evaluation protocol, which excludes same-person-same-camera matches from evaluation. If your custom dataset doesn't have camera information in the filenames, simply use a regex with **one capture group** (person ID only) and the pipeline will work correctly — the same-camera exclusion step is automatically skipped.
```yaml
# Example: custom dataset with PID-only filenames like "0001_001.jpg"
filename_re: '(\d+)_\d+\.(?:jpg|png|bmp)' # one group = PID only, no camera ID needed (single quotes keep the backslashes literal in YAML)
```
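To sanity-check a custom pattern before training, you can run it against a sample filename with Python's `re` module. This is an illustrative snippet, not part of the pipeline; the filename is made up:

```python
import re

# Same PID-only pattern as the YAML example above
pattern = re.compile(r"(\d+)_\d+\.(?:jpg|png|bmp)")

m = pattern.match("0001_001.jpg")
assert m is not None, "pattern should match the sample filename"
assert m.lastindex == 1  # exactly one capture group, so the camid step is skipped
print(int(m.group(1)))  # person ID: 1
```

If `m.lastindex` is `2`, the second group is treated as the camera ID, so double-check which parts of the filename your groups capture.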
## Usage
To train a YOLO ReID model on a dataset, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
@@ -129,4 +138,4 @@ Classification datasets organize images into class subdirectories (e.g., `cat/`,
### Can I use custom ReID datasets with YOLO?
Yes. Create a YAML config file with `path`, `train`, `val`, `gallery`, and `nc` fields pointing to your dataset. Use one of the built-in filename presets (`market1501`, `dukemtmc`, `msmt17`) via `filename_re`, or provide a custom regex pattern with two capture groups: group(1) for person ID and group(2) for camera ID. Set `cam_0indexed: true` if your camera IDs start at 0.
Yes. Create a YAML config file with `path`, `train`, `val`, `gallery`, and `nc` fields pointing to your dataset. Use one of the built-in filename presets (`market1501`, `dukemtmc`, `msmt17`) via `filename_re`, or provide a custom regex where group(1) captures the person ID. Camera ID (group 2) is **optional** — if your regex only has one capture group, the pipeline works without camera information. If you do include camera IDs, set `cam_0indexed: true` if they start at 0.


@@ -62,23 +62,36 @@ Train a YOLO26n-reid model on the Market-1501 dataset for 60 epochs at image siz
yolo reid train data=Market-1501.yaml model=yolo26n-reid.yaml pretrained=yolo26n-cls.pt epochs=60 imgsz=256
```
### ReID-specific training arguments
### ReID-Specific Training Arguments
| Argument | Default | Description |
| ---------------- | ------- | --------------------------------------------------------------- |
| `reid_p` | `16` | Number of identities per batch (P in PK sampling) |
| `reid_k` | `4` | Number of images per identity (K in PK sampling) |
| `triplet_margin` | `0.3` | Margin for batch-hard triplet loss |
| `triplet_weight` | `1.0` | Weight for triplet loss |
| `ce_weight` | `1.0` | Weight for cross-entropy identity classification loss |
| `center_weight` | `0.0` | Weight for center loss (0 = disabled) |
| `center_momentum`| `0.9` | EMA momentum for center loss class centers |
| `focal_gamma` | `0.0` | Focal loss gamma for ReID CE loss (0 = standard CE) |
| `supcon_temp` | `0.0` | Supervised contrastive loss temperature (0 = use triplet loss) |
These arguments are **only available for the `reid` task** and are not part of the general YOLO configuration. You can pass them via Python (`model.train(reid_p=16)`) or CLI (`yolo reid train reid_p=16 ...`).
#### Batch Sampling
ReID training uses **PK sampling** instead of random batching. Each training batch is built by selecting `P` random person identities and then sampling `K` images for each identity. This guarantees every batch contains multiple images of the same person, which is required for the triplet loss to find meaningful positive/negative pairs.
| Argument | Type | Default | Description |
| --------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `reid_p` | `int` | `16` | **P** — number of different person identities in each batch. The actual batch size equals `reid_p × reid_k` (e.g., 16 × 4 = 64 images). |
| `reid_k` | `int` | `4` | **K** — number of images sampled per identity in each batch. Higher values give the triplet loss more same-person pairs to compare, improving hard-negative mining. |
!!! tip
The effective batch size is `reid_p * reid_k`. For better hard-negative mining, use larger `reid_k` values (e.g., `reid_k=8` with `reid_p=32` for batch size 256).
The effective batch size is `reid_p × reid_k`. For better hard-negative mining, increase `reid_k` first (e.g., `reid_k=8` with `reid_p=32` for a batch size of 256). Make sure your GPU has enough memory for the resulting batch.
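The PK batching described above can be sketched in a few lines of plain Python. This is illustrative only, not the pipeline's implementation; `labels` is a hypothetical `{dataset index: person ID}` mapping:

```python
import random
from collections import defaultdict

def pk_batch(labels, p=16, k=4, seed=0):
    """Build one PK-sampled batch: P identities with K images each (sketch)."""
    rng = random.Random(seed)
    by_pid = defaultdict(list)
    for idx, pid in labels.items():
        by_pid[pid].append(idx)
    batch = []
    for pid in rng.sample(sorted(by_pid), p):  # P distinct identities
        idxs = by_pid[pid]
        # K images per identity, sampling with replacement if an identity has fewer
        picks = rng.sample(idxs, k) if len(idxs) >= k else rng.choices(idxs, k=k)
        batch.extend(picks)
    return batch

# Toy dataset: 20 identities with 6 images each -> one batch of 4 x 4 = 16 images
labels = {i: i // 6 for i in range(120)}
batch = pk_batch(labels, p=4, k=4)
print(len(batch))  # 16
```

Every batch is guaranteed to contain `k` images of each of its `p` identities, which is what gives the triplet loss valid positive pairs to mine.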
#### Loss Weights
ReID training combines multiple loss functions. The two main losses are **cross-entropy** (CE, for identity classification) and **triplet** (for metric learning). You can optionally enable center loss or supervised contrastive loss. Most users should keep the defaults.
| Argument | Type | Default | Description |
| ----------------- | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ce_weight` | `float` | `1.0` | Weight of the **cross-entropy loss**. This loss teaches the model to classify each person's identity during training. Higher values make the model focus more on identity classification. |
| `triplet_weight` | `float` | `1.0` | Weight of the **triplet loss**. This loss pulls same-person embeddings closer and pushes different-person embeddings apart. It is the core metric-learning objective. |
| `triplet_margin` | `float` | `0.3` | Margin for the triplet loss. The model learns to keep the distance between different-person embeddings at least this much larger than same-person distances. Typical values: 0.2-0.5. |
| `center_weight` | `float` | `0.0` | Weight of the **center loss** (disabled by default). When enabled (> 0), this loss pulls each person's embeddings toward a learned class center, reducing intra-class variation. Try `0.0005` if enabling. |
| `center_momentum` | `float` | `0.9` | How fast the class centers update when center loss is enabled. Value of 0.9 means centers are updated slowly using exponential moving average. Only used when `center_weight > 0`. |
| `focal_gamma` | `float` | `0.0` | Focal loss gamma for the cross-entropy component (disabled by default). When > 0, down-weights easy-to-classify samples so the model focuses on hard examples. Try `2.0` if you have many easy identities. |
| `supcon_temp` | `float` | `0.0` | Temperature for **supervised contrastive loss** (disabled by default). When > 0, replaces the triplet loss with SupCon loss which uses all positive/negative pairs rather than just the hardest. Try `0.07` if enabling. |
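For intuition on what `triplet_margin` controls, here is a minimal NumPy sketch of the batch-hard triplet term (the cross-entropy term is ordinary classification loss). It illustrates the formula only and is not the library's code:

```python
import numpy as np

def batch_hard_triplet(emb, pids, margin=0.3):
    """Batch-hard triplet loss sketch.

    For each anchor: distance to its farthest same-identity sample minus
    distance to its closest different-identity sample, plus the margin,
    clamped at zero and averaged over the batch.
    """
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # pairwise distances
    same = pids[:, None] == pids[None, :]
    hardest_pos = np.where(same, d, 0.0).max(axis=1)        # farthest positive
    hardest_neg = np.where(same, np.inf, d).min(axis=1)     # closest negative
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()

# Two identities with well-separated embeddings -> margin satisfied, zero loss
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
pids = np.array([0, 0, 1, 1])
print(batch_hard_triplet(emb, pids))  # 0.0
```

Once every anchor's closest negative is at least `margin` farther away than its farthest positive, this term vanishes and training is driven by the remaining losses.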
### Dataset format
@@ -156,12 +169,14 @@ For best results, combine both TTA and re-ranking:
yolo reid val model=path/to/best.pt reid_tta=True reid_reranking=True
```
### ReID evaluation arguments
### ReID Evaluation Arguments
| Argument | Default | Description |
| ---------------- | ------- | ---------------------------------------------------------------------------- |
| `reid_tta` | `False` | Enable horizontal flip TTA (+1-2% mAP, 2x inference time) |
| `reid_reranking` | `False` | Enable k-reciprocal re-ranking (+15-17% mAP, increases eval time) |
These arguments are **only available for `reid` validation** and improve accuracy without any retraining.
| Argument | Type | Default | Description |
| ----------------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `reid_tta` | `bool` | `False` | **Test-Time Augmentation**. When enabled, the model processes both the original image and a horizontally flipped copy, then averages the two embeddings. This makes the embedding more robust and typically adds +1-2% mAP. Trade-off: doubles inference time. |
| `reid_reranking` | `bool` | `False` | **K-reciprocal re-ranking**. A post-processing step that refines the distance ranking by checking whether two images are mutual nearest neighbors. Can boost mAP by +15-17% with no retraining. Trade-off: increases evaluation time due to extra computation. |
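Flip TTA amounts to the following sketch, where `embed_fn` is a hypothetical stand-in for the model's embedding forward pass (not a real API):

```python
import numpy as np

def embed_with_flip_tta(image, embed_fn):
    """Average the embeddings of an image and its horizontal mirror.

    `embed_fn` is a made-up stand-in for the model; re-normalizing keeps
    cosine distances comparable after averaging.
    """
    e = embed_fn(image) + embed_fn(image[:, ::-1])  # flip the width axis (H, W, C layout assumed)
    e /= 2.0
    return e / np.linalg.norm(e)

# Toy check with a flip-invariant "model": the averaged embedding is unit-norm
img = np.arange(12, dtype=float).reshape(2, 3, 2)
emb = embed_with_flip_tta(img, lambda x: np.array([x.sum(), 1.0]))
print(np.linalg.norm(emb))  # ~1.0
```

The two forward passes are why inference time doubles; the averaging itself is negligible.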
## Predict


@@ -45,17 +45,17 @@
| `kobj` | `float` | `1.0` | Weight of the keypoint objectness loss in pose estimation models, balancing detection confidence with pose accuracy. |
| `rle` | `float` | `1.0` | Weight of the residual log-likelihood estimation loss in pose estimation models, affecting the precision of keypoint localization. |
| `angle` | `float` | `1.0` | Weight of the angle loss in obb models, affecting the precision of oriented bounding box angle predictions. |
| `reid_p` | `int` | `16` | Number of identities per batch for PK sampling in ReID training. Effective batch size is `reid_p * reid_k`. |
| `reid_k` | `int` | `4` | Number of images per identity for PK sampling in ReID training. Higher values improve hard-negative mining. |
| `triplet_margin` | `float` | `0.3` | Margin for batch-hard triplet loss in ReID training. Controls minimum distance between positive and negative pairs. |
| `triplet_weight` | `float` | `1.0` | Weight of the triplet loss component in ReID training. |
| `ce_weight` | `float` | `1.0` | Weight of the cross-entropy identity classification loss in ReID training. |
| `center_weight` | `float` | `0.0` | Weight of the center loss in ReID training (0 disables). Pulls embeddings toward their class center. |
| `center_momentum` | `float` | `0.9` | EMA momentum for updating class centers in ReID center loss. |
| `focal_gamma` | `float` | `0.0` | Focal loss gamma for ReID cross-entropy loss (0 uses standard CE). Higher values focus on hard examples. |
| `supcon_temp` | `float` | `0.0` | Temperature for supervised contrastive loss in ReID (0 uses triplet loss instead). |
| `reid_reranking` | `bool` | `False` | Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) at evaluation, which refines distance rankings using neighborhood structure for significantly improved mAP (+15-17%). |
| `reid_tta` | `bool` | `False` | Enables horizontal flip test-time augmentation (TTA) at evaluation, averaging embeddings from original and flipped images for improved accuracy (+1-2% mAP). |
| `reid_p` | `int` | `16` | **ReID only.** Number of different person identities per training batch (P in PK sampling). The actual batch size equals `reid_p × reid_k`. Increase for more diverse batches if GPU memory allows. |
| `reid_k` | `int` | `4` | **ReID only.** Number of images sampled per identity in each batch (K in PK sampling). Higher values (e.g., 8) give the triplet loss more same-person pairs, improving metric learning quality. |
| `triplet_margin` | `float` | `0.3` | **ReID only.** Margin for the batch-hard triplet loss. The model learns to keep different-person embedding distances at least this much larger than same-person distances. Typical range: 0.2-0.5. |
| `triplet_weight` | `float` | `1.0` | **ReID only.** How much the triplet loss (metric learning) contributes to total training loss. The triplet loss pulls same-person embeddings closer and pushes different-person embeddings apart. |
| `ce_weight` | `float` | `1.0` | **ReID only.** How much the cross-entropy loss (identity classification) contributes to total training loss. This loss teaches the model to correctly classify person identities during training. |
| `center_weight` | `float` | `0.0` | **ReID only.** Weight of center loss (disabled by default). When > 0, pulls each person's embeddings toward a learned class center to reduce variation. Try `0.0005` to enable. |
| `center_momentum` | `float` | `0.9` | **ReID only.** How fast class centers update when center loss is enabled. 0.9 = slow updates via exponential moving average. Only relevant when `center_weight > 0`. |
| `focal_gamma` | `float` | `0.0` | **ReID only.** Focal loss gamma for cross-entropy (disabled by default). When > 0, down-weights easy samples so the model focuses on hard-to-classify identities. Try `2.0` to enable. |
| `supcon_temp` | `float` | `0.0` | **ReID only.** Temperature for supervised contrastive loss (disabled by default). When > 0, replaces triplet loss with SupCon which considers all positive/negative pairs, not just the hardest. Try `0.07` to enable. |
| `reid_reranking` | `bool` | `False` | **ReID only.** Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) at evaluation. Refines distance rankings by checking mutual nearest neighbors, boosting mAP by +15-17% with no retraining. Increases evaluation time. |
| `reid_tta` | `bool` | `False` | **ReID only.** Enables horizontal flip test-time augmentation at evaluation. Averages embeddings from original and flipped images for +1-2% mAP. Doubles inference time. |
| `nbs` | `int` | `64` | Nominal batch size for normalization of loss. |
| `overlap_mask` | `bool` | `True` | Determines whether object masks should be merged into a single mask for training, or kept separate for each object. In case of overlap, the smaller mask is overlaid on top of the larger mask during merge. |
| `mask_ratio` | `int` | `4` | Downsample ratio for segmentation masks, affecting the resolution of masks used during training. |


@@ -26,5 +26,5 @@
| `visualize` | `bool` | `False` | Visualizes the ground truths, true positives, false positives, and false negatives for each image. Useful for debugging and model interpretation. |
| `compile` | `bool` or `str` | `False` | Enables PyTorch 2.x `torch.compile` graph compilation with `backend='inductor'`. Accepts `True` → `"default"`, `False` → disables, or a string mode such as `"default"`, `"reduce-overhead"`, `"max-autotune-no-cudagraphs"`. Falls back to eager with a warning if unsupported. |
| `end2end` | `bool` | `None` | Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you run validation using the traditional NMS pipeline, additionally allowing you to make use of the `iou` argument. |
| `reid_reranking` | `bool` | `False` | Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) during ReID evaluation. Refines distance rankings using k-reciprocal nearest neighbor structure for significantly improved mAP (+15-17%). Increases evaluation time. |
| `reid_tta` | `bool` | `False` | Enables horizontal flip test-time augmentation during ReID evaluation. Averages embeddings from original and horizontally-flipped images for improved accuracy (+1-2% mAP). Doubles inference time. |
| `reid_reranking` | `bool` | `False` | **ReID only.** Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) during evaluation. A post-processing step that checks whether two images are mutual nearest neighbors to refine rankings. Boosts mAP by +15-17% with no retraining needed, but increases evaluation time. |
| `reid_tta` | `bool` | `False` | **ReID only.** Enables horizontal flip test-time augmentation during evaluation. The model processes both the original and a flipped copy of each image and averages the two embeddings, giving more robust results. Adds +1-2% mAP but doubles inference time. |