update reid docs with beginner-friendly arg descriptions

- Expand reid-specific training args with detailed explanations of PK
  sampling, loss weights, and when to adjust each parameter
- Mark all reid args as "ReID only" in shared macro tables
- Add "Camera ID is optional" section to dataset guide with custom
  regex example for PID-only filenames
- Update FAQ to reflect optional camid
rick 2026-04-03 23:19:18 -05:00
parent a58e41f02b
commit 127445e137
4 changed files with 56 additions and 32 deletions


@@ -70,6 +70,15 @@ cam_0indexed: false # set true if camera IDs start at 0
Unlike detection or classification datasets, ReID datasets require a `gallery` field that specifies the gallery set used during evaluation. The evaluation protocol compares each query image against all gallery images to compute mAP and Rank-1 metrics.
!!! tip "Camera ID is optional"
Camera ID (camid) is only needed for the standard Market-1501 evaluation protocol, which excludes same-person-same-camera matches from evaluation. If your custom dataset doesn't have camera information in the filenames, simply use a regex with **one capture group** (person ID only) and the pipeline will work correctly — the same-camera exclusion step is automatically skipped.
```yaml
# Example: custom dataset with PID-only filenames like "0001_001.jpg"
filename_re: '(\d+)_\d+\.(?:jpg|png|bmp)' # one group = PID only, no camera ID needed (single quotes keep the backslashes literal in YAML)
```
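To sanity-check a custom pattern before training, you can run it against a sample filename with Python's `re` module. This is an illustrative snippet, not part of the pipeline; the filename is made up:

```python
import re

# Same PID-only pattern as the YAML example above
pattern = re.compile(r"(\d+)_\d+\.(?:jpg|png|bmp)")

m = pattern.match("0001_001.jpg")
assert m is not None, "pattern should match the sample filename"
assert m.lastindex == 1  # exactly one capture group, so the camid step is skipped
print(int(m.group(1)))  # person ID: 1
```

If `m.lastindex` is `2`, the second group is treated as the camera ID, so double-check which parts of the filename your groups capture.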
## Usage
To train a YOLO ReID model on a dataset, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model [Training](../../modes/train.md) page.
@@ -129,4 +138,4 @@ Classification datasets organize images into class subdirectories (e.g., `cat/`,
### Can I use custom ReID datasets with YOLO?
Yes. Create a YAML config file with `path`, `train`, `val`, `gallery`, and `nc` fields pointing to your dataset. Use one of the built-in filename presets (`market1501`, `dukemtmc`, `msmt17`) via `filename_re`, or provide a custom regex pattern with two capture groups: group(1) for person ID and group(2) for camera ID. Set `cam_0indexed: true` if your camera IDs start at 0.
Yes. Create a YAML config file with `path`, `train`, `val`, `gallery`, and `nc` fields pointing to your dataset. Use one of the built-in filename presets (`market1501`, `dukemtmc`, `msmt17`) via `filename_re`, or provide a custom regex where group(1) captures the person ID. Camera ID (group 2) is **optional** — if your regex only has one capture group, the pipeline works without camera information. If you do include camera IDs, set `cam_0indexed: true` if they start at 0.


@@ -62,23 +62,36 @@ Train a YOLO26n-reid model on the Market-1501 dataset for 60 epochs at image siz
yolo reid train data=Market-1501.yaml model=yolo26n-reid.yaml pretrained=yolo26n-cls.pt epochs=60 imgsz=256
```
### ReID-specific training arguments
### ReID-Specific Training Arguments
| Argument | Default | Description |
| ---------------- | ------- | --------------------------------------------------------------- |
| `reid_p` | `16` | Number of identities per batch (P in PK sampling) |
| `reid_k` | `4` | Number of images per identity (K in PK sampling) |
| `triplet_margin` | `0.3` | Margin for batch-hard triplet loss |
| `triplet_weight` | `1.0` | Weight for triplet loss |
| `ce_weight` | `1.0` | Weight for cross-entropy identity classification loss |
| `center_weight` | `0.0` | Weight for center loss (0 = disabled) |
| `center_momentum`| `0.9` | EMA momentum for center loss class centers |
| `focal_gamma` | `0.0` | Focal loss gamma for ReID CE loss (0 = standard CE) |
| `supcon_temp` | `0.0` | Supervised contrastive loss temperature (0 = use triplet loss) |
These arguments are **only available for the `reid` task** and are not part of the general YOLO configuration. You can pass them via Python (`model.train(reid_p=16)`) or CLI (`yolo reid train reid_p=16 ...`).
#### Batch Sampling
ReID training uses **PK sampling** instead of random batching. Each training batch is built by selecting `P` random person identities and then sampling `K` images for each identity. This guarantees every batch contains multiple images of the same person, which is required for the triplet loss to find meaningful positive/negative pairs.
| Argument | Type | Default | Description |
| --------- | ----- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `reid_p` | `int` | `16` | **P** — number of different person identities in each batch. The actual batch size equals `reid_p × reid_k` (e.g., 16 × 4 = 64 images). |
| `reid_k` | `int` | `4` | **K** — number of images sampled per identity in each batch. Higher values give the triplet loss more same-person pairs to compare, improving hard-negative mining. |
!!! tip
The effective batch size is `reid_p * reid_k`. For better hard-negative mining, use larger `reid_k` values (e.g., `reid_k=8` with `reid_p=32` for batch size 256).
The effective batch size is `reid_p × reid_k`. For better hard-negative mining, increase `reid_k` first (e.g., `reid_k=8` with `reid_p=32` for a batch size of 256). Make sure your GPU has enough memory for the resulting batch.
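The PK batching described above can be sketched in a few lines of plain Python. This is illustrative only, not the pipeline's implementation; `labels` is a hypothetical `{dataset index: person ID}` mapping:

```python
import random
from collections import defaultdict

def pk_batch(labels, p=16, k=4, seed=0):
    """Build one PK-sampled batch: P identities with K images each (sketch)."""
    rng = random.Random(seed)
    by_pid = defaultdict(list)
    for idx, pid in labels.items():
        by_pid[pid].append(idx)
    batch = []
    for pid in rng.sample(sorted(by_pid), p):  # P distinct identities
        idxs = by_pid[pid]
        # K images per identity, sampling with replacement if an identity has fewer
        picks = rng.sample(idxs, k) if len(idxs) >= k else rng.choices(idxs, k=k)
        batch.extend(picks)
    return batch

# Toy dataset: 20 identities with 6 images each -> one batch of 4 x 4 = 16 images
labels = {i: i // 6 for i in range(120)}
batch = pk_batch(labels, p=4, k=4)
print(len(batch))  # 16
```

Every batch is guaranteed to contain `k` images of each of its `p` identities, which is what gives the triplet loss valid positive pairs to mine.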
#### Loss Weights
ReID training combines multiple loss functions. The two main losses are **cross-entropy** (CE, for identity classification) and **triplet** (for metric learning). You can optionally enable center loss or supervised contrastive loss. Most users should keep the defaults.
| Argument | Type | Default | Description |
| ----------------- | ------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ce_weight` | `float` | `1.0` | Weight of the **cross-entropy loss**. This loss teaches the model to classify each person's identity during training. Higher values make the model focus more on identity classification. |
| `triplet_weight` | `float` | `1.0` | Weight of the **triplet loss**. This loss pulls same-person embeddings closer and pushes different-person embeddings apart. It is the core metric-learning objective. |
| `triplet_margin` | `float` | `0.3` | Margin for the triplet loss. The model learns to keep the distance between different-person embeddings at least this much larger than same-person distances. Typical values: 0.2-0.5. |
| `center_weight` | `float` | `0.0` | Weight of the **center loss** (disabled by default). When enabled (> 0), this loss pulls each person's embeddings toward a learned class center, reducing intra-class variation. Try `0.0005` if enabling. |
| `center_momentum` | `float` | `0.9` | How fast the class centers update when center loss is enabled. Value of 0.9 means centers are updated slowly using exponential moving average. Only used when `center_weight > 0`. |
| `focal_gamma` | `float` | `0.0` | Focal loss gamma for the cross-entropy component (disabled by default). When > 0, down-weights easy-to-classify samples so the model focuses on hard examples. Try `2.0` if you have many easy identities. |
| `supcon_temp` | `float` | `0.0` | Temperature for **supervised contrastive loss** (disabled by default). When > 0, replaces the triplet loss with SupCon loss which uses all positive/negative pairs rather than just the hardest. Try `0.07` if enabling. |
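For intuition on what `triplet_margin` controls, here is a minimal NumPy sketch of the batch-hard triplet term (the cross-entropy term is ordinary classification loss). It illustrates the formula only and is not the library's code:

```python
import numpy as np

def batch_hard_triplet(emb, pids, margin=0.3):
    """Batch-hard triplet loss sketch.

    For each anchor: distance to its farthest same-identity sample minus
    distance to its closest different-identity sample, plus the margin,
    clamped at zero and averaged over the batch.
    """
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)  # pairwise distances
    same = pids[:, None] == pids[None, :]
    hardest_pos = np.where(same, d, 0.0).max(axis=1)        # farthest positive
    hardest_neg = np.where(same, np.inf, d).min(axis=1)     # closest negative
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()

# Two identities with well-separated embeddings -> margin satisfied, zero loss
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
pids = np.array([0, 0, 1, 1])
print(batch_hard_triplet(emb, pids))  # 0.0
```

Once every anchor's closest negative is at least `margin` farther away than its farthest positive, this term vanishes and training is driven by the remaining losses.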
### Dataset format
@@ -156,12 +169,14 @@ For best results, combine both TTA and re-ranking:
yolo reid val model=path/to/best.pt reid_tta=True reid_reranking=True
```
### ReID evaluation arguments
### ReID Evaluation Arguments
| Argument | Default | Description |
| ---------------- | ------- | ---------------------------------------------------------------------------- |
| `reid_tta` | `False` | Enable horizontal flip TTA (+1-2% mAP, 2x inference time) |
| `reid_reranking` | `False` | Enable k-reciprocal re-ranking (+15-17% mAP, increases eval time) |
These arguments are **only available for `reid` validation** and improve accuracy without any retraining.
| Argument | Type | Default | Description |
| ----------------- | ------ | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `reid_tta` | `bool` | `False` | **Test-Time Augmentation**. When enabled, the model processes both the original image and a horizontally flipped copy, then averages the two embeddings. This makes the embedding more robust and typically adds +1-2% mAP. Trade-off: doubles inference time. |
| `reid_reranking` | `bool` | `False` | **K-reciprocal re-ranking**. A post-processing step that refines the distance ranking by checking whether two images are mutual nearest neighbors. Can boost mAP by +15-17% with no retraining. Trade-off: increases evaluation time due to extra computation. |
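Flip TTA amounts to the following sketch, where `embed_fn` is a hypothetical stand-in for the model's embedding forward pass (not a real API):

```python
import numpy as np

def embed_with_flip_tta(image, embed_fn):
    """Average the embeddings of an image and its horizontal mirror.

    `embed_fn` is a made-up stand-in for the model; re-normalizing keeps
    cosine distances comparable after averaging.
    """
    e = embed_fn(image) + embed_fn(image[:, ::-1])  # flip the width axis (H, W, C layout assumed)
    e /= 2.0
    return e / np.linalg.norm(e)

# Toy check with a flip-invariant "model": the averaged embedding is unit-norm
img = np.arange(12, dtype=float).reshape(2, 3, 2)
emb = embed_with_flip_tta(img, lambda x: np.array([x.sum(), 1.0]))
print(np.linalg.norm(emb))  # ~1.0
```

The two forward passes are why inference time doubles; the averaging itself is negligible.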
## Predict


@@ -45,17 +45,17 @@
| `kobj` | `float` | `1.0` | Weight of the keypoint objectness loss in pose estimation models, balancing detection confidence with pose accuracy. |
| `rle` | `float` | `1.0` | Weight of the residual log-likelihood estimation loss in pose estimation models, affecting the precision of keypoint localization. |
| `angle` | `float` | `1.0` | Weight of the angle loss in obb models, affecting the precision of oriented bounding box angle predictions. |
| `reid_p` | `int` | `16` | Number of identities per batch for PK sampling in ReID training. Effective batch size is `reid_p * reid_k`. |
| `reid_k` | `int` | `4` | Number of images per identity for PK sampling in ReID training. Higher values improve hard-negative mining. |
| `triplet_margin` | `float` | `0.3` | Margin for batch-hard triplet loss in ReID training. Controls minimum distance between positive and negative pairs. |
| `triplet_weight` | `float` | `1.0` | Weight of the triplet loss component in ReID training. |
| `ce_weight` | `float` | `1.0` | Weight of the cross-entropy identity classification loss in ReID training. |
| `center_weight` | `float` | `0.0` | Weight of the center loss in ReID training (0 disables). Pulls embeddings toward their class center. |
| `center_momentum` | `float` | `0.9` | EMA momentum for updating class centers in ReID center loss. |
| `focal_gamma` | `float` | `0.0` | Focal loss gamma for ReID cross-entropy loss (0 uses standard CE). Higher values focus on hard examples. |
| `supcon_temp` | `float` | `0.0` | Temperature for supervised contrastive loss in ReID (0 uses triplet loss instead). |
| `reid_reranking` | `bool` | `False` | Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) at evaluation, which refines distance rankings using neighborhood structure for significantly improved mAP (+15-17%). |
| `reid_tta` | `bool` | `False` | Enables horizontal flip test-time augmentation (TTA) at evaluation, averaging embeddings from original and flipped images for improved accuracy (+1-2% mAP). |
| `reid_p` | `int` | `16` | **ReID only.** Number of different person identities per training batch (P in PK sampling). The actual batch size equals `reid_p × reid_k`. Increase for more diverse batches if GPU memory allows. |
| `reid_k` | `int` | `4` | **ReID only.** Number of images sampled per identity in each batch (K in PK sampling). Higher values (e.g., 8) give the triplet loss more same-person pairs, improving metric learning quality. |
| `triplet_margin` | `float` | `0.3` | **ReID only.** Margin for the batch-hard triplet loss. The model learns to keep different-person embedding distances at least this much larger than same-person distances. Typical range: 0.2-0.5. |
| `triplet_weight` | `float` | `1.0` | **ReID only.** How much the triplet loss (metric learning) contributes to total training loss. The triplet loss pulls same-person embeddings closer and pushes different-person embeddings apart. |
| `ce_weight` | `float` | `1.0` | **ReID only.** How much the cross-entropy loss (identity classification) contributes to total training loss. This loss teaches the model to correctly classify person identities during training. |
| `center_weight` | `float` | `0.0` | **ReID only.** Weight of center loss (disabled by default). When > 0, pulls each person's embeddings toward a learned class center to reduce variation. Try `0.0005` to enable. |
| `center_momentum` | `float` | `0.9` | **ReID only.** How fast class centers update when center loss is enabled. 0.9 = slow updates via exponential moving average. Only relevant when `center_weight > 0`. |
| `focal_gamma` | `float` | `0.0` | **ReID only.** Focal loss gamma for cross-entropy (disabled by default). When > 0, down-weights easy samples so the model focuses on hard-to-classify identities. Try `2.0` to enable. |
| `supcon_temp` | `float` | `0.0` | **ReID only.** Temperature for supervised contrastive loss (disabled by default). When > 0, replaces triplet loss with SupCon which considers all positive/negative pairs, not just the hardest. Try `0.07` to enable. |
| `reid_reranking` | `bool` | `False` | **ReID only.** Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) at evaluation. Refines distance rankings by checking mutual nearest neighbors, boosting mAP by +15-17% with no retraining. Increases evaluation time. |
| `reid_tta` | `bool` | `False` | **ReID only.** Enables horizontal flip test-time augmentation at evaluation. Averages embeddings from original and flipped images for +1-2% mAP. Doubles inference time. |
| `nbs` | `int` | `64` | Nominal batch size for normalization of loss. |
| `overlap_mask` | `bool` | `True` | Determines whether object masks should be merged into a single mask for training, or kept separate for each object. In case of overlap, the smaller mask is overlaid on top of the larger mask during merge. |
| `mask_ratio` | `int` | `4` | Downsample ratio for segmentation masks, affecting the resolution of masks used during training. |


@@ -26,5 +26,5 @@
| `visualize` | `bool` | `False` | Visualizes the ground truths, true positives, false positives, and false negatives for each image. Useful for debugging and model interpretation. |
| `compile` | `bool` or `str` | `False` | Enables PyTorch 2.x `torch.compile` graph compilation with `backend='inductor'`. Accepts `True` → `"default"`, `False` → disables, or a string mode such as `"default"`, `"reduce-overhead"`, `"max-autotune-no-cudagraphs"`. Falls back to eager with a warning if unsupported. |
| `end2end` | `bool` | `None` | Overrides the end-to-end mode in YOLO models that support NMS-free inference (YOLO26, YOLOv10). Setting it to `False` lets you run validation using the traditional NMS pipeline, additionally allowing you to make use of the `iou` argument. |
| `reid_reranking` | `bool` | `False` | Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) during ReID evaluation. Refines distance rankings using k-reciprocal nearest neighbor structure for significantly improved mAP (+15-17%). Increases evaluation time. |
| `reid_tta` | `bool` | `False` | Enables horizontal flip test-time augmentation during ReID evaluation. Averages embeddings from original and horizontally-flipped images for improved accuracy (+1-2% mAP). Doubles inference time. |
| `reid_reranking` | `bool` | `False` | **ReID only.** Enables [k-reciprocal re-ranking](https://arxiv.org/abs/1701.08398) during evaluation. A post-processing step that checks whether two images are mutual nearest neighbors to refine rankings. Boosts mAP by +15-17% with no retraining needed, but increases evaluation time. |
| `reid_tta` | `bool` | `False` | **ReID only.** Enables horizontal flip test-time augmentation during evaluation. The model processes both the original and a flipped copy of each image and averages the two embeddings, giving more robust results. Adds +1-2% mAP but doubles inference time. |