
2026-01-16 19:20:23 +00:00


| comments | description | keywords |
| --- | --- | --- |
| true | Learn how to train YOLO models on cloud GPUs with Ultralytics Platform, including remote training and real-time metrics streaming. | Ultralytics Platform, cloud training, GPU training, remote training, YOLO, model training, machine learning |

Cloud Training

Ultralytics Platform Cloud Training offers single-click training on cloud GPUs, making model training accessible without complex setup. Train YOLO models with real-time metrics streaming and automatic checkpoint saving.

Watch: Cloud Training with Ultralytics Platform

Train from UI

Start cloud training directly from the Platform:

1. Navigate to your project
2. Click **Train Model**
3. Configure training parameters
4. Click **Start Training**

Step 1: Select Dataset

Choose one of your own datasets or a public dataset:

| Option | Description |
| --- | --- |
| Your Datasets | Datasets you've uploaded |
| Public Datasets | Shared datasets from Explore |

Step 2: Configure Model

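Select the base model and training parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| Model | Base architecture (YOLO26n, s, m, l, x) | YOLO26n |
| Epochs | Number of full passes over the training dataset | 100 |
| Image Size | Input resolution in pixels | 640 |
| Batch Size | Images processed per training step | Auto |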
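As a rough sketch of how these form fields relate to the `ultralytics` Python API, the hypothetical helper `ui_to_train_args` below maps them onto `YOLO(...).train()` keyword arguments. The mapping (in particular Auto → `batch=-1`, taken from the Core Parameters table further down) is an assumption about how the Platform translates the form, not a documented Platform API.

```python
# Hypothetical helper: translates the Step 2 form fields into keyword
# arguments for ultralytics' YOLO(...).train(). Names and mapping are
# illustrative assumptions, not a Platform API.
MODEL_SIZES = ("n", "s", "m", "l", "x")


def ui_to_train_args(size="n", epochs=100, image_size=640, batch_size="auto"):
    """Return (weights_file, train_kwargs) mirroring the Step 2 defaults."""
    if size not in MODEL_SIZES:
        raise ValueError(f"size must be one of {MODEL_SIZES}, got {size!r}")
    if epochs < 1:
        raise ValueError("epochs must be >= 1")
    # 'Auto' batch size corresponds to batch=-1 in the ultralytics package.
    batch = -1 if batch_size == "auto" else int(batch_size)
    weights = f"yolo26{size}.pt"
    return weights, {"epochs": epochs, "imgsz": image_size, "batch": batch}


def train_locally(data, **form_fields):
    """Run the equivalent job with the ultralytics package (not called here)."""
    from ultralytics import YOLO  # lazy import keeps the helper above dependency-free

    weights, kwargs = ui_to_train_args(**form_fields)
    return YOLO(weights).train(data=data, **kwargs)
```

For example, `ui_to_train_args("s", batch_size=32)` returns `("yolo26s.pt", {"epochs": 100, "imgsz": 640, "batch": 32})`.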

Step 3: Select GPU

Choose your compute resources:

| GPU | VRAM | Speed | Cost/Hour (USD) |
| --- | --- | --- | --- |
| RTX 6000 Pro | 96GB | Very Fast | Free |
| M4 Pro (Mac) | 64GB | Fast | Free |
| RTX 3090 | 24GB | Good | $0.44 |
| RTX 4090 | 24GB | Fast | $0.74 |
| L40S | 48GB | Fast | $1.14 |
| A100 40GB | 40GB | Very Fast | $1.29 |
| A100 80GB | 80GB | Very Fast | $1.99 |
| H100 80GB | 80GB | Fastest | $3.99 |

!!! tip "GPU Selection"

    - **RTX 6000 Pro** (Free): Excellent for most training jobs on Ultralytics infrastructure
    - **M4 Pro** (Free): Apple Silicon option for compatible workloads
    - **RTX 4090**: Best value for paid cloud training
    - **A100 80GB**: Suited to large batch sizes or large models
    - **H100**: Maximum performance for time-sensitive training
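To make the trade-off concrete, here is a minimal sketch that picks the cheapest listed GPU with enough memory, using the VRAM and hourly rates from the GPU table. `GPUS` and `cheapest_gpu` are hypothetical names; the helper ignores speed and workload compatibility (for example, whether a job suits the Apple Silicon M4 Pro), and prices may change.

```python
# VRAM (GB) and hourly rate (USD) copied from the GPU table; an illustrative
# snapshot, not a live price list.
GPUS = {
    "RTX 6000 Pro": (96, 0.00),
    "M4 Pro (Mac)": (64, 0.00),
    "RTX 3090": (24, 0.44),
    "RTX 4090": (24, 0.74),
    "L40S": (48, 1.14),
    "A100 40GB": (40, 1.29),
    "A100 80GB": (80, 1.99),
    "H100 80GB": (80, 3.99),
}


def cheapest_gpu(min_vram_gb, include_free=True):
    """Return the lowest-cost GPU with at least min_vram_gb of memory.

    Ties on price are broken in favour of more memory.
    """
    candidates = [
        (rate, -vram, name)
        for name, (vram, rate) in GPUS.items()
        if vram >= min_vram_gb and (include_free or rate > 0)
    ]
    if not candidates:
        raise ValueError(f"no listed GPU has {min_vram_gb} GB or more")
    return min(candidates)[2]
```

With the free tier allowed, `cheapest_gpu(24)` returns `"RTX 6000 Pro"`; restricted to paid GPUs, `cheapest_gpu(48, include_free=False)` returns `"L40S"`.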

!!! success "Free Training Tier"

    The RTX 6000 Pro (96GB VRAM) and M4 Pro GPUs are available at no cost, running on Ultralytics infrastructure. They are ideal for getting started and for regular training jobs.

Step 4: Start Training

Click **Start Training** to launch your job. The Platform:

1. Provisions a GPU instance
2. Downloads your dataset
3. Begins training
4. Streams metrics in real-time

!!! success "Free Credits"

    New accounts receive $5 in credits, enough for several training runs on an RTX 4090. [Check your balance](../account/billing.md) in **Settings > Billing**.

Monitor Training

View real-time training progress:

Live Metrics

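| Metric | Description |
| --- | --- |
| Loss | Training and validation loss |
| mAP | Mean Average Precision on the validation set |
| Precision | Fraction of predictions that are correct |
| Recall | Fraction of ground-truth objects that are detected |
| GPU Util | GPU utilization (%) |
| Memory | GPU memory usage |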
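Precision and recall are computed from matched detections. A prediction usually counts as a true positive when its IoU (intersection over union) with a ground-truth box exceeds a threshold, such as 0.5 for mAP50. The sketch below shows only these basic definitions with hypothetical helper names; it is not the Platform's or the library's metric code, and mAP additionally averages precision over recall levels and classes.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """Fraction of predicted boxes that match a ground-truth object."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0
```

With 80 true positives, 20 false positives and 10 missed objects, precision is 0.8 and recall is 80 / 90 ≈ 0.89.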

Checkpoints

Checkpoints are saved automatically:

- **Every epoch**: the latest weights are saved
- **Best model**: the checkpoint with the highest mAP is preserved
- **Final model**: the weights at training completion are kept
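The checkpoint policy above can be modeled as follows. This is a simplified, hypothetical simulation: the `ultralytics` package ranks checkpoints by a fitness score derived from validation metrics, whereas this sketch compares a single mAP value per epoch.

```python
def simulate_checkpoints(map_per_epoch):
    """Report which epochs end up in the 'last' and 'best' checkpoints.

    'last' is overwritten every epoch; 'best' only when mAP strictly
    improves on the best value seen so far.
    """
    last_epoch, best_epoch, best_map = None, None, float("-inf")
    for epoch, m in enumerate(map_per_epoch, start=1):
        last_epoch = epoch  # latest weights are saved every epoch
        if m > best_map:  # a better mAP replaces the best checkpoint
            best_map, best_epoch = m, epoch
    return {"last": last_epoch, "best": best_epoch, "best_map": best_map}
```

For per-epoch mAP values `[0.31, 0.42, 0.40, 0.45, 0.44]`, the best checkpoint comes from epoch 4 while the final (last) weights come from epoch 5.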

Stop and Resume

Stop Training

Click **Stop Training** to pause your job:

- The current checkpoint is saved
- The GPU instance is released
- Credit charges stop

Resume Training

Continue from your last checkpoint:

1. Navigate to the model
2. Click **Resume Training**
3. Confirm continuation

!!! note "Resume Limitations"

    You can only resume training that was explicitly stopped. Failed training jobs may need to restart from scratch.
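For runs on your own hardware (see Remote Training), the `ultralytics` package can resume from a `last.pt` checkpoint with `train(resume=True)`. The sketch below assumes the default `<run_dir>/weights/last.pt` layout; `find_last_checkpoint` and `resume_local_run` are hypothetical helpers, and they do not replace the Platform's **Resume Training** button for cloud jobs.

```python
from pathlib import Path


def find_last_checkpoint(run_dir):
    """Return run_dir/weights/last.pt if it exists, else None."""
    ckpt = Path(run_dir) / "weights" / "last.pt"
    return ckpt if ckpt.is_file() else None


def resume_local_run(run_dir):
    """Resume an interrupted local run (imports ultralytics lazily)."""
    ckpt = find_last_checkpoint(run_dir)
    if ckpt is None:
        raise FileNotFoundError(f"no last.pt under {run_dir}; restart training instead")
    from ultralytics import YOLO

    # resume=True continues from the saved epoch using the run's original arguments.
    return YOLO(str(ckpt)).train(resume=True)
```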

Remote Training

Train on your own hardware while streaming metrics to the Platform.

!!! warning "Package Version Requirement"

    Platform integration requires **ultralytics>=8.4.0**. Earlier versions will not work with the Platform.

    ```bash
    pip install "ultralytics>=8.4.0"
    ```
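A quick pre-flight check can catch an outdated install before training starts. This minimal sketch compares only the numeric `major.minor.patch` prefix, so pre-release suffixes such as `rc1` are ignored; for full PEP 440 ordering, the `packaging` library's `Version` class is the more robust choice. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (8, 4, 0)  # minimum ultralytics release required by the Platform


def release_tuple(ver):
    """Parse the leading numeric release, e.g. '8.4.12' -> (8, 4, 12)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", ver)
    if not match:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(g or 0) for g in match.groups())


def platform_compatible(ver=None):
    """True if the given (or installed) ultralytics version is >= 8.4.0."""
    if ver is None:
        try:
            ver = version("ultralytics")
        except PackageNotFoundError:
            return False  # not installed at all
    return release_tuple(ver) >= MIN_VERSION
```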

Set Up API Key

1. Go to **Settings > API Keys**
2. Create a new key with training scope
3. Set the environment variable:

    ```bash
    export ULTRALYTICS_API_KEY="your_api_key"
    ```
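Before launching a long run, it can help to fail fast when the key is missing. The hypothetical `get_api_key` helper below only checks that `ULTRALYTICS_API_KEY` is set and non-empty; it does not validate the key against the Platform.

```python
import os


def get_api_key(env=None):
    """Return ULTRALYTICS_API_KEY from the environment or raise a clear error."""
    env = os.environ if env is None else env  # injectable mapping for testing
    key = env.get("ULTRALYTICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ULTRALYTICS_API_KEY is not set; create a key under Settings > API Keys"
        )
    return key
```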

Train with Streaming

Use the `project` and `name` parameters to stream metrics:

=== "CLI"

    ```bash
    yolo train model=yolo26n.pt data=coco.yaml epochs=100 \
      project=username/my-project name=experiment-1
    ```

=== "Python"

    ```python
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")  # start from pretrained YOLO26n weights
    model.train(
        data="coco.yaml",
        epochs=100,
        project="username/my-project",  # Platform project that receives the metrics
        name="experiment-1",  # run name shown in that project
    )
    ```

Using Platform Datasets

Train with datasets stored on the Platform:

```bash
yolo train model=yolo26n.pt data=ul://username/datasets/my-dataset epochs=100
```

The `ul://` URI format automatically downloads and configures your dataset.
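Because the CLI and the Python API accept the same `data` argument, a Platform dataset can presumably also be passed from Python as `model.train(data="ul://username/datasets/my-dataset", epochs=100)`. The hypothetical helpers below build and split URIs in the `ul://username/datasets/name` form shown above; they do string handling only, and the download itself is left to `ultralytics`.

```python
from urllib.parse import urlparse


def platform_dataset_uri(username, dataset):
    """Build a ul:// dataset reference of the form ul://username/datasets/name."""
    for part in (username, dataset):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"ul://{username}/datasets/{dataset}"


def split_platform_uri(uri):
    """Return (username, dataset) from a ul:// dataset URI."""
    parsed = urlparse(uri)  # scheme='ul', netloc=username, path='/datasets/<name>'
    segments = parsed.path.strip("/").split("/")
    if parsed.scheme != "ul" or len(segments) != 2 or segments[0] != "datasets":
        raise ValueError(f"not a ul:// dataset URI: {uri!r}")
    return parsed.netloc, segments[1]
```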

Billing

Training costs are based on GPU usage:

Cost Calculation

Total Cost = GPU Rate × Training Time (hours)

| Example | GPU | Time | Cost (USD) |
| --- | --- | --- | --- |
| Small job | RTX 4090 | 1 hour | $0.74 |
| Medium job | A100 40GB | 4 hours | $5.16 |
| Large job | H100 | 8 hours | $31.92 |
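The cost formula and example table can be reproduced in a few lines. This sketch hard-codes the paid hourly rates from the GPU table (a snapshot that may go out of date) and the $5.00 threshold from the Minimum Balance note; all names are illustrative.

```python
# Paid hourly rates (USD) from the GPU table; illustrative snapshot only.
RATES = {
    "RTX 3090": 0.44,
    "RTX 4090": 0.74,
    "L40S": 1.14,
    "A100 40GB": 1.29,
    "A100 80GB": 1.99,
    "H100 80GB": 3.99,
}
MIN_BALANCE = 5.00  # required to start epoch-based training


def training_cost(gpu, hours):
    """Total Cost = GPU Rate x Training Time (hours), rounded to cents."""
    return round(RATES[gpu] * hours, 2)


def affordable_hours(balance, gpu):
    """GPU-hours that a balance covers at the listed rate."""
    return balance / RATES[gpu]


def can_start(balance):
    """True if the balance meets the minimum for epoch-based training."""
    return balance >= MIN_BALANCE
```

`affordable_hours(5.0, "RTX 4090")` is about 6.8 hours, consistent with the $5 starting credit covering several RTX 4090 runs.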

Payment Methods

| Method | Description |
| --- | --- |
| Account Balance | Pre-loaded credits |
| Pay Per Job | Charge at job completion |

!!! note "Minimum Balance"

    A minimum balance of $5.00 is required to start epoch-based training.

View Training Costs

After training, view detailed costs in the Billing tab:

- Per-epoch cost breakdown
- Total GPU time
- Downloadable cost report

Training Tips

Choose the Right Model Size

| Model | Parameters | Best For |
| --- | --- | --- |
| YOLO26n | 2.4M | Real-time, edge devices |
| YOLO26s | 9.5M | Balanced speed/accuracy |
| YOLO26m | 20.4M | Higher accuracy |
| YOLO26l | 24.8M | Production accuracy |
| YOLO26x | 55.7M | Maximum accuracy |

Optimize Training Time

1. **Start small**: test with fewer epochs first
2. **Use an appropriate GPU**: match the GPU to the model and batch size
3. **Validate your dataset**: ensure quality before training
4. **Monitor early**: stop if metrics plateau
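Stopping when metrics plateau is what the `patience` parameter in the Core Parameters table automates. Here is a minimal sketch of patience-based early stopping, assuming one fitness score per epoch; the function name is hypothetical, and the library's own implementation may differ in details.

```python
def early_stop_epoch(fitness_per_epoch, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` epochs pass without beating the best
    score seen so far. patience <= 0 disables early stopping here.
    """
    best, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(fitness_per_epoch, start=1):
        if score > best:
            best, best_epoch = score, epoch  # new best resets the counter
        elif patience > 0 and epoch - best_epoch >= patience:
            return epoch
    return None
```

With scores `[0.30, 0.38, 0.41, 0.40, 0.41, 0.39, 0.40]` and `patience=3`, the best score comes at epoch 3 and training stops at epoch 6.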

Troubleshooting

| Issue | Solution |
| --- | --- |
| Training stuck at 0% | Check dataset format, retry |
| Out of memory | Reduce batch size or use larger GPU |
| Poor accuracy | Increase epochs, check data quality |
| Training slow | Consider faster GPU |

FAQ

How long does training take?

Training time depends on:

- Dataset size
- Model size
- Number of epochs
- GPU selected

Typical times (1,000 images, 100 epochs):

| Model | RTX 4090 | A100 |
| --- | --- | --- |
| YOLO26n | 30 min | 20 min |
| YOLO26m | 60 min | 40 min |
| YOLO26x | 120 min | 80 min |
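To extrapolate from this table, one rough assumption is that training time scales linearly with the number of images and epochs for a fixed model, GPU, and image size. The sketch below encodes only that assumption; real runs also depend on image size, augmentation, dataloader speed, and validation overhead, so treat the output as a ballpark figure.

```python
# Baseline minutes for 1,000 images x 100 epochs, copied from the table above.
BASELINE_MIN = {
    ("YOLO26n", "RTX 4090"): 30, ("YOLO26n", "A100"): 20,
    ("YOLO26m", "RTX 4090"): 60, ("YOLO26m", "A100"): 40,
    ("YOLO26x", "RTX 4090"): 120, ("YOLO26x", "A100"): 80,
}


def estimate_minutes(model, gpu, images, epochs):
    """Rough estimate assuming time scales linearly with images x epochs."""
    base = BASELINE_MIN[(model, gpu)]
    return base * (images / 1000) * (epochs / 100)
```

For example, YOLO26n on 2,000 images for 100 epochs on an RTX 4090 comes out at roughly 60 minutes, which at $0.74/hour is about $0.74.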

Can I train overnight?

Yes, training continues until completion. You'll receive a notification when training finishes. Make sure your account has sufficient balance for epoch-based training.

What happens if I run out of credits?

Training pauses at the end of the current epoch. Your checkpoint is saved, and you can resume after adding credits.

Can I use custom training arguments?

Yes, advanced users can specify additional arguments in the training configuration.

Training Parameters Reference

Core Parameters

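| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| `epochs` | int | 100 | 1+ | Number of training epochs |
| `batch` | int | 16 | -1 = auto | Batch size (-1 for auto) |
| `imgsz` | int | 640 | 32+ | Input image size |
| `patience` | int | 100 | 0+ | Epochs without improvement before early stopping |
| `workers` | int | 8 | 0+ | Dataloader workers |
| `cache` | bool / str | False | - | Cache images in RAM (`True` or `"ram"`) or on disk (`"disk"`) |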

Learning Rate Parameters

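| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| `lr0` | float | 0.01 | 0.0-1.0 | Initial learning rate |
| `lrf` | float | 0.01 | 0.0-1.0 | Final learning rate as a fraction of `lr0` (final LR = `lr0` × `lrf`) |
| `momentum` | float | 0.937 | 0.0-1.0 | SGD momentum |
| `weight_decay` | float | 0.0005 | 0.0-1.0 | L2 regularization |
| `warmup_epochs` | float | 3.0 | 0+ | Number of warmup epochs |
| `cos_lr` | bool | False | - | Use a cosine LR scheduler |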
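The `lr0`, `lrf`, and `cos_lr` parameters interact as follows: the learning rate starts at `lr0` and decays toward `lr0 × lrf` over the run, either linearly or along a cosine curve. This sketch is modeled on the per-epoch schedule used by the `ultralytics` trainer but ignores warmup (`warmup_epochs`); `lr_at` is a hypothetical name.

```python
import math


def lr_at(epoch, epochs, lr0=0.01, lrf=0.01, cos_lr=False):
    """Learning rate at a 0-based epoch, decaying from lr0 toward lr0 * lrf."""
    progress = min(epoch / epochs, 1.0)  # 0.0 at the start, 1.0 at the end
    if cos_lr:
        # Cosine curve from a factor of 1 down to lrf.
        factor = ((1 - math.cos(progress * math.pi)) / 2) * (lrf - 1) + 1
    else:
        # Straight line from a factor of 1 down to lrf.
        factor = (1 - progress) * (1.0 - lrf) + lrf
    return lr0 * factor
```

Both schedules meet at the midpoint (0.00505 with the defaults), but the cosine schedule keeps the learning rate higher early in training and lower near the end.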

Augmentation Parameters

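| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| `hsv_h` | float | 0.015 | 0.0-1.0 | HSV hue augmentation (fraction) |
| `hsv_s` | float | 0.7 | 0.0-1.0 | HSV saturation augmentation (fraction) |
| `hsv_v` | float | 0.4 | 0.0-1.0 | HSV value (brightness) augmentation (fraction) |
| `degrees` | float | 0.0 | - | Rotation range (+/- degrees) |
| `translate` | float | 0.1 | 0.0-1.0 | Translation (+/- fraction of image size) |
| `scale` | float | 0.5 | 0.0-1.0 | Scale gain (+/-) |
| `fliplr` | float | 0.5 | 0.0-1.0 | Horizontal flip probability |
| `flipud` | float | 0.0 | 0.0-1.0 | Vertical flip probability |
| `mosaic` | float | 1.0 | 0.0-1.0 | Mosaic augmentation probability |
| `mixup` | float | 0.0 | 0.0-1.0 | Mixup augmentation probability |
| `copy_paste` | float | 0.0 | 0.0-1.0 | Copy-paste probability (segmentation only) |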
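To illustrate what the `hsv_*` fractions mean, the sketch below jitters a single RGB pixel. Each of hue, saturation, and value is multiplied by a random gain in `[1 - g, 1 + g]`, with hue wrapping around and saturation and value clipped. This is a simplified, stdlib-only illustration of the gain approach; the library augments whole images with OpenCV, and exact details may vary between versions.

```python
import colorsys
import random


def hsv_jitter(rgb, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, rng=random):
    """Return a randomly HSV-jittered copy of one (R, G, B) pixel with 0-255 ints."""
    # One random gain per channel, drawn uniformly from [1 - g, 1 + g].
    gh, gs, gv = (rng.uniform(-1, 1) * g + 1 for g in (hsv_h, hsv_s, hsv_v))
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    h = (h * gh) % 1.0  # hue wraps around the colour wheel
    s = min(max(s * gs, 0.0), 1.0)  # saturation clipped to [0, 1]
    v = min(max(v * gv, 0.0), 1.0)  # value (brightness) clipped to [0, 1]
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))
```

Passing a seeded generator, such as `hsv_jitter((200, 100, 50), rng=random.Random(0))`, makes the result reproducible; setting all three fractions to 0 leaves the pixel unchanged.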

Optimizer Selection

| Value | Description |
| --- | --- |
| `auto` | Automatic selection (default) |
| `SGD` | Stochastic Gradient Descent |
| `Adam` | Adam optimizer |
| `AdamW` | Adam with weight decay |

!!! tip "Task-Specific Parameters"

    Some parameters only apply to specific tasks:

    - **Segment**: `overlap_mask`, `mask_ratio`, `copy_paste`
    - **Pose**: `pose` (loss weight), `kobj` (keypoint objectness)
    - **Classify**: `dropout`, `erasing`, `auto_augment`
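One way to catch arguments that have no effect for the chosen task is to check them against the lists in the tip above. This hypothetical helper only reports such keys; it makes no claim about how `ultralytics` itself treats them.

```python
# Task-only parameters from the tip above; every other training argument is
# treated as shared across tasks in this simplified sketch.
TASK_ONLY = {
    "segment": {"overlap_mask", "mask_ratio", "copy_paste"},
    "pose": {"pose", "kobj"},
    "classify": {"dropout", "erasing", "auto_augment"},
}


def ignored_args(task, train_args):
    """Return sorted keys of train_args that belong only to some other task."""
    own = TASK_ONLY.get(task, set())
    foreign = set().union(*TASK_ONLY.values()) - own
    return sorted(k for k in train_args if k in foreign)
```

For a detection job, `ignored_args("detect", {"epochs": 100, "copy_paste": 0.3, "dropout": 0.1})` returns `["copy_paste", "dropout"]`.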
