Merge branch 'main' into deepwiki-badge

Jeffrey Lai 2026-03-10 09:14:59 -05:00 committed by GitHub
commit bb8ec56d3b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
33 changed files with 4038 additions and 6781 deletions

1
.gitattributes vendored Normal file
View file

@ -0,0 +1 @@
*.ipynb linguist-language=Python

View file

@ -30,4 +30,4 @@ CUDA version:
PyTorch version:
HuggingFace transformers version:
HuggingFace accelerate version:
Pandas version:

View file

@ -36,13 +36,13 @@ jobs:
run: pip install ".[dev]" -f https://download.pytorch.org/whl/cpu/torch_stable.html
- name: Run Eval Script for Chronos-2
run: python scripts/evaluation/evaluate.py chronos-2 ci/evaluate/backtest_config.yaml $CHRONOS_2_RESULTS_CSV --model-id=s3://autogluon/chronos-2 --device=cpu --torch-dtype=float32
run: python scripts/evaluation/evaluate.py chronos-2 ci/evaluate/backtest_config.yaml $CHRONOS_2_RESULTS_CSV --model-id=amazon/chronos-2 --device=cpu --torch-dtype=float32
- name: Print Chronos-2 CSV
run: cat $CHRONOS_2_RESULTS_CSV
- name: Run Eval Script for Chronos-Bolt
run: python scripts/evaluation/evaluate.py chronos-bolt ci/evaluate/backtest_config.yaml $CHRONOS_BOLT_RESULTS_CSV --model-id=amazon/chronos-bolt-small --device=cpu --torch-dtype=float32
- name: Print Chronos-Bolt CSV
run: cat $CHRONOS_BOLT_RESULTS_CSV

7
.gitignore vendored
View file

@ -160,4 +160,9 @@ cython_debug/
#.idea/
# macOS stuff
.DS_store
.DS_store
chronos-2-finetuned
# Kiro IDE
.kiro

View file

@ -1,7 +1,3 @@
<div align="center">
<img src="https://raw.githubusercontent.com/amazon-science/chronos-forecasting/main/figures/chronos-logo.png" width="60%">
</div>
<div align="center">
# Chronos: Pretrained Models for Time Series Forecasting
@ -11,7 +7,7 @@
[![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20HF-Datasets-FFD21E)](https://huggingface.co/datasets/autogluon/chronos_datasets)
[![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20HF-Models-FFD21E)](https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444)
[![fev](https://img.shields.io/static/v1?label=fev&message=Benchmark&color=B31B1B&logo=github)](https://github.com/autogluon/fev)
[![aws](https://img.shields.io/static/v1?label=SageMaker&message=Deploy&color=FF9900&logo=amazon-web-services)](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb)
[![aws](https://img.shields.io/static/v1?label=SageMaker&message=Deploy&color=FF9900&logo=amazon-web-services)](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb)
[![faq](https://img.shields.io/badge/FAQ-Questions%3F-blue)](https://github.com/amazon-science/chronos-forecasting/issues?q=is%3Aissue+label%3AFAQ)
[![License: MIT](https://img.shields.io/badge/License-Apache--2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/amazon-science/chronos-forecasting)
@ -20,8 +16,8 @@
## 🚀 News
- **20 Oct 2025**: 🚀 [Chronos-2](https://arxiv.org/abs/2510.15821) released. It offers _zero-shot_ support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 achieves the best performance on fev-bench, GIFT-Eval and Chronos Benchmark II amongst pretrained models. Check out [this notebook](notebooks/chronos-2-quickstart.ipynb) to get started with Chronos-2.
- **14 Feb 2025**: 🚀 Chronos-Bolt is now available on Amazon SageMaker JumpStart! Check out the [tutorial notebook](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb) to learn how to deploy Chronos endpoints for production use in 3 lines of code.
- **30 Dec 2025**: ☁️ Deploy Chronos-2 to AWS with Amazon SageMaker: new guide covers real-time inference (GPU/CPU), serverless endpoints with automatic scaling, and batch transform for large-scale forecasting. See the [deployment tutorial](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb).
- **20 Oct 2025**: 🚀 [Chronos-2](https://huggingface.co/amazon/chronos-2) released. It offers _zero-shot_ support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 achieves the best performance on fev-bench, GIFT-Eval and Chronos Benchmark II amongst pretrained models. Check out [this notebook](notebooks/chronos-2-quickstart.ipynb) to get started with Chronos-2.
- **12 Dec 2024**: 📊 We released [`fev`](https://github.com/autogluon/fev), a lightweight package for benchmarking time series forecasting models based on the [Hugging Face `datasets`](https://huggingface.co/docs/datasets/en/index) library.
- **26 Nov 2024**: ⚡️ Chronos-Bolt models released [on HuggingFace](https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444). Chronos-Bolt models are more accurate (5% lower error), up to 250x faster and 20x more memory efficient than the original Chronos models of the same size!
- **13 Mar 2024**: 🚀 Chronos [paper](https://arxiv.org/abs/2403.07815) and inference code released.
@ -40,7 +36,9 @@ This package provides an interface to the Chronos family of **pretrained time se
| Model ID | Parameters |
| ---------------------------------------------------------------------- | ---------- |
| [`s3://autogluon/chronos-2`](https://arxiv.org/abs/2510.15821) | 120M |
| [`amazon/chronos-2`](https://huggingface.co/amazon/chronos-2) | 120M |
| [`autogluon/chronos-2-synth`](https://huggingface.co/autogluon/chronos-2-synth) | 120M |
| [`autogluon/chronos-2-small`](https://huggingface.co/autogluon/chronos-2-small) | 28M |
| [`amazon/chronos-bolt-tiny`](https://huggingface.co/amazon/chronos-bolt-tiny) | 9M |
| [`amazon/chronos-bolt-mini`](https://huggingface.co/amazon/chronos-bolt-mini) | 21M |
| [`amazon/chronos-bolt-small`](https://huggingface.co/amazon/chronos-bolt-small) | 48M |
@ -49,7 +47,7 @@ This package provides an interface to the Chronos family of **pretrained time se
| [`amazon/chronos-t5-mini`](https://huggingface.co/amazon/chronos-t5-mini) | 20M |
| [`amazon/chronos-t5-small`](https://huggingface.co/amazon/chronos-t5-small) | 46M |
| [`amazon/chronos-t5-base`](https://huggingface.co/amazon/chronos-t5-base) | 200M |
| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M |
| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M |
</div>
@ -61,6 +59,10 @@ To perform inference with Chronos, the easiest way is to install this package th
pip install chronos-forecasting
```
> [!TIP]
> For reliable production use, we recommend using Chronos-2 models through [Amazon SageMaker JumpStart](https://aws.amazon.com/sagemaker/ai/jumpstart/). Check out [this tutorial](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb) to learn how to deploy Chronos-2 inference endpoints to AWS with just a few lines of code.
### Forecasting
A minimal example showing how to perform forecasting using Chronos-2:
@ -69,7 +71,7 @@ A minimal example showing how to perform forecasting using Chronos-2:
import pandas as pd # requires: pip install 'pandas[pyarrow]'
from chronos import Chronos2Pipeline
pipeline = Chronos2Pipeline.from_pretrained("s3://autogluon/chronos-2", device_map="cuda")
pipeline = Chronos2Pipeline.from_pretrained("amazon/chronos-2", device_map="cuda")
# Load historical target values and past values of covariates
context_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/train.parquet")
@ -116,8 +118,15 @@ plt.legend()
## Example Notebooks
- [Chronos-2 Quick Start](notebooks/chronos-2-quickstart.ipynb)
- [Deploy Chronos-Bolt on Amazon SageMaker](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb)
- Deploy Chronos-2 on Amazon SageMaker (coming soon!)
&nbsp;
<a href="https://studiolab.sagemaker.aws/import/github/amazon-science/chronos-forecasting/blob/main/notebooks/chronos-2-quickstart.ipynb">
<img src="https://studiolab.sagemaker.aws/studiolab.svg" alt="Open In SageMaker Studio Lab" height="18" align="absmiddle">
</a>
&nbsp;
<a href="https://colab.research.google.com/github/amazon-science/chronos-forecasting/blob/main/notebooks/chronos-2-quickstart.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="18" align="absmiddle">
</a>
- [Deploy Chronos-2 on Amazon SageMaker](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb)
## 📝 Citation

Binary file not shown.
Before: 118 KiB

Binary file not shown.
Before: 227 KiB

File diff suppressed because it is too large
Before: 149 KiB

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because it is too large

View file

@ -14,13 +14,12 @@ readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.10"
dependencies = [
"torch>=2.0,<3",
"transformers>=4.49,<5",
"torch>=2.2,<3",
"transformers>=4.41,<5",
"accelerate>=0.34,<2",
"numpy>=1.21,<3",
"einops>=0.7.0,<1",
"scikit-learn>=1.6.0,<2",
"boto3",
]
classifiers = [
"Programming Language :: Python :: 3",
@ -40,7 +39,19 @@ packages = ["src/chronos"]
path = "src/chronos/__about__.py"
[project.optional-dependencies]
test = ["pytest~=8.0", "numpy>=1.21,<3", "fev>=0.6.1", "pandas>=2.0,<2.4"]
extras = [
"boto3>=1.10,<2",
"peft>=0.13.0,<0.18",
"fev>=0.6.1",
"pandas[pyarrow]>=2.0,<2.4",
]
test = [
"pytest~=8.0",
"boto3>=1.10,<2",
"peft>=0.13.0,<1",
"fev>=0.6.1",
"pandas[pyarrow]>=2.0,<2.4",
]
typecheck = ["mypy~=1.9"]
dev = [
"gluonts[pro]~=0.16",

View file

@ -295,7 +295,7 @@ def chronos_2(
device: str = "cuda",
torch_dtype: str = "float32",
batch_size: int = 32,
predict_batches_jointly: bool = False,
cross_learning: bool = False,
):
"""Evaluate Chronos-2 models.
@ -316,7 +316,7 @@ def chronos_2(
batch_size : int, optional, default = 32
Batch size for inference. For Chronos-Bolt models, significantly larger
batch sizes can be used
predict_batches_jointly: bool, optional, default = False
cross_learning: bool, optional, default = False
If True, cross-learning is enabled and the model makes joint predictions for all
items in the batch
"""
@ -335,7 +335,7 @@ def chronos_2(
metrics_path=metrics_path,
model_id=model_id,
batch_size=batch_size,
predict_batches_jointly=predict_batches_jointly,
cross_learning=cross_learning,
)
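
The hunks above rename the `predict_batches_jointly` keyword to `cross_learning` with no compatibility alias, so callers that still pass the old name would presumably fail with a `TypeError`. The sketch below illustrates that effect and one possible way to bridge it. Both `evaluate_stub` and `evaluate_with_legacy_kwarg` are hypothetical names made up for illustration; neither is part of `evaluate.py`.

```python
import warnings


def evaluate_stub(*, batch_size: int = 32, cross_learning: bool = False) -> dict:
    """Hypothetical stand-in that only mimics the renamed keyword."""
    return {"batch_size": batch_size, "cross_learning": cross_learning}


def evaluate_with_legacy_kwarg(**kwargs) -> dict:
    """Hypothetical shim: map the old keyword onto the new one and warn."""
    if "predict_batches_jointly" in kwargs:
        if "cross_learning" in kwargs:
            raise TypeError("pass only one of predict_batches_jointly / cross_learning")
        warnings.warn(
            "predict_batches_jointly was renamed to cross_learning",
            FutureWarning,
            stacklevel=2,
        )
        kwargs["cross_learning"] = kwargs.pop("predict_batches_jointly")
    return evaluate_stub(**kwargs)


# The old keyword is rejected by the renamed signature.
try:
    evaluate_stub(predict_batches_jointly=True)
    old_name_accepted = True
except TypeError:
    old_name_accepted = False

# The shim accepts the old keyword but records a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = evaluate_with_legacy_kwarg(predict_batches_jointly=True)
```

The commit itself takes the simpler route of a hard rename; the shim is only one option a downstream script might use while migrating.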

View file

@ -663,7 +663,6 @@ def main(
lr_scheduler_type=lr_scheduler_type,
warmup_ratio=warmup_ratio,
optim=optim,
logging_dir=str(output_dir / "logs"),
logging_strategy="steps",
logging_steps=log_steps,
save_strategy="steps",

View file

@ -1 +1 @@
__version__ = "2.0.0"
__version__ = "2.2.2"

View file

@ -17,8 +17,10 @@ import torch
if TYPE_CHECKING:
import datasets
import fev
import pandas as pd
from transformers import PreTrainedModel
from .utils import left_pad_and_stack_1D
@ -53,6 +55,14 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
# for easy access to the inner HF-style model
self.inner_model = inner_model
@property
def model_context_length(self) -> int:
raise NotImplementedError()
@property
def model_prediction_length(self) -> int:
raise NotImplementedError()
def _prepare_and_validate_context(self, context: Union[torch.Tensor, List[torch.Tensor]]):
if isinstance(context, list):
context = left_pad_and_stack_1D(context)
@ -122,6 +132,123 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
"""
raise NotImplementedError()
def predict_df(
self,
df: "pd.DataFrame",
*,
id_column: str = "item_id",
timestamp_column: str = "timestamp",
target: str = "target",
prediction_length: int | None = None,
quantile_levels: list[float] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
validate_inputs: bool = True,
freq: str | None = None,
**predict_kwargs,
) -> "pd.DataFrame":
"""
Perform forecasting on time series data in a long-format pandas DataFrame.
Parameters
----------
df
Time series data in long format with an id column, a timestamp, and one target column.
Any other columns, if present, will be ignored
id_column
The name of the column which contains the unique time series identifiers, by default "item_id"
timestamp_column
The name of the column which contains timestamps, by default "timestamp"
All time series in the dataframe must have regular timestamps with the same frequency (no gaps)
target
The name of the column which contains the target variables to be forecasted, by default "target"
prediction_length
Number of steps to predict for each time series
quantile_levels
Quantile levels to compute
validate_inputs
[ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
timestamps are regularly spaced (e.g., with hourly frequency).
freq
Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used when
validate_inputs=False. When provided, skips frequency inference from the data.
**predict_kwargs
Additional arguments passed to predict_quantiles
Returns
-------
The forecasts dataframe generated by the model with the following columns
- `id_column`: The time series ID
- `timestamp_column`: Future timestamps
- "target_name": The name of the target column
- "predictions": The point predictions generated by the model
- One column for predictions at each quantile level in `quantile_levels`
"""
try:
import pandas as pd
from .df_utils import convert_df_input_to_list_of_dicts_input
except ImportError:
raise ImportError("pandas is required for predict_df. Please install it with `pip install pandas`.")
if not isinstance(target, str):
raise ValueError(
f"Expected `target` to be str, but found {type(target)}. {self.__class__.__name__} only supports univariate forecasting."
)
if prediction_length is None:
prediction_length = self.model_prediction_length
inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
df=df,
future_df=None,
id_column=id_column,
timestamp_column=timestamp_column,
target_columns=[target],
prediction_length=prediction_length,
freq=freq,
validate_inputs=validate_inputs,
)
# NOTE: any covariates, if present, are ignored here
context = [torch.tensor(item["target"]).squeeze(0) for item in inputs] # squeeze the extra variate dim
# Generate forecasts
quantiles, mean = self.predict_quantiles(
inputs=context,
prediction_length=prediction_length,
quantile_levels=quantile_levels,
limit_prediction_length=False,
**predict_kwargs,
)
quantiles_np = quantiles.numpy() # [n_series, horizon, num_quantiles]
mean_np = mean.numpy() # [n_series, horizon]
series_ids = list(prediction_timestamps.keys())
future_ts = list(prediction_timestamps.values())
data = {
id_column: np.repeat(series_ids, prediction_length),
timestamp_column: np.concatenate(future_ts),
"target_name": target,
"predictions": mean_np.ravel(),
}
quantiles_flat = quantiles_np.reshape(-1, len(quantile_levels))
for q_idx, q_level in enumerate(quantile_levels):
data[str(q_level)] = quantiles_flat[:, q_idx]
predictions_df = pd.DataFrame(data)
# If validate_inputs=False, the df is used as-is without sorting by item_id, so no reordering is required
if validate_inputs:
predictions_df.set_index(id_column, inplace=True)
predictions_df = predictions_df.loc[original_order]
predictions_df.reset_index(inplace=True)
return predictions_df
def predict_fev(
self, task: "fev.Task", batch_size: int = 32, **kwargs
) -> tuple[list["datasets.DatasetDict"], float]:
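
The new `predict_df` flattens per-series forecasts into a long-format table: each ID is repeated once per forecast step, the timestamps are concatenated, and the `[n_series, horizon, num_quantiles]` array is reshaped so that each quantile level becomes one column. The sketch below reproduces that layout with plain Python lists instead of NumPy and pandas. `flatten_forecasts` is a hypothetical helper, and the fixed column names `item_id` and `timestamp` stand in for the configurable `id_column` and `timestamp_column`.

```python
from itertools import chain


def flatten_forecasts(series_ids, future_timestamps, mean, quantiles, quantile_levels):
    """Build long-format columns from per-series forecasts.

    mean:      nested list of shape [n_series][horizon]
    quantiles: nested list of shape [n_series][horizon][num_quantiles]
    """
    horizon = len(mean[0])
    columns = {
        # each id is repeated once per forecast step (the role of np.repeat in the diff)
        "item_id": [sid for sid in series_ids for _ in range(horizon)],
        "timestamp": list(chain.from_iterable(future_timestamps)),
        "predictions": list(chain.from_iterable(mean)),
    }
    # flatten to rows of shape [n_series * horizon][num_quantiles]
    rows = list(chain.from_iterable(quantiles))
    for q_idx, q_level in enumerate(quantile_levels):
        # one column per quantile level, keyed by the level as a string
        columns[str(q_level)] = [row[q_idx] for row in rows]
    return columns


columns = flatten_forecasts(
    series_ids=["A", "B"],
    future_timestamps=[["t1", "t2"], ["t1", "t2"]],
    mean=[[1.0, 2.0], [10.0, 20.0]],
    quantiles=[[[0.5, 1.5], [1.5, 2.5]], [[9.0, 11.0], [19.0, 21.0]]],
    quantile_levels=[0.1, 0.9],
)
```

The method in the diff additionally writes a constant `target_name` column and, when `validate_inputs=True`, restores the original item order; both steps are omitted here.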

View file

@ -6,6 +6,7 @@
import logging
import os
import re
import warnings
from pathlib import Path
import boto3
@ -110,6 +111,12 @@ def cache_model_from_s3(
# Use CloudFront CDN for faster, cached downloads if available
if cloudfront_url:
warnings.warn(
f"Loading {s3_uri} from CloudFront is deprecated and will be removed in a future version. "
f'Please specify a HuggingFace model_id instead. For example: Chronos2Pipeline.from_pretrained("amazon/chronos-2")',
category=FutureWarning,
stacklevel=3,
)
try:
download_model_files_from_cloudfront(
cloudfront_url=cloudfront_url,
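
The deprecation warning above passes `stacklevel=3`, which makes Python attribute the warning to the frame two levels above the function that calls `warnings.warn`. The sketch below shows the mechanism with a hypothetical three-function call chain. The names `_cache_model`, `from_pretrained`, and `user_code` are illustrative only, and the real call depth inside the library may differ.

```python
import warnings


def _cache_model(uri: str) -> str:
    # hypothetical stand-in for the deprecated download path
    warnings.warn(
        f"Loading {uri} from the CDN is deprecated",
        category=FutureWarning,
        stacklevel=3,  # report the warning two frames above this function
    )
    return uri


def from_pretrained(uri: str) -> str:
    # hypothetical library-level wrapper sitting between the user and the warn site
    return _cache_model(uri)


def user_code() -> str:
    return from_pretrained("s3://bucket/model")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

reported = caught[0]
# line of the `from_pretrained(...)` call inside user_code
expected_line = user_code.__code__.co_firstlineno + 1
```

Here `reported.lineno` matches `expected_line`, so the message points at the caller's own code, not at library internals.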

View file

@ -377,6 +377,14 @@ class ChronosPipeline(BaseChronosPipeline):
self.tokenizer = tokenizer
self.model = model
@property
def model_context_length(self) -> int:
return self.model.config.context_length
@property
def model_prediction_length(self) -> int:
return self.model.config.prediction_length
def _prepare_and_validate_context(self, context: Union[torch.Tensor, List[torch.Tensor]]):
if isinstance(context, list):
context = left_pad_and_stack_1D(context)
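
The base class declares `model_context_length` and `model_prediction_length` as properties that raise `NotImplementedError`, and concrete pipelines override them by reading the model config. `predict_df` then falls back to `model_prediction_length` when no horizon is given. A minimal sketch of that pattern follows, using hypothetical class names and a `SimpleNamespace` in place of a real model config.

```python
from types import SimpleNamespace
from typing import Optional


class BasePipeline:
    """Hypothetical stand-in for a pipeline base class."""

    @property
    def model_prediction_length(self) -> int:
        raise NotImplementedError()

    def resolve_prediction_length(self, prediction_length: Optional[int] = None) -> int:
        # mirrors the `if prediction_length is None` fallback in predict_df
        if prediction_length is None:
            prediction_length = self.model_prediction_length
        return prediction_length


class ConfigBackedPipeline(BasePipeline):
    """Hypothetical subclass that reads the default horizon from its config."""

    def __init__(self, config):
        self.config = config

    @property
    def model_prediction_length(self) -> int:
        return self.config.prediction_length


pipeline = ConfigBackedPipeline(SimpleNamespace(prediction_length=64))
default_horizon = pipeline.resolve_prediction_length()
explicit_horizon = pipeline.resolve_prediction_length(24)
```

A subclass that forgets to override the property only fails when the default horizon is actually needed, which is the trade-off of this pattern compared with `abc.abstractmethod`.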

View file

@ -4,7 +4,7 @@
# Authors: Abdul Fatir Ansari <ansarnd@amazon.com>
from dataclasses import dataclass
from typing import List
from typing import List, Literal
from transformers.configuration_utils import PretrainedConfig
@ -39,6 +39,8 @@ class Chronos2CoreConfig(PretrainedConfig):
Token ID for padding/missing value token, by default 0
rope_theta
The base theta for rotary position embedding (RoPE), by default 10000.0
attn_implementation
The attention implementation to use. Options: "eager" or "sdpa", by default None (uses "sdpa")
"""
model_type = "t5"
@ -63,6 +65,7 @@ class Chronos2CoreConfig(PretrainedConfig):
vocab_size: int = 2,
pad_token_id: int = 0,
rope_theta: float = 10000.0,
attn_implementation: Literal["eager", "sdpa"] | None = None,
**kwargs,
):
self.vocab_size = vocab_size
@ -83,11 +86,17 @@ class Chronos2CoreConfig(PretrainedConfig):
assert not self.is_gated_act, "gated activation is not supported"
# Attention implementation - default to "sdpa" if not specified
attn_implementation = attn_implementation or "sdpa"
assert attn_implementation in ["eager", "sdpa"], f"attn_implementation {attn_implementation} not supported"
# unused
kwargs.pop("is_encoder_decoder", None)
kwargs.pop("eos_token_id", None)
super().__init__(pad_token_id=pad_token_id, is_encoder_decoder=False, **kwargs)
super().__init__(
pad_token_id=pad_token_id, is_encoder_decoder=False, attn_implementation=attn_implementation, **kwargs
)
@dataclass
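
The config change resolves `attn_implementation` in two steps: `None` falls back to `"sdpa"`, and the result is then checked against the supported options. The sketch below isolates that logic in a hypothetical helper, `resolve_attn_implementation`. It deliberately swaps the diff's `assert` for a `ValueError`, since assertions are skipped under `python -O`; that substitution is an assumption about what may be preferable, not what the commit does.

```python
from typing import Optional

SUPPORTED_ATTN = ("eager", "sdpa")


def resolve_attn_implementation(attn_implementation: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the default-and-validate step in the diff."""
    # None (or an empty string) falls back to "sdpa"
    resolved = attn_implementation or "sdpa"
    # ValueError instead of assert, so the check survives `python -O`
    if resolved not in SUPPORTED_ATTN:
        raise ValueError(f"attn_implementation {resolved} not supported")
    return resolved


default_choice = resolve_attn_implementation()
eager_choice = resolve_attn_implementation("eager")
```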

View file

@ -5,7 +5,7 @@
import math
from enum import Enum
from typing import TYPE_CHECKING, Iterator, Mapping, Sequence, TypeAlias, cast
from typing import TYPE_CHECKING, Any, Iterable, Iterator, Mapping, Sequence, TypeAlias, TypedDict, cast
import numpy as np
import torch
@ -15,12 +15,21 @@ from torch.utils.data import IterableDataset
if TYPE_CHECKING:
import datasets
import fev
import pandas as pd
TensorOrArray: TypeAlias = torch.Tensor | np.ndarray
class PreparedInput(TypedDict):
"""A preprocessed time series input ready for model training/inference."""
context: torch.Tensor # (n_variates, history_length), float32
future_covariates: torch.Tensor # (n_variates, prediction_length), float32
n_targets: int
n_covariates: int
n_future_covariates: int
def left_pad_and_cat_2D(tensors: list[torch.Tensor]) -> torch.Tensor:
"""
Left pads tensors in the list to the length of the longest tensor along the second axis, then concats
@ -38,14 +47,14 @@ def left_pad_and_cat_2D(tensors: list[torch.Tensor]) -> torch.Tensor:
return torch.cat(padded, dim=0)
def validate_and_prepare_single_dict_task(
task: Mapping[str, TensorOrArray | Mapping[str, TensorOrArray]], idx: int, prediction_length: int
) -> tuple[torch.Tensor, torch.Tensor, int, int, int]:
"""Validates and prepares a single dictionary task for Chronos2Model.
def validate_and_prepare_single_dict_input(
raw_input: Mapping[str, TensorOrArray | Mapping[str, TensorOrArray]], idx: int, prediction_length: int
) -> PreparedInput:
"""Validates and prepares a single dictionary input for Chronos2Model.
Parameters
----------
task
raw_input
A dictionary representing a time series that contains:
- `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,) or (n_variates, history_length).
Forecasts will be generated for items in `target`.
@ -56,27 +65,27 @@ def validate_and_prepare_single_dict_task(
covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `prediction_length`. All keys in
`future_covariates` must be a subset of the keys in `past_covariates`.
idx
Index of this task in the list of tasks, used for error messages
Index of this input in the list of inputs, used for error messages
prediction_length
Number of future time steps to predict, used to validate future covariates
Returns
------
A tuple containing:
- task_context_tensor: Concatenated tensor of target and past covariates of shape (group_size, history_length),
the first `task_n_targets` items along the first axis contain the target variables and the remaining items contain past-only covariates
A PreparedInput containing:
- context: Concatenated tensor of target and past covariates of shape (group_size, history_length),
the first `n_targets` items along the first axis contain the target variables and the remaining items contain past-only covariates
and past values of known future covariates.
- task_future_covariates_tensor: Tensor of future covariates of shape (group_size, prediction_length). The last `task_n_future_covariates`
- future_covariates: Tensor of future covariates of shape (group_size, prediction_length). The last `n_future_covariates`
items along the first axis contain future covariates. All the remaining elements corresponding to target and past-only covariates are NaNs.
- task_n_targets: Number of target variables
- task_n_covariates: Total number of covariates (sum of past-only and known future covariates)
- task_n_future_covariates: Number of known future covariates
- n_targets: Number of target variables
- n_covariates: Total number of covariates (sum of past-only and known future covariates)
- n_future_covariates: Number of known future covariates
"""
allowed_keys = {"target", "past_covariates", "future_covariates"}
# validate keys
keys = set(task.keys())
keys = set(raw_input.keys())
if not keys.issubset(allowed_keys):
raise ValueError(
f"Found invalid keys in element at index {idx}. Allowed keys are {allowed_keys}, but found {keys}"
@ -85,38 +94,58 @@ def validate_and_prepare_single_dict_task(
raise ValueError(f"Element at index {idx} does not contain the required key 'target'")
# validate target
task_target = task["target"]
if isinstance(task_target, np.ndarray):
task_target = torch.from_numpy(task_target)
assert isinstance(task_target, torch.Tensor)
if task_target.ndim > 2:
target = raw_input["target"]
if isinstance(target, np.ndarray):
target = torch.from_numpy(target)
assert isinstance(target, torch.Tensor)
if target.ndim > 2:
raise ValueError(
"When the input is a list of dicts, the `target` should either be 1-d with shape (history_length,) "
f" or 2-d with shape (n_variates, history_length). Found element at index {idx} with shape {tuple(task_target.shape)}."
f" or 2-d with shape (n_variates, history_length). Found element at index {idx} with shape {tuple(target.shape)}."
)
history_length = task_target.shape[-1]
task_target = task_target.view(-1, history_length)
history_length = target.shape[-1]
target = target.view(-1, history_length)
# validate past_covariates
cat_encoders: dict = {}
task_past_covariates = task.get("past_covariates", {})
if not isinstance(task_past_covariates, dict):
past_covariates = raw_input.get("past_covariates", {})
if not isinstance(past_covariates, dict):
raise ValueError(
f"Found invalid type for `past_covariates` in element at index {idx}. "
f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(task_past_covariates)}'
f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(past_covariates)}'
)
task_covariates_keys = sorted(task_past_covariates.keys())
task_past_covariates_list: list[torch.Tensor] = []
for key in task_covariates_keys:
tensor = task_past_covariates[key]
# gather keys and ensure known-future keys come last to match downstream assumptions
covariates_keys = sorted(past_covariates.keys())
future_covariates = raw_input.get("future_covariates", {})
if not isinstance(future_covariates, dict):
raise ValueError(
f"Found invalid type for `future_covariates` in element at index {idx}. "
f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(future_covariates)}'
)
future_covariates_keys = sorted(future_covariates.keys())
if not set(future_covariates_keys).issubset(covariates_keys):
raise ValueError(
f"Expected keys in `future_covariates` to be a subset of `past_covariates` {covariates_keys}, "
f"but found {future_covariates_keys} in element at index {idx}"
)
# create ordered keys: past-only first, then known-future (so known-future are the last rows)
past_only_keys = [k for k in covariates_keys if k not in future_covariates_keys]
ordered_covariate_keys = past_only_keys + future_covariates_keys
past_covariates_list: list[torch.Tensor] = []
for key in ordered_covariate_keys:
tensor = past_covariates[key]
if isinstance(tensor, np.ndarray):
# apply encoding to categorical variates
if not np.issubdtype(tensor.dtype, np.number):
# target encoding, if the target is 1-d
if task_target.shape[0] == 1:
if target.shape[0] == 1:
cat_encoder = TargetEncoder(target_type="continuous", smooth=1.0)
X = tensor.astype(str).reshape(-1, 1)
y = task_target.view(-1).numpy()
y = target.view(-1).numpy()
mask = np.isfinite(y)
X = X[mask]
y = y[mask]
@ -134,29 +163,18 @@ def validate_and_prepare_single_dict_task(
f"Individual `past_covariates` must be 1-d with length equal to the length of `target` (= {history_length}), "
f"found: {key} with shape {tuple(tensor.shape)} in element at index {idx}"
)
task_past_covariates_list.append(tensor)
task_past_covariates_tensor = (
torch.stack(task_past_covariates_list, dim=0)
if task_past_covariates_list
else torch.zeros((0, history_length), device=task_target.device)
past_covariates_list.append(tensor)
past_covariates_tensor = (
torch.stack(past_covariates_list, dim=0)
if past_covariates_list
else torch.zeros((0, history_length), device=target.device)
)
# validate future_covariates
task_future_covariates = task.get("future_covariates", {})
if not isinstance(task_future_covariates, dict):
raise ValueError(
f"Found invalid type for `future_covariates` in element at index {idx}. "
f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(task_future_covariates)}'
)
task_future_covariates_keys = sorted(task_future_covariates.keys())
if not set(task_future_covariates_keys).issubset(task_covariates_keys):
raise ValueError(
f"Expected keys in `future_covariates` to be a subset of `past_covariates` {task_covariates_keys}, "
f"but found {task_future_covariates_keys} in element at index {idx}"
)
task_future_covariates_list: list[torch.Tensor] = []
for key in task_covariates_keys:
# validate future_covariates (build rows in the same ordered_covariate_keys order)
future_covariates_list: list[torch.Tensor] = []
for key in ordered_covariate_keys:
# future values of past-only covariates are filled with NaNs
tensor = task_future_covariates.get(key, torch.full((prediction_length,), fill_value=torch.nan))
tensor = future_covariates.get(key, torch.full((prediction_length,), fill_value=torch.nan))
if isinstance(tensor, np.ndarray):
# apply encoding to categorical variates
if not np.issubdtype(tensor.dtype, np.number):
@ -169,34 +187,118 @@ def validate_and_prepare_single_dict_task(
f"Individual `future_covariates` must be 1-d with length equal to the {prediction_length=}, "
f"found: {key} with shape {tuple(tensor.shape)} in element at index {idx}"
)
task_future_covariates_list.append(tensor)
task_future_covariates_tensor = (
torch.stack(task_future_covariates_list, dim=0)
if task_future_covariates_list
else torch.zeros((0, prediction_length), device=task_target.device)
future_covariates_list.append(tensor)
future_covariates_tensor = (
torch.stack(future_covariates_list, dim=0)
if future_covariates_list
else torch.zeros((0, prediction_length), device=target.device)
)
# future values of target series are filled with NaNs
task_future_covariates_target_padding = torch.full(
(task_target.shape[0], prediction_length), fill_value=torch.nan, device=task_target.device
future_covariates_target_padding = torch.full(
(target.shape[0], prediction_length), fill_value=torch.nan, device=target.device
)
task_context_tensor = torch.cat([task_target, task_past_covariates_tensor], dim=0).to(dtype=torch.float32)
task_future_covariates_tensor = torch.cat(
[task_future_covariates_target_padding, task_future_covariates_tensor], dim=0
context_tensor = torch.cat([target, past_covariates_tensor], dim=0).to(dtype=torch.float32)
future_covariates_tensor = torch.cat(
[future_covariates_target_padding, future_covariates_tensor], dim=0
).to(dtype=torch.float32)
task_n_targets = task_target.shape[0]
task_n_covariates = task_past_covariates_tensor.shape[0]
task_n_future_covariates = len(task_future_covariates_list)
n_targets = target.shape[0]
n_covariates = past_covariates_tensor.shape[0]
# number of known-future covariates
n_future_covariates = len(future_covariates_keys)
return (
task_context_tensor,
task_future_covariates_tensor,
task_n_targets,
task_n_covariates,
task_n_future_covariates,
return PreparedInput(
context=context_tensor,
future_covariates=future_covariates_tensor,
n_targets=n_targets,
n_covariates=n_covariates,
n_future_covariates=n_future_covariates,
)
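
The reordering in this function matters because the docstring promises that the last `n_future_covariates` rows hold the known-future covariates. Sorting all covariate keys alphabetically, as the earlier code did, does not guarantee that. The sketch below reproduces just the key-ordering step with plain dictionaries; `order_covariate_keys` is a hypothetical helper name and the covariate names are invented.

```python
def order_covariate_keys(past_covariates: dict, future_covariates: dict):
    """Return (ordered keys, number of known-future covariates)."""
    covariates_keys = sorted(past_covariates)
    future_keys = sorted(future_covariates)
    if not set(future_keys).issubset(covariates_keys):
        raise ValueError(
            f"Expected keys in future_covariates to be a subset of {covariates_keys}, "
            f"but found {future_keys}"
        )
    # past-only keys first, then known-future keys, so known-future rows come last
    past_only_keys = [k for k in covariates_keys if k not in future_keys]
    return past_only_keys + future_keys, len(future_keys)


ordered, n_future = order_covariate_keys(
    past_covariates={"price": [1, 2], "holiday": [0, 1], "temperature": [20, 21]},
    future_covariates={"holiday": [1, 0]},
)
alphabetical = sorted(["price", "holiday", "temperature"])
```

With purely alphabetical ordering, `holiday` would be the first covariate row; with the ordering above it is the last one, matching the documented layout.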
def prepare_inputs(
raw_inputs: Iterable[Mapping[str, Any]],
prediction_length: int,
min_past: int = 1,
mode: "DatasetMode | str" = "train",
) -> list[PreparedInput]:
"""Prepare multiple time series inputs for training/inference.
This function handles mode-specific preprocessing (e.g., filtering short series)
and calls validate_and_prepare_single_dict_input for each input.
"""
inputs: list[PreparedInput] = []
for idx, raw_input in enumerate(raw_inputs):
# For non-TEST modes, fix future_covariates (replace None/empty with NaN arrays)
if mode != DatasetMode.TEST:
raw_future_covariates = raw_input.get("future_covariates", {})
if raw_future_covariates:
raw_future_covariates = cast(dict[str, TensorOrArray | None], raw_future_covariates)
fixed_future_covariates = {}
for key, value in raw_future_covariates.items():
fixed_future_covariates[key] = (
np.full(prediction_length, np.nan) if value is None or len(value) == 0 else value
)
raw_input = {**raw_input, "future_covariates": fixed_future_covariates}
raw_input = cast(dict[str, TensorOrArray | Mapping[str, TensorOrArray]], raw_input)
prepared = validate_and_prepare_single_dict_input(raw_input, idx, prediction_length)
# Filter by minimum length (except in TEST mode)
if mode != DatasetMode.TEST and prepared["context"].shape[-1] < min_past + prediction_length:
continue
inputs.append(prepared)
if len(inputs) == 0:
raise ValueError(
"The dataset is empty after filtering based on the length of the time series (length >= min_past + prediction_length). "
"Please provide longer time series or reduce `min_past` or `prediction_length`. "
)
return inputs
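
The length filter in `prepare_inputs` can be summarised as: outside of test mode, drop every series shorter than `min_past + prediction_length`, and raise if nothing is left. The sketch below shows that rule on plain lists. `filter_by_length` is a hypothetical helper, and comparing against the string `"test"` is an assumption about how `DatasetMode.TEST` is represented.

```python
def filter_by_length(series, prediction_length, min_past=1, mode="train"):
    """Keep series long enough to supply min_past history plus a full horizon."""
    kept = []
    for values in series:
        # test mode skips the filter entirely (assumed string value "test")
        if mode != "test" and len(values) < min_past + prediction_length:
            continue
        kept.append(values)
    if not kept:
        raise ValueError(
            "The dataset is empty after filtering based on the length of the time series "
            "(length >= min_past + prediction_length)."
        )
    return kept


short, long = [1.0, 2.0, 3.0], [float(i) for i in range(10)]
train_kept = filter_by_length([short, long], prediction_length=4, min_past=1)
test_kept = filter_by_length([short, long], prediction_length=4, min_past=1, mode="test")
```

With `prediction_length=4` and `min_past=1` the threshold is 5, so only the 10-step series survives in train mode while test mode keeps both.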
def validate_prepared_schema(prepared_input: Any) -> None:
"""Validate that an input matches the PreparedInput schema."""
if not isinstance(prepared_input, Mapping):
raise TypeError(
f"Expected input to be a dict-like, got {type(prepared_input).__name__}. "
"Set convert_inputs=True when calling fit() to preprocess raw inputs."
)
required_keys = {"context", "future_covariates", "n_targets", "n_covariates", "n_future_covariates"}
missing = required_keys - set(prepared_input.keys())
if missing:
raise TypeError(
f"Input is missing required keys: {missing}. Set convert_inputs=True when calling fit() to preprocess raw inputs."
)
context = prepared_input["context"]
if not isinstance(context, torch.Tensor) or context.ndim != 2:
raise TypeError(
f"Expected 'context' to be 2-d torch.Tensor, got {type(context).__name__} "
f"with shape {getattr(context, 'shape', 'N/A')}. "
"Set convert_inputs=True when calling fit() to preprocess raw inputs."
)
future_covariates = prepared_input["future_covariates"]
if not isinstance(future_covariates, torch.Tensor) or future_covariates.ndim != 2:
raise TypeError(
f"Expected 'future_covariates' to be 2-d torch.Tensor, got {type(future_covariates).__name__} "
f"with shape {getattr(future_covariates, 'shape', 'N/A')}. "
"Set convert_inputs=True when calling fit() to preprocess raw inputs."
)
if context.shape[0] != future_covariates.shape[0]:
raise ValueError(
f"Expected 'context' and 'future_covariates' to have the same first dimension, "
f"got {context.shape[0]} and {future_covariates.shape[0]}. "
"Set convert_inputs=True when calling fit() to preprocess raw inputs."
)
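Review note: a minimal sketch of what a dict that passes these checks might look like. The helper `looks_prepared` is hypothetical (it returns a bool instead of raising), and the shapes, including the prediction length of 4, are made up for illustration.

```python
from collections.abc import Mapping

import torch

REQUIRED_KEYS = {"context", "future_covariates", "n_targets", "n_covariates", "n_future_covariates"}


def looks_prepared(item) -> bool:
    """Hypothetical helper: True if `item` satisfies the same conditions as the validator above."""
    if not isinstance(item, Mapping) or REQUIRED_KEYS - set(item.keys()):
        return False
    context, future = item["context"], item["future_covariates"]
    if not isinstance(context, torch.Tensor) or context.ndim != 2:
        return False
    if not isinstance(future, torch.Tensor) or future.ndim != 2:
        return False
    # Both tensors must have one row per variate (targets plus covariates).
    return context.shape[0] == future.shape[0]


# One target and two covariates, 20 past steps, assumed prediction length of 4.
prepared = {
    "context": torch.randn(3, 20),
    "future_covariates": torch.full((3, 4), float("nan")),
    "n_targets": 1,
    "n_covariates": 2,
    "n_future_covariates": 1,
}
raw = {"target": torch.randn(20)}  # a raw input, not yet preprocessed

prepared_ok = looks_prepared(prepared)
raw_ok = looks_prepared(raw)
```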
def convert_list_of_tensors_input_to_list_of_dicts_input(
list_of_tensors: Sequence[TensorOrArray],
) -> list[dict[str, torch.Tensor]]:
@@ -265,308 +367,6 @@ def convert_tensor_input_to_list_of_dicts_input(tensor: TensorOrArray) -> list[d
return output
def _validate_df_types_and_cast(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
id_column: str = "item_id",
timestamp_column: str = "timestamp",
) -> tuple["pd.DataFrame", "pd.DataFrame | None"]:
import pandas as pd
astype_dict = {}
future_astype_dict = {}
for col in df.columns.drop([id_column, timestamp_column]):
col_dtype = df[col].dtype
if col in target_columns and not pd.api.types.is_numeric_dtype(df[col]):
raise ValueError(f"All target columns must be numeric but got {col=} with dtype={col_dtype}")
if (
pd.api.types.is_object_dtype(df[col])
or pd.api.types.is_string_dtype(df[col])
or isinstance(col_dtype, pd.CategoricalDtype)
):
astype_dict[col] = "category"
elif pd.api.types.is_numeric_dtype(df[col]) or pd.api.types.is_bool_dtype(df[col]):
astype_dict[col] = "float32"
else:
raise ValueError(
f"All columns must contain numeric, object, category, string, or bool dtype but got {col=} with dtype={col_dtype}"
)
if future_df is not None and col in future_df.columns:
if future_df[col].dtype != col_dtype:
raise ValueError(
f"Column {col} in future_df has dtype {future_df[col].dtype} but column in df has dtype {col_dtype}"
)
future_astype_dict[col] = astype_dict[col]
df = df.astype(astype_dict, copy=True)
if future_df is not None:
future_df = future_df.astype(future_astype_dict, copy=True)
return df, future_df
def validate_df_inputs(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
prediction_length: int,
id_column: str = "item_id",
timestamp_column: str = "timestamp",
) -> tuple["pd.DataFrame", "pd.DataFrame | None", "pd.Timedelta", list[int], list[int] | None, np.ndarray]:
"""
Validates and prepares dataframe inputs passed to `Chronos2Pipeline.predict_df`.
Parameters
----------
df
Input dataframe containing time series data with columns:
- id_column: Identifier for each time series
- timestamp_column: Timestamps for each observation
- target_columns: One or more target variables to forecast
- Additional columns are treated as covariates
future_df
Optional dataframe containing future covariate values with columns:
- id_column: Identifier for each time series
- timestamp_column: Future timestamps
- Subset of covariate columns from df
target_columns
Names of target columns to forecast
prediction_length
Number of future time steps to predict
id_column
Name of column containing time series identifiers
timestamp_column
Name of column containing timestamps
Returns
-------
A tuple containing:
- Validated and sorted input dataframe
- Validated and sorted future dataframe (if provided)
- Inferred frequency of the time series
- List of series lengths from input dataframe
- List of series lengths from future dataframe (if provided)
- Original order of time series IDs
Raises
------
ValueError
If validation fails for:
- Missing required columns
- Invalid data types
- Inconsistent frequencies
- Insufficient data points
- Mismatched series between df and future_df
- Invalid future_df lengths
"""
import pandas as pd
required_cols = [id_column, timestamp_column] + target_columns
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
raise ValueError(f"df does not contain all expected columns. Missing columns: {missing_cols}")
if future_df is not None:
future_required_cols = [id_column, timestamp_column]
missing_future_cols = [col for col in future_required_cols if col not in future_df.columns]
targets_in_future = [col for col in future_df.columns if col in target_columns]
extra_future_cols = [col for col in future_df.columns if col not in df.columns]
if missing_future_cols:
raise ValueError(
f"future_df does not contain all expected columns. Missing columns: {missing_future_cols}"
)
if targets_in_future:
raise ValueError(
f"future_df cannot contain target columns. Target columns found in future_df: {targets_in_future}"
)
if extra_future_cols:
raise ValueError(f"future_df cannot contain columns not present in df. Extra columns: {extra_future_cols}")
df, future_df = _validate_df_types_and_cast(
df, future_df, id_column=id_column, timestamp_column=timestamp_column, target_columns=target_columns
)
# Get the original order of time series IDs
original_order = df[id_column].unique()
# Sort and prepare df
df[timestamp_column] = pd.to_datetime(df[timestamp_column])
df = df.sort_values([id_column, timestamp_column])
# Get series lengths
series_lengths = df[id_column].value_counts(sort=False).to_list()
def validate_freq(timestamps: pd.Series, series_id: str):
freq = pd.infer_freq(timestamps)
if not freq:
raise ValueError(f"Could not infer frequency for series {series_id}")
return freq
# Validate each series
all_freqs = []
start_idx = 0
for length in series_lengths:
if length < 3:
series_id = df.iloc[start_idx][id_column]
raise ValueError(
f"Every time series must have at least 3 data points, found {length=} for series {series_id}"
)
series_data = df.iloc[start_idx : start_idx + length]
timestamps = series_data[timestamp_column]
series_id = series_data.iloc[0][id_column]
all_freqs.append(validate_freq(timestamps, series_id))
start_idx += length
if len(set(all_freqs)) > 1:
raise ValueError("All time series must have the same frequency")
inferred_freq = all_freqs[0]
# Sort future_df if provided and validate its series lengths
future_series_lengths = None
if future_df is not None:
future_df[timestamp_column] = pd.to_datetime(future_df[timestamp_column])
future_df = future_df.sort_values([id_column, timestamp_column])
# Validate that future_df contains all series from df
context_ids = set(df[id_column].unique())
future_ids = set(future_df[id_column].unique())
if context_ids != future_ids:
raise ValueError("future_df must contain the same time series IDs as df")
future_series_lengths = future_df[id_column].value_counts(sort=False).to_list()
# Validate future series lengths match prediction_length
future_start_idx = 0
for future_length in future_series_lengths:
future_series_data = future_df.iloc[future_start_idx : future_start_idx + future_length]
future_timestamps = future_series_data[timestamp_column]
future_series_id = future_series_data.iloc[0][id_column]
if future_length != prediction_length:
raise ValueError(
f"All time series in future_df must have length {prediction_length}, got {future_length} for series {future_series_id}"
)
if future_length < 3 or inferred_freq != validate_freq(future_timestamps, future_series_id):
raise ValueError(
f"Future covariates must have the same frequency as context, found series {future_series_id} with a different frequency"
)
future_start_idx += future_length
assert len(series_lengths) == len(future_series_lengths)
return df, future_df, inferred_freq, series_lengths, future_series_lengths, original_order
def convert_df_input_to_list_of_dicts_input(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
prediction_length: int,
id_column: str = "item_id",
timestamp_column: str = "timestamp",
) -> tuple[list[dict[str, np.ndarray | dict[str, np.ndarray]]], np.ndarray, dict[str, "pd.DatetimeIndex"]]:
"""
Convert from dataframe input format to a list of dictionaries input format.
Parameters
----------
df
Input dataframe containing time series data with columns:
- id_column: Identifier for each time series
- timestamp_column: Timestamps for each observation
- target_columns: One or more target variables to forecast
- Additional columns are treated as covariates
future_df
Optional dataframe containing future covariate values with columns:
- id_column: Identifier for each time series
- timestamp_column: Future timestamps
- Subset of covariate columns from df
target_columns
Names of target columns to forecast
prediction_length
Number of future time steps to predict
id_column
Name of column containing time series identifiers
timestamp_column
Name of column containing timestamps
Returns
-------
A tuple containing:
- List of dictionaries in the format expected by `Chronos2Pipeline.predict`
- Original order of time series IDs
- Dictionary mapping series IDs to future time index
"""
import pandas as pd
df, future_df, freq, series_lengths, future_series_lengths, original_order = validate_df_inputs(
df,
future_df=future_df,
id_column=id_column,
timestamp_column=timestamp_column,
target_columns=target_columns,
prediction_length=prediction_length,
)
# Convert to list of dicts format
inputs: list[dict[str, np.ndarray | dict[str, np.ndarray]]] = []
prediction_timestamps: dict[str, pd.DatetimeIndex] = {}
start_idx: int = 0
future_start_idx: int = 0
for i, length in enumerate(series_lengths):
series_data = df.iloc[start_idx : start_idx + length]
# Extract target(s)
target_data = series_data[target_columns].to_numpy().T # Shape: (n_targets, history_length)
task: dict[str, np.ndarray | dict[str, np.ndarray]] = {"target": target_data}
# Generate future timestamps
series_id = series_data.iloc[0][id_column]
last_timestamp = series_data[timestamp_column].iloc[-1]
future_ts = pd.date_range(start=last_timestamp, periods=prediction_length + 1, freq=freq)[1:]
prediction_timestamps[series_id] = future_ts
# Handle covariates if present
covariate_cols = [
col for col in series_data.columns if col not in [id_column, timestamp_column] + target_columns
]
if covariate_cols:
past_covariates = {col: series_data[col].to_numpy() for col in covariate_cols}
task["past_covariates"] = past_covariates
# Handle future covariates
if future_df is not None:
assert future_series_lengths is not None
future_length = future_series_lengths[i]
future_data = future_df.iloc[future_start_idx : future_start_idx + future_length]
assert future_data[timestamp_column].iloc[0] == future_ts[0], (
f"the first timestamp in future_df must be the first forecast timestamp, found mismatch "
f"({future_data[timestamp_column].iloc[0]} != {future_ts[0]}) in series {series_id}"
)
if len(future_data) > 0:
future_covariates = {
col: future_data[col].to_numpy() for col in covariate_cols if col in future_data.columns
}
if future_covariates:
task["future_covariates"] = future_covariates
future_start_idx += future_length
inputs.append(task)
start_idx += length
assert len(inputs) == len(series_lengths)
return inputs, original_order, prediction_timestamps
def _cast_fev_features(
past_data: "datasets.Dataset",
future_data: "datasets.Dataset",
@@ -676,49 +476,65 @@ class Chronos2Dataset(IterableDataset):
Arguments
----------
inputs
Time series data. Must be a list of dictionaries where each dictionary may have the following keys.
- `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,) or (n_variates, history_length).
Forecasts will be generated for items in `target`.
- `past_covariates` (optional): a dict of past-only covariates or past values of known future covariates. The keys of the dict
must be names of the covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `history_length`
of `target`.
- `future_covariates` (optional): a dict of future values of known future covariates. The keys of the dict must be names of the
covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `prediction_length`. All keys in
`future_covariates` must be a subset of the keys in `past_covariates`.
Note: when the mode is set to TRAIN, the values inside `future_covariates` are not technically used for training the model;
however, this key is used to infer which covariates are known into the future. Therefore, if your task contains known future covariates,
make sure that this key exists in `inputs`. The values of individual future covariates may be set to `None` or an empty array.
Time series data. Can be either:
1. Raw inputs (when `convert_inputs=True`, default): A sequence of dictionaries where each
dictionary may have the following keys:
- `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,)
or (n_variates, history_length). Forecasts will be generated for items in `target`.
- `past_covariates` (optional): a dict of past-only covariates or past values of known future
covariates.
- `future_covariates` (optional): a dict of future values of known future covariates.
2. Pre-processed inputs (when `convert_inputs=False`): A sequence of `PreparedInput` dicts with keys:
`context`, `future_covariates`, `n_targets`, `n_covariates`, `n_future_covariates`.
Use `prepare_inputs()` to create pre-processed inputs.
context_length
The maximum context length used for training or inference
prediction_length
The prediction horizon
batch_size
The batch size for training the model. Note that the batch size here means the number of time series, including target(s) and
covariates, that are input into the model. If your data has multiple target and/or covariates, the effective number of time series
tasks in a batch will be lower than this value.
The batch size for training the model. Note that the batch size here means the number of time series,
including target(s) and covariates, that are input into the model.
output_patch_size
The output patch size of the model. This is used to compute the number of patches needed to cover `prediction_length`
The output patch size of the model. This is used to compute the number of patches needed to cover
`prediction_length`
min_past
The minimum number of time steps the context must have during training. All time series shorter than `min_past + prediction_length`
are filtered out, by default 1
The minimum number of time steps the context must have during training. All time series shorter than
`min_past + prediction_length` are filtered out, by default 1
mode
`DatasetMode` governing whether to generate training, validation or test samples, by default "train"
convert_inputs
If True (default), preprocess raw inputs. If False, inputs are expected to be already preprocessed.
"""
def __init__(
self,
inputs: Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
inputs: TensorOrArray | Sequence[TensorOrArray] | Sequence[Mapping[str, Any]] | Sequence[PreparedInput],
context_length: int,
prediction_length: int,
batch_size: int,
output_patch_size: int,
min_past: int = 1,
mode: str | DatasetMode = DatasetMode.TRAIN,
convert_inputs: bool = True,
) -> None:
super().__init__()
assert mode in {DatasetMode.TRAIN, DatasetMode.VALIDATION, DatasetMode.TEST}, f"Invalid mode: {mode}"
self.tasks = Chronos2Dataset._prepare_tasks(inputs, prediction_length, min_past, mode)
self.inputs: Sequence[PreparedInput]
if convert_inputs:
if isinstance(inputs, (torch.Tensor, np.ndarray)):
inputs = convert_tensor_input_to_list_of_dicts_input(inputs)
elif (
isinstance(inputs, Sequence) and len(inputs) > 0 and isinstance(inputs[0], (torch.Tensor, np.ndarray))
):
inputs = convert_list_of_tensors_input_to_list_of_dicts_input(cast(Sequence[TensorOrArray], inputs))
self.inputs = prepare_inputs(cast(Iterable[Mapping[str, Any]], inputs), prediction_length, min_past, mode)
else:
validate_prepared_schema(inputs[0])
self.inputs = cast(Sequence[PreparedInput], inputs)
self.context_length = context_length
self.prediction_length = prediction_length
self.batch_size = batch_size
@@ -726,53 +542,16 @@ class Chronos2Dataset(IterableDataset):
self.min_past = min_past
self.mode = mode
@staticmethod
def _prepare_tasks(
inputs: Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
prediction_length: int,
min_past: int,
mode: str | DatasetMode,
):
tasks = []
for idx, raw_task in enumerate(inputs):
if mode != DatasetMode.TEST:
raw_future_covariates = raw_task.get("future_covariates", {})
raw_future_covariates = cast(dict[str, TensorOrArray | None], raw_future_covariates)
if raw_future_covariates:
fixed_future_covariates = {}
for key, value in raw_future_covariates.items():
fixed_future_covariates[key] = (
np.full(prediction_length, np.nan) if value is None or len(value) == 0 else value
)
raw_task = {**raw_task, "future_covariates": fixed_future_covariates}
def _construct_slice(self, input_idx: int) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, int]:
prepared = self.inputs[input_idx]
past_tensor = prepared["context"].clone() # shape: (n_targets + n_covariates, history_length)
future_tensor = prepared["future_covariates"].clone()
n_targets = int(prepared["n_targets"])
n_covariates = int(prepared["n_covariates"])
n_future_covariates = int(prepared["n_future_covariates"])
n_past_only_covariates = n_covariates - n_future_covariates
raw_task = cast(dict[str, TensorOrArray | Mapping[str, TensorOrArray]], raw_task)
# convert to a format compatible with model's forward
task = validate_and_prepare_single_dict_task(raw_task, idx, prediction_length)
if mode != DatasetMode.TEST and task[0].shape[-1] < min_past + prediction_length:
# filter tasks based on min_past + prediction_length
continue
tasks.append(task)
if len(tasks) == 0:
raise ValueError(
"The dataset is empty after filtering based on the length of the time series (length >= min_past + prediction_length). "
"Please provide longer time series or reduce `min_past` or `prediction_length`. "
)
return tasks
def _construct_slice(self, task_idx: int) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, int]:
(
task_past_tensor, # shape: (task_n_targets + task_n_covariates, history_length)
task_future_tensor,
task_n_targets,
task_n_covariates,
task_n_future_covariates,
) = self.tasks[task_idx]
task_n_past_only_covariates = task_n_covariates - task_n_future_covariates
full_length = task_past_tensor.shape[-1]
full_length = past_tensor.shape[-1]
if self.mode == DatasetMode.TRAIN:
# slice a random subsequence from the full series
@@ -786,72 +565,74 @@ class Chronos2Dataset(IterableDataset):
if slice_idx >= self.context_length:
# slice series, if it is longer than context_length
task_context = task_past_tensor[:, slice_idx - self.context_length : slice_idx]
context = past_tensor[:, slice_idx - self.context_length : slice_idx]
else:
task_context = task_past_tensor[:, :slice_idx]
context = past_tensor[:, :slice_idx]
# In the TEST mode, we have no target available and the task_future_covariates can be directly used
# In the TRAIN and VALIDATION modes, the target and task_future_covariates need to be constructed from
# the task_context_tensor by slicing the appropriate indices which we do below
# In TEST mode, no target is available and the prepared future covariates (future_tensor) can be used directly
# In TRAIN and VALIDATION modes, the target and future_covariates need to be constructed from
# past_tensor by slicing the appropriate indices, which we do below
if self.mode in [DatasetMode.TRAIN, DatasetMode.VALIDATION]:
# the first task_n_targets elements in task_context_tensor are the targets
task_future_target = task_past_tensor[:, slice_idx : slice_idx + self.prediction_length]
# the first n_targets rows of past_tensor are the targets
future_target = past_tensor[:, slice_idx : slice_idx + self.prediction_length].clone()
# mask out all rows corresponding to covariates
future_target[n_targets:] = torch.nan
if task_n_future_covariates > 0:
# the last task_n_future_covariates elements in task_context_tensor are the known covariates
task_future_covariates = task_past_tensor[
-task_n_future_covariates:, slice_idx : slice_idx + self.prediction_length
if n_future_covariates > 0:
# the last n_future_covariates rows of past_tensor are the known covariates
future_covariates = past_tensor[
-n_future_covariates:, slice_idx : slice_idx + self.prediction_length
]
else:
# zero-length tensor for easy concatenation later
task_future_covariates = torch.zeros((0, self.prediction_length))
future_covariates = torch.zeros((0, self.prediction_length))
# the leading task_n_targets + task_n_past_only_covariates elements are masked because the target(s)
# the leading n_targets + n_past_only_covariates elements are masked because the target(s)
# and past-only covariates are not known into the future
task_future_covariates_padding = torch.full(
(task_n_targets + task_n_past_only_covariates, self.prediction_length),
future_covariates_padding = torch.full(
(n_targets + n_past_only_covariates, self.prediction_length),
fill_value=torch.nan,
)
task_future_covariates = torch.cat([task_future_covariates_padding, task_future_covariates], dim=0)
future_covariates = torch.cat([future_covariates_padding, future_covariates], dim=0)
else:
task_future_target = None
task_future_covariates = task_future_tensor
future_target = None
future_covariates = future_tensor
# task_context: (task_n_targets + task_n_covariates, min(context_length, history_length))
# task_future_target: (task_n_targets + task_n_covariates, prediction_length), the future values of known future covariates
# context: (n_targets + n_covariates, min(context_length, history_length))
# future_target: (n_targets + n_covariates, prediction_length), the future values of known future covariates
# are ignored during loss computation
# task_future_covariates: (task_n_targets + task_n_past_only_covariates + task_n_future_covariates, prediction_length),
# future_covariates: (n_targets + n_past_only_covariates + n_future_covariates, prediction_length),
# the entries corresponding to targets and past-only covariates are NaNs
return task_context, task_future_target, task_future_covariates, task_n_targets
return context, future_target, future_covariates, n_targets
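Review note: the TRAIN/VALIDATION branch of the new `_construct_slice` can be traced on a toy tensor. This is a sketch under made-up sizes (one target, one past-only covariate, one known-future covariate) with a fixed `slice_idx` instead of a random one; it is not the dataset class itself.

```python
import torch

n_targets, n_covariates, n_future_covariates = 1, 2, 1
n_past_only_covariates = n_covariates - n_future_covariates
context_length, prediction_length = 4, 3

# Rows: [target, past-only covariate, known-future covariate], 10 time steps each.
past_tensor = torch.arange(30, dtype=torch.float32).reshape(3, 10)
slice_idx = 6  # fixed here; the training mode draws this index at random

# Context is the window of at most context_length steps ending at slice_idx.
if slice_idx >= context_length:
    context = past_tensor[:, slice_idx - context_length : slice_idx]
else:
    context = past_tensor[:, :slice_idx]

# Future target: keep target rows, mask every covariate row with NaN.
future_target = past_tensor[:, slice_idx : slice_idx + prediction_length].clone()
future_target[n_targets:] = torch.nan

# Future covariates: NaN padding for targets and past-only covariates,
# followed by the real future values of the known covariates.
known = past_tensor[-n_future_covariates:, slice_idx : slice_idx + prediction_length]
padding = torch.full((n_targets + n_past_only_covariates, prediction_length), torch.nan)
future_covariates = torch.cat([padding, known], dim=0)
```

Both outputs keep one row per variate, so they line up row for row with `context` when batches are concatenated.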
def _build_batch(self, task_indices: list[int]) -> dict[str, torch.Tensor | int | list[tuple[int, int]] | None]:
"""Build a batch from given task indices."""
batch_context_tensor_list = []
batch_future_target_tensor_list = []
batch_future_covariates_tensor_list = []
def _build_batch(self, input_indices: list[int]) -> dict[str, torch.Tensor | int | list[tuple[int, int]] | None]:
"""Build a batch from given input indices."""
batch_context_list = []
batch_future_target_list = []
batch_future_covariates_list = []
batch_group_ids_list = []
target_idx_ranges: list[tuple[int, int]] = []
target_start_idx = 0
for group_id, task_idx in enumerate(task_indices):
task_context, task_future_target, task_future_covariates, task_n_targets = self._construct_slice(task_idx)
for group_id, input_idx in enumerate(input_indices):
context, future_target, future_covariates, n_targets = self._construct_slice(input_idx)
group_size = task_context.shape[0]
task_group_ids = torch.full((group_size,), fill_value=group_id)
batch_context_tensor_list.append(task_context)
batch_future_target_tensor_list.append(task_future_target)
batch_future_covariates_tensor_list.append(task_future_covariates)
batch_group_ids_list.append(task_group_ids)
target_idx_ranges.append((target_start_idx, target_start_idx + task_n_targets))
group_size = context.shape[0]
group_ids = torch.full((group_size,), fill_value=group_id)
batch_context_list.append(context)
batch_future_target_list.append(future_target)
batch_future_covariates_list.append(future_covariates)
batch_group_ids_list.append(group_ids)
target_idx_ranges.append((target_start_idx, target_start_idx + n_targets))
target_start_idx += group_size
return {
"context": left_pad_and_cat_2D(batch_context_tensor_list),
"context": left_pad_and_cat_2D(batch_context_list),
"future_target": None
if self.mode == DatasetMode.TEST
else torch.cat(cast(list[torch.Tensor], batch_future_target_tensor_list), dim=0),
"future_covariates": torch.cat(batch_future_covariates_tensor_list, dim=0),
else torch.cat(cast(list[torch.Tensor], batch_future_target_list), dim=0),
"future_covariates": torch.cat(batch_future_covariates_list, dim=0),
"group_ids": torch.cat(batch_group_ids_list, dim=0),
"num_output_patches": self.num_output_patches,
"target_idx_ranges": target_idx_ranges,
@@ -860,27 +641,27 @@ class Chronos2Dataset(IterableDataset):
def _generate_train_batches(self):
while True:
current_batch_size = 0
task_indices = []
input_indices = []
while current_batch_size < self.batch_size:
task_idx = np.random.randint(len(self.tasks))
task_indices.append(task_idx)
current_batch_size += self.tasks[task_idx][0].shape[0]
input_idx = np.random.randint(len(self.inputs))
input_indices.append(input_idx)
current_batch_size += self.inputs[input_idx]["context"].shape[0]
yield self._build_batch(task_indices)
yield self._build_batch(input_indices)
def _generate_sequential_batches(self):
task_idx = 0
while task_idx < len(self.tasks):
input_idx = 0
while input_idx < len(self.inputs):
current_batch_size = 0
task_indices = []
input_indices = []
while task_idx < len(self.tasks) and current_batch_size < self.batch_size:
task_indices.append(task_idx)
current_batch_size += self.tasks[task_idx][0].shape[0]
task_idx += 1
while input_idx < len(self.inputs) and current_batch_size < self.batch_size:
input_indices.append(input_idx)
current_batch_size += self.inputs[input_idx]["context"].shape[0]
input_idx += 1
yield self._build_batch(task_indices)
yield self._build_batch(input_indices)
def __iter__(self) -> Iterator:
"""
@@ -907,39 +688,3 @@ class Chronos2Dataset(IterableDataset):
yield batch
else:
yield from self._generate_sequential_batches()
@classmethod
def convert_inputs(
cls,
inputs: TensorOrArray
| Sequence[TensorOrArray]
| Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
context_length: int,
prediction_length: int,
batch_size: int,
output_patch_size: int,
min_past: int = 1,
mode: str | DatasetMode = DatasetMode.TRAIN,
) -> "Chronos2Dataset":
"""Convert from different input formats to a Chronos2Dataset."""
if isinstance(inputs, (torch.Tensor, np.ndarray)):
inputs = convert_tensor_input_to_list_of_dicts_input(inputs)
elif isinstance(inputs, list) and all([isinstance(x, (torch.Tensor, np.ndarray)) for x in inputs]):
inputs = cast(list[TensorOrArray], inputs)
inputs = convert_list_of_tensors_input_to_list_of_dicts_input(inputs)
elif isinstance(inputs, list) and all([isinstance(x, dict) for x in inputs]):
pass
else:
raise ValueError("Unexpected inputs format")
inputs = cast(list[dict[str, TensorOrArray | dict[str, TensorOrArray]]], inputs)
return cls(
inputs,
context_length=context_length,
prediction_length=prediction_length,
batch_size=batch_size,
output_patch_size=output_patch_size,
min_past=min_past,
mode=mode,
)
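Review note: the grouping rule used by `_generate_sequential_batches` in this file can be sketched without tensors. The function `sequential_batches` and the row counts are hypothetical; each count stands in for `self.inputs[i]["context"].shape[0]`, i.e. targets plus covariates of one input.

```python
def sequential_batches(rows_per_input, batch_size):
    """Hypothetical stand-in: group input indices until the row budget is reached."""
    batches = []
    input_idx = 0
    while input_idx < len(rows_per_input):
        current_batch_size = 0
        input_indices = []
        # An input is always added whole, so a batch may overshoot batch_size.
        while input_idx < len(rows_per_input) and current_batch_size < batch_size:
            input_indices.append(input_idx)
            current_batch_size += rows_per_input[input_idx]
            input_idx += 1
        batches.append(input_indices)
    return batches


batches = sequential_batches([3, 1, 2, 4, 1], batch_size=4)
# batches == [[0, 1], [2, 3], [4]]
```

The second batch holds 6 rows even though `batch_size` is 4, because inputs are never split across batches.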


@@ -155,6 +155,7 @@ class MHA(nn.Module):
self.n_heads: int = config.num_heads
self.dropout: float = config.dropout_rate
self.inner_dim: int = self.n_heads * self.kv_proj_dim
self.config = config
self.q = nn.Linear(self.d_model, self.inner_dim, bias=False)
self.k = nn.Linear(self.d_model, self.inner_dim, bias=False)
@@ -165,6 +166,64 @@ class MHA(nn.Module):
if use_rope:
self.rope_embed = RoPE(dim=self.kv_proj_dim, base=config.rope_theta)
def _eager_attention(
self,
query_states: torch.Tensor,
key_states: torch.Tensor,
value_states: torch.Tensor,
mask: torch.Tensor,
) -> tuple[torch.Tensor, torch.Tensor]:
"""Eager attention implementation using manual matmul.
Args:
query_states: [batch, n_heads, seq_len, kv_proj_dim]
key_states: [batch, n_heads, seq_len, kv_proj_dim]
value_states: [batch, n_heads, seq_len, kv_proj_dim]
mask: [batch, n_heads, q_len, kv_len]
Returns:
attn_output: [batch, n_heads, seq_len, kv_proj_dim]
attn_weights: [batch, n_heads, q_len, kv_len]
"""
# Compute attention weights (no scaling - this is the original Chronos-2 implementation)
scores = torch.matmul(query_states, key_states.transpose(3, 2)) # "bnqd,bnkd->bnqk"
scores += mask
attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(scores)
attn_weights = nn.functional.dropout(attn_weights, p=self.dropout, training=self.training)
attn_output = torch.matmul(attn_weights, value_states)
return attn_output, attn_weights
def _sdpa_attention(
self,
query_states: torch.Tensor,
key_states: torch.Tensor,
value_states: torch.Tensor,
mask: torch.Tensor,
) -> tuple[torch.Tensor, None]:
"""SDPA attention implementation using torch.nn.functional.scaled_dot_product_attention.
Args:
query_states: [batch, n_heads, seq_len, kv_proj_dim]
key_states: [batch, n_heads, seq_len, kv_proj_dim]
value_states: [batch, n_heads, seq_len, kv_proj_dim]
mask: [batch, n_heads, q_len, kv_len] - additive mask (0 for valid, -inf for invalid)
Returns:
attn_output: [batch, n_heads, seq_len, kv_proj_dim]
attn_weights: None (SDPA doesn't return weights)
"""
attn_output = nn.functional.scaled_dot_product_attention(
query_states,
key_states,
value_states,
attn_mask=mask,
dropout_p=self.dropout if self.training else 0.0,
scale=1.0, # Match eager implementation (no scaling)
)
return attn_output, None
def forward(
self,
hidden_states: torch.Tensor,
@@ -190,6 +249,11 @@ class MHA(nn.Module):
if self.use_rope:
assert position_ids is not None, "position_ids must be provided when self.use_rope=True"
# Force eager attention if output_attentions is True (only eager returns weights)
attn_implementation = self.config._attn_implementation
if output_attentions:
attn_implementation = "eager"
seq_length = hidden_states.shape[1]
def shape(states: torch.Tensor) -> torch.Tensor:
@@ -215,12 +279,10 @@ class MHA(nn.Module):
cos, sin = self.rope_embed(value_states, position_ids)
query_states, key_states = RoPE.apply_rotary_pos_emb(query_states, key_states, cos, sin)
# Compute attention weights
scores = torch.matmul(query_states, key_states.transpose(3, 2)) # "bnqd,bnkd->bnqk"
scores += mask
attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(scores)
attn_weights = nn.functional.dropout(attn_weights, p=self.dropout, training=self.training)
attn_output = torch.matmul(attn_weights, value_states)
if attn_implementation == "sdpa":
attn_output, attn_weights = self._sdpa_attention(query_states, key_states, value_states, mask)
else: # eager
attn_output, attn_weights = self._eager_attention(query_states, key_states, value_states, mask)
# Project attention output
attn_output = unshape(attn_output)


@@ -199,6 +199,7 @@ class Chronos2Model(PreTrainedModel):
config_class = Chronos2CoreConfig # type: ignore[assignment]
_supports_long_horizon: bool = True
_supports_future_covariates: bool = True
_supports_sdpa: bool = True
def __init__(self, config: Chronos2CoreConfig):
assert hasattr(config, "chronos_config"), "Not a valid Chronos config"
@@ -546,6 +547,74 @@ class Chronos2Model(PreTrainedModel):
return loss
def encode(
self,
context: torch.Tensor,
context_mask: torch.Tensor | None = None,
group_ids: torch.Tensor | None = None,
future_covariates: torch.Tensor | None = None,
future_covariates_mask: torch.Tensor | None = None,
num_output_patches: int = 1,
future_target: torch.Tensor | None = None,
future_target_mask: torch.Tensor | None = None,
output_attentions: bool = False,
):
self._validate_input(
context=context,
context_mask=context_mask,
future_covariates=future_covariates,
future_covariates_mask=future_covariates_mask,
group_ids=group_ids,
num_output_patches=num_output_patches,
future_target=future_target,
future_target_mask=future_target_mask,
)
batch_size = context.shape[0]
patched_context, attention_mask, loc_scale = self._prepare_patched_context(
context=context, context_mask=context_mask
)
num_context_patches = attention_mask.shape[-1]
# get input embeddings of shape (batch, num_context_patches, d_model)
input_embeds: torch.Tensor = self.input_patch_embedding(patched_context)
# append [REG] special token embedding, if needed
if self.chronos_config.use_reg_token:
reg_input_ids = torch.full((batch_size, 1), self.config.reg_token_id, device=input_embeds.device)
reg_embeds = self.shared(reg_input_ids)
input_embeds = torch.cat([input_embeds, reg_embeds], dim=-2)
attention_mask = torch.cat(
[attention_mask.to(self.dtype), torch.ones_like(reg_input_ids).to(self.dtype)], dim=-1
)
patched_future, patched_future_covariates_mask = self._prepare_patched_future(
future_covariates=future_covariates,
future_covariates_mask=future_covariates_mask,
loc_scale=loc_scale,
num_output_patches=num_output_patches,
batch_size=batch_size,
)
future_attention_mask = torch.ones(batch_size, num_output_patches, dtype=self.dtype, device=self.device)
# get future embeddings of shape (batch, num_output_patches, d_model)
future_embeds: torch.Tensor = self.input_patch_embedding(patched_future)
# concatenate context and future embeddings and masks
input_embeds = torch.cat([input_embeds, future_embeds], dim=-2)
attention_mask = torch.cat([attention_mask, future_attention_mask], dim=-1)
if group_ids is None:
# by default, each time series is treated independently, i.e., no mixing across the batch
group_ids = torch.arange(batch_size, dtype=torch.long, device=self.device)
encoder_outputs: Chronos2EncoderOutput = self.encoder(
attention_mask=attention_mask,
inputs_embeds=input_embeds,
group_ids=group_ids,
output_attentions=output_attentions,
)
return encoder_outputs, loc_scale, patched_future_covariates_mask, num_context_patches
def forward(
self,
context: torch.Tensor,
@ -624,63 +693,19 @@ class Chronos2Model(PreTrainedModel):
- enc_time_self_attn_weights: Time self attention weights, if output_attentions=True
- enc_group_self_attn_weights: Group self attention weights, if output_attentions=True
"""
self._validate_input(
batch_size = context.shape[0]
encoder_outputs, loc_scale, patched_future_covariates_mask, num_context_patches = self.encode(
context=context,
context_mask=context_mask,
group_ids=group_ids,
future_covariates=future_covariates,
future_covariates_mask=future_covariates_mask,
group_ids=group_ids,
num_output_patches=num_output_patches,
future_target=future_target,
future_target_mask=future_target_mask,
)
batch_size = context.shape[0]
patched_context, attention_mask, loc_scale = self._prepare_patched_context(
context=context, context_mask=context_mask
)
num_context_patches = attention_mask.shape[-1]
# get input embeddings of shape (batch, num_context_patches, d_model)
input_embeds: torch.Tensor = self.input_patch_embedding(patched_context)
# append [REG] special token embedding, if needed
if self.chronos_config.use_reg_token:
reg_input_ids = torch.full((batch_size, 1), self.config.reg_token_id, device=input_embeds.device)
reg_embeds = self.shared(reg_input_ids)
input_embeds = torch.cat([input_embeds, reg_embeds], dim=-2)
attention_mask = torch.cat(
[attention_mask.to(self.dtype), torch.ones_like(reg_input_ids).to(self.dtype)], dim=-1
)
patched_future, patched_future_covariates_mask = self._prepare_patched_future(
future_covariates=future_covariates,
future_covariates_mask=future_covariates_mask,
loc_scale=loc_scale,
num_output_patches=num_output_patches,
batch_size=batch_size,
)
future_attention_mask = torch.ones(batch_size, num_output_patches, dtype=self.dtype, device=self.device)
# get future embeddings of shape (batch, num_output_patches, d_model)
future_embeds: torch.Tensor = self.input_patch_embedding(patched_future)
# concatenate context and future embeddings and masks
input_embeds = torch.cat([input_embeds, future_embeds], dim=-2)
attention_mask = torch.cat([attention_mask, future_attention_mask], dim=-1)
if group_ids is None:
# by default, each time series is treated independently, i.e., no mixing across the batch
group_ids = torch.arange(batch_size, dtype=torch.long, device=self.device)
encoder_outputs: Chronos2EncoderOutput = self.encoder(
attention_mask=attention_mask,
inputs_embeds=input_embeds,
group_ids=group_ids,
output_attentions=output_attentions,
)
hidden_states: torch.Tensor = encoder_outputs[0]
assert hidden_states.shape == (batch_size, num_context_patches + 1 + num_output_patches, self.model_dim)
# slice the last num_output_patches hidden states to be input into the output_patch_embedding


@ -9,29 +9,29 @@ import time
import warnings
from copy import deepcopy
from pathlib import Path
from typing import TYPE_CHECKING, Any, Mapping, Sequence
from typing import TYPE_CHECKING, Callable, Literal, Mapping, Sequence
import numpy as np
import torch
from einops import rearrange, repeat
from torch.utils.data import DataLoader
from transformers import AutoConfig
from transformers.utils.import_utils import is_peft_available
from transformers.utils.peft_utils import find_adapter_config_file
import chronos.chronos2
from chronos.base import BaseChronosPipeline, ForecastType
from chronos.chronos2 import Chronos2Model
from chronos.chronos2.dataset import (
Chronos2Dataset,
DatasetMode,
TensorOrArray,
convert_df_input_to_list_of_dicts_input,
)
from chronos.chronos2.dataset import Chronos2Dataset, DatasetMode, TensorOrArray
from chronos.df_utils import convert_df_input_to_list_of_dicts_input
from chronos.utils import interpolate_quantiles, weighted_quantile
if TYPE_CHECKING:
import datasets
import fev
import pandas as pd
from peft import LoraConfig
from transformers.trainer_callback import TrainerCallback
logger = logging.getLogger(__name__)
@ -103,13 +103,19 @@ class Chronos2Pipeline(BaseChronosPipeline):
| Sequence[TensorOrArray]
| Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]]
| None = None,
finetune_mode: Literal["full", "lora"] = "full",
lora_config: "LoraConfig | dict | None" = None,
context_length: int | None = None,
learning_rate: float = 1e-5,
learning_rate: float = 1e-6,
num_steps: int = 1000,
batch_size: int = 256,
output_dir: Path | str | None = None,
min_past: int | None = None,
finetuned_ckpt_name: str = "finetuned-ckpt",
callbacks: list["TrainerCallback"] | None = None,
remove_printer_callback: bool = False,
disable_data_parallel: bool = True,
convert_inputs: bool = True,
**extra_trainer_kwargs,
) -> "Chronos2Pipeline":
"""
@ -127,10 +133,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
validation_inputs
The time series used for validation and model selection. The format of `validation_inputs` is exactly the same as `inputs`, by default None which
means that no validation is performed. Note that enabling validation may slow down fine-tuning for large datasets.
finetune_mode
One of "full" (performs full fine-tuning) or "lora" (performs Low Rank Adaptation (LoRA) fine-tuning), by default "full"
lora_config
The configuration to use for LoRA fine-tuning when finetune_mode="lora". Can be a `LoraConfig` object or a dict which is used to initialize `LoraConfig`.
When unspecified and finetune_mode="lora", a default configuration is used
context_length
The maximum context length used during fine-tuning, by default set to the model's default context length
learning_rate
The learning rate for the optimizer, by default 1e-5
The learning rate for the optimizer, by default 1e-6
When finetune_mode="lora", we recommend using a higher value of the learning rate, such as 1e-5
num_steps
The number of steps to fine-tune for, by default 1000
batch_size
@ -144,6 +156,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
are filtered out, by default set equal to prediction_length
finetuned_ckpt_name
The name of the directory inside `output_dir` in which the final fine-tuned checkpoint will be saved, by default "finetuned-ckpt"
callbacks
A list of `TrainerCallback`s which will be forwarded to the HuggingFace `Trainer`
remove_printer_callback
If True, all instances of `PrinterCallback` are removed from callbacks
disable_data_parallel
If True, ensures that DataParallel is disabled and training happens on a single GPU
convert_inputs
If True (default), preprocess raw inputs (convert tensors, encode categoricals, validate).
If False, inputs are expected to be already preprocessed using `chronos.chronos2.dataset.prepare_inputs`.
This allows for efficient training on large datasets that don't fit in memory.
**extra_trainer_kwargs
Extra kwargs are directly forwarded to `TrainingArguments`
@ -153,28 +175,66 @@ class Chronos2Pipeline(BaseChronosPipeline):
"""
import torch.cuda
from transformers.trainer_callback import PrinterCallback
from transformers.training_args import TrainingArguments
if finetune_mode == "lora":
if is_peft_available():
from peft import LoraConfig, get_peft_model
else:
warnings.warn(
"`peft` is required for `finetune_mode='lora'`. Please install it with `pip install peft`. Falling back to `finetune_mode='full'`."
)
finetune_mode = "full"
lora_config = None
from chronos.chronos2.trainer import Chronos2Trainer, EvaluateAndSaveFinalStepCallback
warnings.warn(
"Fine-tuning support is experimental and may be changed in future versions.",
category=FutureWarning,
stacklevel=2,
)
assert finetune_mode in ["full", "lora"], f"finetune_mode must be one of ['full', 'lora'], got {finetune_mode}"
if finetune_mode == "full" and lora_config is not None:
raise ValueError(
"lora_config should not be specified when `finetune_mode='full'`. To enable LoRA, set `finetune_mode='lora'`."
)
# Create a copy of the model to avoid modifying the original
config = deepcopy(self.model.config)
model = Chronos2Model(config).to(self.model.device) # type: ignore
model.load_state_dict(self.model.state_dict())
if finetune_mode == "lora":
if lora_config is None:
lora_config = LoraConfig(
r=8,
lora_alpha=16,
target_modules=[
"self_attention.q",
"self_attention.v",
"self_attention.k",
"self_attention.o",
"output_patch_embedding.output_layer",
],
)
elif isinstance(lora_config, dict):
lora_config = LoraConfig(**lora_config)
else:
assert isinstance(lora_config, LoraConfig), (
f"lora_config must be an instance of LoraConfig or a dict, got {type(lora_config)}"
)
model = get_peft_model(model, lora_config)
n_trainable_params, n_params = model.get_nb_trainable_parameters()
logger.info(
f"Using LoRA. Number of trainable parameters: {n_trainable_params}, total parameters: {n_params}."
)
if context_length is None:
context_length = self.model_context_length
if min_past is None:
min_past = prediction_length
train_dataset = Chronos2Dataset.convert_inputs(
train_dataset = Chronos2Dataset(
inputs=inputs,
context_length=context_length,
prediction_length=prediction_length,
@ -182,6 +242,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
output_patch_size=self.model_output_patch_size,
min_past=min_past,
mode=DatasetMode.TRAIN,
convert_inputs=convert_inputs,
)
if output_dir is None:
@ -211,14 +272,13 @@ class Chronos2Pipeline(BaseChronosPipeline):
lr_scheduler_type="linear",
warmup_ratio=0.0,
optim="adamw_torch_fused",
logging_dir=str(output_dir / "logs"),
logging_strategy="steps",
logging_steps=100,
disable_tqdm=False,
report_to="none",
max_steps=num_steps,
gradient_accumulation_steps=1,
dataloader_num_workers=1,
dataloader_num_workers=0,
tf32=has_sm80 and not use_cpu,
bf16=has_sm80 and not use_cpu,
save_only_model=True,
@ -234,16 +294,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
)
eval_dataset = None
callbacks = []
callbacks = callbacks or []
if validation_inputs is not None:
# construct validation dataset
eval_dataset = Chronos2Dataset.convert_inputs(
eval_dataset = Chronos2Dataset(
inputs=validation_inputs,
context_length=context_length,
prediction_length=prediction_length,
batch_size=batch_size,
output_patch_size=self.model_output_patch_size,
mode=DatasetMode.VALIDATION,
convert_inputs=convert_inputs,
)
# set validation parameters
@ -268,6 +328,11 @@ class Chronos2Pipeline(BaseChronosPipeline):
training_args = TrainingArguments(**training_kwargs)
if disable_data_parallel and not use_cpu:
# This is a hack to disable the default `transformers` behavior of using DataParallel
training_args._n_gpu = 1
assert training_args.n_gpu == 1 # Ensure that the hack worked
trainer = Chronos2Trainer(
model=model,
args=training_args,
@ -275,12 +340,19 @@ class Chronos2Pipeline(BaseChronosPipeline):
eval_dataset=eval_dataset,
callbacks=callbacks,
)
if remove_printer_callback:
trainer.pop_callback(PrinterCallback)
trainer.train()
# update max_output_patches, if the model was fine-tuned with longer prediction_length
# update context_length and max_output_patches, if the model was fine-tuned with larger values
model.chronos_config.context_length = max(model.chronos_config.context_length, context_length)
model.chronos_config.max_output_patches = max(
model.chronos_config.max_output_patches, math.ceil(prediction_length / self.model_output_patch_size)
)
# update chronos_config in model's config, so it is saved correctly
model.config.chronos_config = model.chronos_config.__dict__
# Create a new pipeline with the fine-tuned model
finetuned_pipeline = Chronos2Pipeline(model=model)
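The argument handling that this hunk adds to `fit` can be sketched without `peft` or `torch`. The sketch below is only an illustration under stated assumptions: `resolve_lora_kwargs` and `updated_max_output_patches` are hypothetical helpers, not functions in the repository, and the LoRA settings are kept as a plain dict, whereas the real code also accepts a `LoraConfig` object. The default values are copied from the diff above.

```python
import math

# Default LoRA settings as they appear in the diff; the dict form is an assumption of this sketch.
DEFAULT_LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "target_modules": [
        "self_attention.q",
        "self_attention.v",
        "self_attention.k",
        "self_attention.o",
        "output_patch_embedding.output_layer",
    ],
}


def resolve_lora_kwargs(finetune_mode, lora_config=None):
    """Hypothetical helper: return LoRA kwargs as a dict, or None for full fine-tuning."""
    if finetune_mode not in ("full", "lora"):
        raise ValueError(f"finetune_mode must be one of ['full', 'lora'], got {finetune_mode}")
    if finetune_mode == "full":
        if lora_config is not None:
            # Mirrors the diff: a LoRA config only makes sense with finetune_mode='lora'
            raise ValueError("lora_config should not be specified when finetune_mode='full'.")
        return None
    if lora_config is None:
        return dict(DEFAULT_LORA_KWARGS)
    if isinstance(lora_config, dict):
        return dict(lora_config)
    raise TypeError(f"lora_config must be a dict in this sketch, got {type(lora_config)}")


def updated_max_output_patches(current_max, prediction_length, output_patch_size):
    """Hypothetical helper: grow max_output_patches if fine-tuning used a longer horizon."""
    return max(current_max, math.ceil(prediction_length / output_patch_size))
```

For instance, with an assumed output patch size of 16, fine-tuning with `prediction_length=2000` would raise a limit of 64 patches to 125, while `prediction_length=100` would leave it at 64.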
@ -389,7 +461,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
prediction_length: int | None = None,
batch_size: int = 256,
context_length: int | None = None,
predict_batches_jointly: bool = False,
cross_learning: bool = False,
limit_prediction_length: bool = False,
**kwargs,
) -> list[torch.Tensor]:
@ -475,7 +547,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
will be lower than this value, by default 256
context_length
The maximum context length used during inference, by default set to the model's default context length
predict_batches_jointly
cross_learning
If True, cross-learning is enabled, i.e., all the tasks in `inputs` will be predicted jointly and the model will share information across all inputs, by default False
The following must be noted when using cross-learning:
- Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
@ -495,6 +567,14 @@ class Chronos2Pipeline(BaseChronosPipeline):
if prediction_length is None:
prediction_length = model_prediction_length
if kwargs.get("predict_batches_jointly") is not None:
warnings.warn(
"The `predict_batches_jointly` argument is deprecated and will be removed in a future version. "
"Please use `cross_learning=True` to enable the cross-learning mode.",
category=FutureWarning,
stacklevel=2,
)
cross_learning = kwargs.pop("predict_batches_jointly")
# The maximum number of output patches to generate in a single forward pass before the long-horizon heuristic kicks in. Note: A value larger
# than the model's default max_output_patches may lead to degradation in forecast accuracy, defaults to a model-specific value
max_output_patches = kwargs.pop("max_output_patches", self.max_output_patches)
@ -503,6 +583,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
# effective batch size increases by a factor of `len(unrolled_quantiles)` when making long-horizon predictions,
# by default [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
unrolled_quantiles = kwargs.pop("unrolled_quantiles", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# A callback which is called after each batch has been processed
after_batch_callback: Callable = kwargs.pop("after_batch", lambda: None)
if len(kwargs) > 0:
raise TypeError(f"Unexpected keyword arguments: {list(kwargs.keys())}.")
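The deprecation path for `predict_batches_jointly` in the hunk above follows a common pattern: accept the old keyword through `**kwargs`, warn, and map it onto the new argument before rejecting anything left over. A minimal sketch of that pattern is shown below; `predict_sketch` is a hypothetical stand-in that returns only the resolved flag, not the real `predict` method.

```python
import warnings


def predict_sketch(inputs, cross_learning=False, **kwargs):
    """Hypothetical stand-in for `predict`; returns the resolved cross_learning flag only."""
    if kwargs.get("predict_batches_jointly") is not None:
        warnings.warn(
            "The `predict_batches_jointly` argument is deprecated; use `cross_learning` instead.",
            category=FutureWarning,
            stacklevel=2,
        )
        # The deprecated keyword overrides the new argument, as in the diff
        cross_learning = kwargs.pop("predict_batches_jointly")
    if len(kwargs) > 0:
        # Anything still left in kwargs is an unknown argument
        raise TypeError(f"Unexpected keyword arguments: {list(kwargs.keys())}.")
    return cross_learning
```

One consequence of checking `is not None` before popping is that an explicit `predict_batches_jointly=None` stays in `kwargs` and reaches the `TypeError` branch.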
@ -534,8 +616,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
)
context_length = self.model_context_length
test_dataset = Chronos2Dataset.convert_inputs(
inputs=inputs,
test_dataset = Chronos2Dataset(
inputs,
context_length=context_length,
prediction_length=prediction_length,
batch_size=batch_size,
@ -543,7 +625,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
mode=DatasetMode.TEST,
)
test_loader = DataLoader(
test_dataset, batch_size=None, num_workers=1, pin_memory=True, shuffle=False, drop_last=False
test_dataset, batch_size=None, pin_memory=self.model.device.type == "cuda", shuffle=False, drop_last=False
)
all_predictions: list[torch.Tensor] = []
@ -554,7 +636,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
batch_future_covariates = batch["future_covariates"]
batch_target_idx_ranges = batch["target_idx_ranges"]
if predict_batches_jointly:
if cross_learning:
batch_group_ids = torch.zeros_like(batch_group_ids)
batch_prediction = self._predict_batch(
@ -567,6 +649,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
target_idx_ranges=batch_target_idx_ranges,
)
all_predictions.extend(batch_prediction)
after_batch_callback()
return all_predictions
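The effect of `cross_learning` on `group_ids` can be illustrated in plain Python. By default each series gets its own group id (`torch.arange(batch_size)` in the model code), and with `cross_learning=True` all ids are set to zero. The sketch below assumes that two series may exchange information only when their group ids match; `group_mixing_mask` is a hypothetical helper, and the actual group attention in the model is not reproduced here.

```python
def group_mixing_mask(group_ids):
    """Hypothetical helper: entry [i][j] is True when series i and j share a group id."""
    return [[a == b for b in group_ids] for a in group_ids]


batch_size = 3

# Default: one group per series, so no mixing across the batch
independent_ids = list(range(batch_size))
# cross_learning=True: every series is placed in group 0
joint_ids = [0] * batch_size

independent_mask = group_mixing_mask(independent_ids)
joint_mask = group_mixing_mask(joint_ids)
```

Under this assumption the independent mask is the identity pattern, while the joint mask lets every series in the batch see every other one, which is why results depend on batch size when cross-learning is enabled.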
@ -745,6 +828,10 @@ class Chronos2Pipeline(BaseChronosPipeline):
prediction_length: int | None = None,
quantile_levels: list[float] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
batch_size: int = 256,
context_length: int | None = None,
cross_learning: bool = False,
validate_inputs: bool = True,
freq: str | None = None,
**predict_kwargs,
) -> "pd.DataFrame":
"""
@ -774,6 +861,24 @@ class Chronos2Pipeline(BaseChronosPipeline):
The batch size used for prediction. Note that the batch size here means the number of time series, including target(s) and covariates,
which are input into the model. If your data has multiple target and/or covariates, the effective number of time series tasks in a batch
will be lower than this value, by default 256
context_length
The maximum context length used during inference, by default set to the model's default context length
cross_learning
If True, cross-learning is enabled, i.e., all the tasks in `inputs` will be predicted jointly and the model will share information across all inputs, by default False
The following must be noted when using cross-learning:
- Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
- Results become dependent on batch size. Very large batch sizes may not provide benefits as they deviate from the maximum group size used during pretraining.
For optimal results, consider using a batch size around 100 (as used in the Chronos-2 technical report).
- Cross-learning is most helpful when individual time series have limited historical context, as the model can leverage patterns from related series in the batch.
validate_inputs
[ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
timestamps are regularly spaced (e.g., with hourly frequency).
freq
Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used when
validate_inputs=False. When provided, skips frequency inference from the data.
**predict_kwargs
Additional arguments passed to predict_quantiles
@ -804,6 +909,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
timestamp_column=timestamp_column,
target_columns=target,
prediction_length=prediction_length,
freq=freq,
validate_inputs=validate_inputs,
)
# Generate forecasts
@ -813,33 +920,37 @@ class Chronos2Pipeline(BaseChronosPipeline):
quantile_levels=quantile_levels,
limit_prediction_length=False,
batch_size=batch_size,
context_length=context_length,
cross_learning=cross_learning,
**predict_kwargs,
)
# since predict_df tasks are homogeneous by input design, we can safely stack the list of tensors into a single tensor
quantiles_np = torch.stack(quantiles).numpy() # [n_tasks, n_variates, horizon, num_quantiles]
mean_np = torch.stack(mean).numpy() # [n_tasks, n_variates, horizon]
results_dfs = []
for i, (series_id, future_ts) in enumerate(prediction_timestamps.items()):
q_pred = quantiles_np[i] # (n_variates, prediction_length, len(quantile_levels))
point_pred = mean_np[i] # (n_variates, prediction_length)
n_tasks = len(prediction_timestamps)
n_variates = len(target)
for target_idx, target_col in enumerate(target):
series_forecast_data: dict[str | tuple[str, str], Any] = {
id_column: series_id,
timestamp_column: future_ts,
"target_name": target_col,
}
series_forecast_data["predictions"] = point_pred[target_idx]
for q_idx, q_level in enumerate(quantile_levels):
series_forecast_data[str(q_level)] = q_pred[target_idx, :, q_idx]
series_ids = list(prediction_timestamps.keys())
future_ts = list(prediction_timestamps.values())
results_dfs.append(pd.DataFrame(series_forecast_data))
data = {
id_column: np.repeat(series_ids, n_variates * prediction_length),
timestamp_column: np.concatenate([np.tile(ts, n_variates) for ts in future_ts]),
"target_name": np.tile(np.repeat(target, prediction_length), n_tasks),
"predictions": mean_np.ravel(),
}
predictions_df = pd.concat(results_dfs, ignore_index=True)
predictions_df.set_index(id_column, inplace=True)
predictions_df = predictions_df.loc[original_order]
predictions_df.reset_index(inplace=True)
quantiles_flat = quantiles_np.reshape(-1, len(quantile_levels))
for q_idx, q_level in enumerate(quantile_levels):
data[str(q_level)] = quantiles_flat[:, q_idx]
predictions_df = pd.DataFrame(data)
# If validate_inputs=False, the df is used as-is without sorting by item_id, no reordering required
if validate_inputs:
predictions_df.set_index(id_column, inplace=True)
predictions_df = predictions_df.loc[original_order]
predictions_df.reset_index(inplace=True)
return predictions_df
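The vectorized construction above relies on the id, target-name, and prediction columns all following the same row-major order as `mean_np.ravel()` on an array of shape `[n_tasks, n_variates, horizon]`. The plain-Python sketch below checks that ordering on made-up numbers; `repeat` and `tile` are hypothetical stand-ins for the 1-D behavior of `np.repeat` and `np.tile`, and the series ids, target names, and values are invented for illustration.

```python
def repeat(values, n):
    """Stand-in for np.repeat on a 1-D sequence: [a, b] -> [a, a, ..., b, b, ...]."""
    return [v for v in values for _ in range(n)]


def tile(values, n):
    """Stand-in for np.tile on a 1-D sequence: [a, b] -> [a, b, a, b, ...]."""
    return list(values) * n


series_ids = ["A", "B"]
target = ["sales", "price"]
prediction_length = 2
n_tasks, n_variates = len(series_ids), len(target)

# mean[i][j][t]: forecast for series i, target j, step t (made-up numbers)
mean = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

id_col = repeat(series_ids, n_variates * prediction_length)
target_col = tile(repeat(target, prediction_length), n_tasks)
# Row-major flattening, the same order as ravel() on a [n_tasks, n_variates, horizon] array
flat_mean = [x for per_series in mean for per_target in per_series for x in per_target]

rows = list(zip(id_col, target_col, flat_mean))
```

Each row pairs a value with the series and target it came from, for example the third row is series "A", target "price", value 3.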
@ -974,11 +1085,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
finetune_kwargs["prediction_length"] = first_window.horizon
finetune_kwargs["batch_size"] = finetune_kwargs.get("batch_size", batch_size)
try:
pipeline = self.fit(inputs=inputs, **finetune_kwargs)
except Exception as e:
msg = f"Finetuning failed with error: {e}. Continuing with the pretrained model."
warnings.warn(msg, category=UserWarning, stacklevel=2)
pipeline = self.fit(inputs=inputs, **finetune_kwargs)
predictions_per_window = []
inference_time_s = 0.0
@ -995,6 +1102,86 @@ class Chronos2Pipeline(BaseChronosPipeline):
return predictions_per_window, inference_time_s
@torch.no_grad()
def embed(
self, inputs: TensorOrArray | Sequence[TensorOrArray], batch_size: int = 256, context_length: int | None = None
) -> tuple[list[torch.Tensor], list[tuple[torch.Tensor, torch.Tensor]]]:
"""
Get encoder embeddings for the given time series.
Parameters
----------
inputs
The time series to get embeddings for, can be one of:
- A 3-dimensional `torch.Tensor` or `np.ndarray` of shape (batch, n_variates, history_length). When `n_variates > 1`, information
will be shared among the different variates of each time series in the batch.
- A list of `torch.Tensor` or `np.ndarray` where each element can either be 1-dimensional of shape (history_length,)
or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements; left-padding
will be applied, if needed.
batch_size
The batch size used for generating embeddings. Note that the batch size here means the total number of time series which are input into the model.
If your data has multiple variates, the effective number of time series tasks in a batch will be lower than this value, by default 256
context_length
The maximum context length used during inference, by default set to the model's default context length
Returns
-------
embeddings
a list of `torch.Tensor` where each element has shape (n_variates, num_patches + 2, d_model) and the number of elements is equal to the number
of target time series (univariate or multivariate) in the `inputs`. The extra +2 is due to embeddings of the [REG] token and a masked output patch token.
loc_scale
a list of tuples with the mean and standard deviation of each time series.
"""
if context_length is None:
context_length = self.model_context_length
if context_length > self.model_context_length:
warnings.warn(
f"The specified context_length {context_length} is greater than the model's default context length {self.model_context_length}. "
f"Resetting context_length to {self.model_context_length}."
)
context_length = self.model_context_length
test_dataset = Chronos2Dataset(
inputs,
context_length=context_length,
prediction_length=0,
batch_size=batch_size,
output_patch_size=self.model_output_patch_size,
mode=DatasetMode.TEST,
)
test_loader = DataLoader(
test_dataset,
batch_size=None,
num_workers=0,
pin_memory=self.model.device.type == "cuda",
shuffle=False,
drop_last=False,
)
all_embeds: list[torch.Tensor] = []
all_loc_scales: list[tuple[torch.Tensor, torch.Tensor]] = []
for batch in test_loader:
assert batch["future_target"] is None
batch_context = batch["context"]
batch_group_ids = batch["group_ids"]
batch_target_idx_ranges = batch["target_idx_ranges"]
encoder_outputs, (locs, scales), *_ = self.model.encode(
context=batch_context.to(device=self.model.device, dtype=torch.float32),
group_ids=batch_group_ids.to(self.model.device),
)
batch_embeds = [encoder_outputs[0][start:end].cpu() for (start, end) in batch_target_idx_ranges]
batch_loc_scales = list(
zip(
[locs[start:end].cpu() for (start, end) in batch_target_idx_ranges],
[scales[start:end].cpu() for (start, end) in batch_target_idx_ranges],
)
)
all_embeds.extend(batch_embeds)
all_loc_scales.extend(batch_loc_scales)
return all_embeds, all_loc_scales
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
"""
@ -1002,9 +1189,25 @@ class Chronos2Pipeline(BaseChronosPipeline):
Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
"""
# Check if the model is on S3 and cache it locally first
# NOTE: Only base models (not LoRA adapters) are supported via S3
if str(pretrained_model_name_or_path).startswith("s3://"):
return BaseChronosPipeline.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
# Check if the hub model_id or local path is a LoRA adapter
if find_adapter_config_file(pretrained_model_name_or_path) is not None:
if not is_peft_available():
raise ImportError(
f"The model at {pretrained_model_name_or_path} is a `peft` adapter, but `peft` is not available. "
f"Please install `peft` with `pip install peft` to use this model. "
)
from peft import AutoPeftModel
model = AutoPeftModel.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
model = model.merge_and_unload()
return cls(model=model)
# Handle the case for the base model
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
assert hasattr(config, "chronos_config"), "Not a Chronos config file"
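The branching in `from_pretrained` depends on telling a LoRA adapter checkpoint apart from a base model. For a local directory this can be approximated by looking for the `adapter_config.json` file that `peft` writes next to the adapter weights. The sketch below is a local-only approximation: `looks_like_lora_adapter` is a hypothetical helper, and the real `find_adapter_config_file` from `transformers` also resolves Hub model ids, which is not covered here.

```python
import json
import tempfile
from pathlib import Path


def looks_like_lora_adapter(model_dir):
    """Hypothetical local-only check: a PEFT adapter directory contains `adapter_config.json`."""
    return (Path(model_dir) / "adapter_config.json").is_file()


with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp) / "base"
    adapter_dir = Path(tmp) / "adapter"
    base_dir.mkdir()
    adapter_dir.mkdir()
    # A base checkpoint ships a config.json; an adapter ships an adapter_config.json
    (base_dir / "config.json").write_text(json.dumps({"chronos_config": {}}))
    (adapter_dir / "adapter_config.json").write_text(json.dumps({"peft_type": "LORA"}))
    results = (looks_like_lora_adapter(base_dir), looks_like_lora_adapter(adapter_dir))
```

In the diff, the adapter branch then loads the model through `AutoPeftModel` and calls `merge_and_unload()`, so the returned pipeline wraps a plain model with the adapter weights merged in.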


@ -3,6 +3,7 @@
# Authors: Abdul Fatir Ansari <ansarnd@amazon.com>
import warnings
from typing import TYPE_CHECKING, cast
from torch.utils.data import DataLoader, Dataset
@ -48,11 +49,16 @@ class Chronos2Trainer(Trainer):
train_dataset = cast("Chronos2Dataset", self.train_dataset)
assert train_dataset.batch_size == self.args.train_batch_size, (
f"The batch_size of the train_dataset ({train_dataset.batch_size}) does not match the batch_size "
f"in TrainingArguments ({self.args.train_batch_size}). If you're using a machine with multiple GPUs, "
f"ensure that only a single GPU is visible by setting the CUDA_VISIBLE_DEVICES environment variable."
)
if self.args.train_batch_size > train_dataset.batch_size:
warnings.warn(
f"The batch_size of the train_dataset ({train_dataset.batch_size}) does not match the batch_size "
f"in TrainingArguments ({self.args.train_batch_size}). On machines with multiple GPUs, this may indicate "
f"that multiple GPUs are visible and transformers is using DataParallel for training by default. "
f"This may lead to unnecessary slowdown and unexpected behavior. We strongly recommend setting the CUDA_VISIBLE_DEVICES "
f"environment variable to ensure that only a single GPU is visible.",
category=UserWarning,
stacklevel=3,
)
dataloader_params = {
# Disable automatic batching as we handle batching ourselves
@ -74,11 +80,16 @@ class Chronos2Trainer(Trainer):
eval_dataset = cast("Chronos2Dataset", self.eval_dataset)
assert eval_dataset.batch_size == self.args.eval_batch_size, (
f"The batch_size of the eval_dataset ({eval_dataset.batch_size}) does not match the batch_size "
f"in TrainingArguments ({self.args.eval_batch_size}). If you're using a machine with multiple GPUs, "
f"ensure that only a single GPU is visible by setting the CUDA_VISIBLE_DEVICES environment variable."
)
if self.args.eval_batch_size > eval_dataset.batch_size:
warnings.warn(
f"The batch_size of the eval_dataset ({eval_dataset.batch_size}) does not match the batch_size "
f"in TrainingArguments ({self.args.eval_batch_size}). On machines with multiple GPUs, this may indicate "
f"that multiple GPUs are visible and transformers is using DataParallel for training by default. "
f"This may lead to unnecessary slowdown and unexpected behavior. We strongly recommend setting the CUDA_VISIBLE_DEVICES "
f"environment variable to ensure that only a single GPU is visible.",
category=UserWarning,
stacklevel=3,
)
dataloader_params = {
# Disable automatic batching as we handle batching ourselves


@ -408,6 +408,14 @@ class ChronosBoltPipeline(BaseChronosPipeline):
super().__init__(inner_model=model) # type: ignore
self.model = model
@property
def model_context_length(self) -> int:
return self.model.chronos_config.context_length
@property
def model_prediction_length(self) -> int:
return self.model.chronos_config.prediction_length
@property
def quantiles(self) -> List[float]:
return self.model.config.chronos_config["quantiles"]
@ -487,14 +495,12 @@ class ChronosBoltPipeline(BaseChronosPipeline):
"""
context_tensor = self._prepare_and_validate_context(context=inputs)
model_context_length: int = self.model.config.chronos_config["context_length"]
model_prediction_length: int = self.model.config.chronos_config["prediction_length"]
if prediction_length is None:
prediction_length = model_prediction_length
prediction_length = self.model_prediction_length
if prediction_length > model_prediction_length:
if prediction_length > self.model_prediction_length:
msg = (
f"We recommend keeping prediction length <= {model_prediction_length}. "
f"We recommend keeping prediction length <= {self.model_prediction_length}. "
"The quality of longer predictions may degrade since the model is not optimized for it. "
)
if limit_prediction_length:
@ -507,32 +513,46 @@ class ChronosBoltPipeline(BaseChronosPipeline):
# We truncate the context here because otherwise batches with very long
# context could take up large amounts of GPU memory unnecessarily.
if context_tensor.shape[-1] > model_context_length:
context_tensor = context_tensor[..., -model_context_length:]
if context_tensor.shape[-1] > self.model_context_length:
context_tensor = context_tensor[..., -self.model_context_length :]
# TODO: We unroll the forecast of Chronos Bolt greedily with the full forecast
# horizon that the model was trained with (i.e., 64). This results in variance collapsing
# every 64 steps.
context_tensor = context_tensor.to(
device=self.model.device,
dtype=torch.float32,
)
while remaining > 0:
with torch.no_grad():
prediction = self.model(
context=context_tensor,
).quantile_preds.to(context_tensor)
context_tensor = context_tensor.to(device=self.model.device, dtype=torch.float32)
# First block prediction
with torch.no_grad():
prediction: torch.Tensor = self.model(context=context_tensor).quantile_preds.to(context_tensor)
predictions.append(prediction)
remaining -= prediction.shape[-1]
if remaining <= 0:
break
# NOTE: The following heuristic for better prediction intervals with long-horizon forecasts
# uses all quantiles generated by the model for the first `model_prediction_length` steps,
# concatenating each quantile with the context and generating the next `model_prediction_length` steps.
# The `num_quantiles * num_quantiles` "samples" thus generated are then reduced to `num_quantiles`
# by computing empirical quantiles. Note that this option scales the batch size by `num_quantiles`
# when the `prediction_length` is greater than `model_prediction_length`.
central_idx = torch.abs(torch.tensor(self.quantiles) - 0.5).argmin()
central_prediction = prediction[:, central_idx]
if remaining > 0:
# Expand the context along quantile axis
context_tensor = context_tensor.unsqueeze(1).repeat(1, len(self.quantiles), 1)
context_tensor = torch.cat([context_tensor, central_prediction], dim=-1)
quantile_tensor = torch.tensor(self.quantiles, device=context_tensor.device)
while remaining > 0:
# Append the prediction to context
context_tensor = torch.cat([context_tensor, prediction], dim=-1)[..., -self.model_context_length :]
(batch_size, n_quantiles, context_length) = context_tensor.shape
with torch.no_grad():
# Reshape (batch, n_quantiles, context_length) -> (batch * n_quantiles, context_length)
prediction = self.model(
context=context_tensor.reshape(batch_size * n_quantiles, context_length)
).quantile_preds.to(context_tensor)
# Reshape predictions from (batch * n_quantiles, n_quantiles, model_prediction_length) to (batch, n_quantiles * n_quantiles, model_prediction_length)
prediction = prediction.reshape(batch_size, n_quantiles * n_quantiles, -1)
# Reduce `n_quantiles * n_quantiles` to n_quantiles and transpose back to (batch_size, n_quantiles, model_prediction_length)
prediction = torch.quantile(prediction, q=quantile_tensor, dim=1).transpose(0, 1)
predictions.append(prediction)
remaining -= prediction.shape[-1]
return torch.cat(predictions, dim=-1)[..., :prediction_length].to(dtype=torch.float32, device="cpu")
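Note on the hunk above: the long-horizon heuristic is easier to follow without the tensor reshapes. The sketch below restates it in plain Python for a single series. `toy_model`, `empirical_quantile`, `unroll`, and the three constants are hypothetical stand-ins chosen for illustration and are not part of Chronos; lists replace the `torch` tensors used in the actual change.

```python
QUANTILES = [0.1, 0.5, 0.9]
MODEL_PREDICTION_LENGTH = 4  # hypothetical block size for the toy model
MODEL_CONTEXT_LENGTH = 8     # hypothetical context limit for the toy model


def toy_model(context):
    """Hypothetical stand-in for the model: one context -> one path per quantile level."""
    last = context[-1]
    return [
        [last + (q - 0.5) * step for step in range(1, MODEL_PREDICTION_LENGTH + 1)]
        for q in QUANTILES
    ]


def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile of a list of floats."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def unroll(context, prediction_length):
    context = context[-MODEL_CONTEXT_LENGTH:]
    # First block: a single forward pass yields one path per quantile level.
    block = toy_model(context)
    paths = [list(path) for path in block]
    remaining = prediction_length - MODEL_PREDICTION_LENGTH
    # Expand the context along the quantile axis: one copy per quantile level.
    contexts = [list(context) for _ in QUANTILES]
    while remaining > 0:
        # Append each quantile path to its own context copy, then truncate.
        contexts = [(ctx + path)[-MODEL_CONTEXT_LENGTH:] for ctx, path in zip(contexts, block)]
        # n_quantiles contexts x n_quantiles outputs -> n_quantiles ** 2 candidate paths.
        candidates = [path for ctx in contexts for path in toy_model(ctx)]
        # Reduce back to n_quantiles paths using per-step empirical quantiles.
        block = [
            [empirical_quantile([c[t] for c in candidates], q) for t in range(MODEL_PREDICTION_LENGTH)]
            for q in QUANTILES
        ]
        for path, new_steps in zip(paths, block):
            path.extend(new_steps)
        remaining -= MODEL_PREDICTION_LENGTH
    return [path[:prediction_length] for path in paths]


forecast = unroll([1.0] * 10, prediction_length=10)
```

As in the changed code, each step past the first block multiplies the number of model inputs by the number of quantile levels, which is the batch-size cost mentioned in the NOTE comment.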

341
src/chronos/df_utils.py Normal file

@ -0,0 +1,341 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
# Authors: Abdul Fatir Ansari <ansarnd@amazon.com>
import warnings
from typing import TYPE_CHECKING
import numpy as np
if TYPE_CHECKING:
import pandas as pd
def _validate_df_types_and_cast(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
id_column: str = "item_id",
timestamp_column: str = "timestamp",
) -> tuple["pd.DataFrame", "pd.DataFrame | None"]:
import pandas as pd
astype_dict = {}
future_astype_dict = {}
for col in df.columns.drop([id_column, timestamp_column]):
col_dtype = df[col].dtype
if col in target_columns and not pd.api.types.is_numeric_dtype(df[col]):
raise ValueError(f"All target columns must be numeric but got {col=} with dtype={col_dtype}")
if (
pd.api.types.is_object_dtype(df[col])
or pd.api.types.is_string_dtype(df[col])
or isinstance(col_dtype, pd.CategoricalDtype)
):
astype_dict[col] = "category"
elif pd.api.types.is_numeric_dtype(df[col]) or pd.api.types.is_bool_dtype(df[col]):
astype_dict[col] = "float32"
else:
raise ValueError(
f"All columns must contain numeric, object, category, string, or bool dtype but got {col=} with dtype={col_dtype}"
)
if future_df is not None and col in future_df.columns:
if future_df[col].dtype != col_dtype:
raise ValueError(
f"Column {col} in future_df has dtype {future_df[col].dtype} but column in df has dtype {col_dtype}"
)
future_astype_dict[col] = astype_dict[col]
df = df.astype(astype_dict, copy=True)
if future_df is not None:
future_df = future_df.astype(future_astype_dict, copy=True)
return df, future_df
def validate_df_inputs(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
prediction_length: int,
id_column: str = "item_id",
timestamp_column: str = "timestamp",
) -> tuple["pd.DataFrame", "pd.DataFrame | None", str, list[int], np.ndarray]:
"""
Validate and prepare dataframe inputs.
Parameters
----------
df
Input dataframe containing time series data with columns:
- id_column: Identifier for each time series
- timestamp_column: Timestamps for each observation
- target_columns: One or more target variables to forecast
- Additional columns are treated as covariates
future_df
Optional dataframe containing future covariate values with columns:
- id_column: Identifier for each time series
- timestamp_column: Future timestamps
- Subset of covariate columns from df
target_columns
Names of target columns to forecast
prediction_length
Number of future time steps to predict
id_column
Name of column containing time series identifiers
timestamp_column
Name of column containing timestamps
Returns
-------
A tuple containing:
- Validated and sorted input dataframe
- Validated and sorted future dataframe (if provided)
- Inferred frequency of the time series
- List of series lengths from input dataframe
- Original order of time series IDs
Raises
------
ValueError
If validation fails for:
- Missing required columns
- Invalid data types
- Inconsistent frequencies
- Insufficient data points
- Mismatched series between df and future_df
- Invalid future_df lengths
"""
import pandas as pd
required_cols = [id_column, timestamp_column] + target_columns
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
raise ValueError(f"df does not contain all expected columns. Missing columns: {missing_cols}")
if future_df is not None:
future_required_cols = [id_column, timestamp_column]
missing_future_cols = [col for col in future_required_cols if col not in future_df.columns]
targets_in_future = [col for col in future_df.columns if col in target_columns]
extra_future_cols = [col for col in future_df.columns if col not in df.columns]
if missing_future_cols:
raise ValueError(
f"future_df does not contain all expected columns. Missing columns: {missing_future_cols}"
)
if targets_in_future:
raise ValueError(
f"future_df cannot contain target columns. Target columns found in future_df: {targets_in_future}"
)
if extra_future_cols:
raise ValueError(f"future_df cannot contain columns not present in df. Extra columns: {extra_future_cols}")
df, future_df = _validate_df_types_and_cast(
df, future_df, id_column=id_column, timestamp_column=timestamp_column, target_columns=target_columns
)
# Get the original order of time series IDs
original_order = df[id_column].unique()
# Sort and prepare df
df[timestamp_column] = pd.to_datetime(df[timestamp_column])
df = df.sort_values([id_column, timestamp_column])
# Get series lengths
series_lengths = df[id_column].value_counts(sort=False).to_list()
def validate_freq(timestamps: pd.DatetimeIndex, series_id: str):
freq = pd.infer_freq(timestamps)
if not freq:
raise ValueError(f"Could not infer frequency for series {series_id}")
return freq
# Validate each series
all_freqs = []
start_idx = 0
timestamp_index = pd.DatetimeIndex(df[timestamp_column])
for length in series_lengths:
if length < 3:
series_id = df[id_column].iloc[start_idx]
raise ValueError(
f"Every time series must have at least 3 data points, found {length=} for series {series_id}"
)
timestamps = timestamp_index[start_idx : start_idx + length]
series_id = df[id_column].iloc[start_idx]
all_freqs.append(validate_freq(timestamps, series_id))
start_idx += length
if len(set(all_freqs)) > 1:
raise ValueError("All time series must have the same frequency")
inferred_freq = all_freqs[0]
# Sort future_df if provided and validate its series lengths
future_series_lengths = None
if future_df is not None:
future_df[timestamp_column] = pd.to_datetime(future_df[timestamp_column])
future_df = future_df.sort_values([id_column, timestamp_column])
# Validate that future_df contains exactly the same series IDs as df
context_ids = set(df[id_column].unique())
future_ids = set(future_df[id_column].unique())
if context_ids != future_ids:
raise ValueError("future_df must contain the same time series IDs as df")
future_series_lengths = future_df[id_column].value_counts(sort=False)
if (future_series_lengths != prediction_length).any():
invalid_series = future_series_lengths[future_series_lengths != prediction_length]
raise ValueError(
f"future_df must contain {prediction_length=} values for each series, "
f"but found series with different lengths: {invalid_series.to_dict()}"
)
return df, future_df, inferred_freq, series_lengths, original_order
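Note on `validate_df_inputs`: the per-series checks (at least 3 points, an inferable frequency, one shared frequency) can be illustrated with the standard library alone. This is a simplified sketch under the assumption of fixed-step timestamps; the real code relies on `pd.infer_freq`, which also recognizes calendar-based frequencies such as month-end. `infer_fixed_step` and `validate_series` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def infer_fixed_step(timestamps):
    """Return the constant gap between consecutive timestamps, or None if gaps differ."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


def validate_series(series_by_id):
    """series_by_id maps a series ID to its sorted list of datetimes."""
    steps = set()
    for series_id, timestamps in series_by_id.items():
        if len(timestamps) < 3:
            raise ValueError(
                f"Every time series must have at least 3 data points, "
                f"found {len(timestamps)} for series {series_id}"
            )
        step = infer_fixed_step(timestamps)
        if step is None:
            raise ValueError(f"Could not infer frequency for series {series_id}")
        steps.add(step)
    if len(steps) > 1:
        raise ValueError("All time series must have the same frequency")
    return steps.pop()


start = datetime(2023, 1, 1)
daily = [start + timedelta(days=i) for i in range(5)]
gapped = [daily[0], daily[1], daily[3], daily[4]]  # 2023-01-03 is missing

step = validate_series({"A": daily, "B": daily})
```

Passing `gapped` for any series raises the "Could not infer frequency" error, which mirrors the non-uniform timestamp test in this commit.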
def convert_df_input_to_list_of_dicts_input(
df: "pd.DataFrame",
future_df: "pd.DataFrame | None",
target_columns: list[str],
prediction_length: int,
id_column: str = "item_id",
timestamp_column: str = "timestamp",
validate_inputs: bool = True,
freq: str | None = None,
) -> tuple[list[dict[str, np.ndarray | dict[str, np.ndarray]]], np.ndarray, dict[str, "pd.DatetimeIndex"]]:
"""
Convert from dataframe input format to a list of dictionaries input format.
Parameters
----------
df
Input dataframe containing time series data with columns:
- id_column: Identifier for each time series
- timestamp_column: Timestamps for each observation
- target_columns: One or more target variables to forecast
- Additional columns are treated as covariates
future_df
Optional dataframe containing future covariate values with columns:
- id_column: Identifier for each time series
- timestamp_column: Future timestamps
- Subset of covariate columns from df
target_columns
Names of target columns to forecast
prediction_length
Number of future time steps to predict
id_column
Name of column containing time series identifiers
timestamp_column
Name of column containing timestamps
validate_inputs
[ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
timestamps are regularly spaced (e.g., with hourly frequency).
freq
Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used
when validate_inputs=False. When provided, skips frequency inference from the data.
Returns
-------
A tuple containing:
- Time series converted to list of dictionaries format
- Original order of time series IDs
- Dictionary mapping series IDs to future time index
"""
import pandas as pd
if freq is not None and validate_inputs:
raise ValueError(
"freq can only be provided when validate_inputs=False. "
"When using freq with validate_inputs=False, you must ensure: "
"(1) all dataframes are sorted by (id_column, timestamp_column); "
"(2) future_df (if provided) has the same item IDs as df with exactly "
"prediction_length rows of future timestamps per item; "
"(3) all timestamps are regularly spaced."
)
if validate_inputs:
df, future_df, freq, series_lengths, original_order = validate_df_inputs(
df,
future_df=future_df,
id_column=id_column,
timestamp_column=timestamp_column,
target_columns=target_columns,
prediction_length=prediction_length,
)
else:
# Get the original order of time series IDs
original_order = df[id_column].unique()
# Get series lengths
series_lengths = df[id_column].value_counts(sort=False).to_list()
# If freq is not provided, infer from the first series with >= 3 points
if freq is None:
timestamp_index = pd.DatetimeIndex(df[timestamp_column])
start_idx = 0
for length in series_lengths:
if length < 3:
start_idx += length
continue
timestamps = timestamp_index[start_idx : start_idx + length]
freq = pd.infer_freq(timestamps)
break
assert freq is not None, "validate_inputs is False, but could not infer frequency from the dataframe"
# Convert to list of dicts format
inputs: list[dict[str, np.ndarray | dict[str, np.ndarray]]] = []
prediction_timestamps: dict[str, pd.DatetimeIndex] = {}
indptr = np.concatenate([[0], np.cumsum(series_lengths)]).astype("int64")
target_array = df[target_columns].to_numpy().T # Shape: (n_targets, len(df))
last_ts = pd.DatetimeIndex(df[timestamp_column].iloc[indptr[1:] - 1]) # Shape: (n_series,)
offset = pd.tseries.frequencies.to_offset(freq)
with warnings.catch_warnings():
# Silence PerformanceWarning for non-vectorized offsets https://github.com/pandas-dev/pandas/blob/95624ca2e99b0/pandas/core/arrays/datetimes.py#L822
warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
# Generate all prediction timestamps at once: stack one shifted copy of last_ts per step,
# then flatten to shape (n_series * prediction_length,) in series-major order
prediction_timestamps_array = pd.DatetimeIndex(
np.dstack([last_ts + step * offset for step in range(1, prediction_length + 1)]).ravel()
)
past_covariates_dict = {
col: df[col].to_numpy() for col in df.columns if col not in [id_column, timestamp_column] + target_columns
}
future_covariates_dict = {}
if future_df is not None:
for col in future_df.columns.drop([id_column, timestamp_column]):
future_covariates_dict[col] = future_df[col].to_numpy()
if validate_inputs:
if (pd.DatetimeIndex(future_df[timestamp_column]) != pd.DatetimeIndex(prediction_timestamps_array)).any():
raise ValueError(
"future_df timestamps do not match the expected prediction timestamps. "
"You can disable this check by setting `validate_inputs=False`"
)
for i in range(len(series_lengths)):
start_idx, end_idx = indptr[i], indptr[i + 1]
future_start_idx, future_end_idx = i * prediction_length, (i + 1) * prediction_length
series_id = df[id_column].iloc[start_idx]
prediction_timestamps[series_id] = prediction_timestamps_array[future_start_idx:future_end_idx]
task: dict[str, np.ndarray | dict[str, np.ndarray]] = {"target": target_array[:, start_idx:end_idx]}
if len(past_covariates_dict) > 0:
task["past_covariates"] = {col: values[start_idx:end_idx] for col, values in past_covariates_dict.items()}
if len(future_covariates_dict) > 0:
task["future_covariates"] = {
col: values[future_start_idx:future_end_idx] for col, values in future_covariates_dict.items()
}
inputs.append(task)
assert len(inputs) == len(series_lengths)
return inputs, original_order, prediction_timestamps
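Note on `convert_df_input_to_list_of_dicts_input`: the conversion slices one flattened, sorted table into per-series tasks using cumulative series lengths (`indptr`) and builds future timestamps from each series' last observation. A minimal sketch of that bookkeeping with the standard library follows; it assumes a single target and a fixed hourly step, and the sample data is made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import accumulate

# Flattened, sorted rows: series "A" has 3 points, series "B" has 4.
item_ids = ["A"] * 3 + ["B"] * 4
target = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 40.0]
origin = datetime(2024, 1, 1)
timestamps = [origin + timedelta(hours=i) for i in (0, 1, 2, 0, 1, 2, 3)]

series_lengths = [3, 4]
step = timedelta(hours=1)  # fixed-step stand-in for the pandas offset
prediction_length = 2

# indptr[i]:indptr[i + 1] is the row range of series i.
indptr = [0, *accumulate(series_lengths)]

inputs = []
prediction_timestamps = {}
for i in range(len(series_lengths)):
    start_idx, end_idx = indptr[i], indptr[i + 1]
    series_id = item_ids[start_idx]
    last_ts = timestamps[end_idx - 1]
    inputs.append({"target": target[start_idx:end_idx]})
    # Future timestamps continue from the last observed timestamp of the series.
    prediction_timestamps[series_id] = [last_ts + k * step for k in range(1, prediction_length + 1)]
```

This layout only works when rows are grouped by series and sorted by time, which is why the docstring asks callers to guarantee sorting when `validate_inputs=False`.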


@ -0,0 +1,35 @@
{
"alpha_pattern": {},
"auto_mapping": {
"base_model_class": "Chronos2Model",
"parent_library": "chronos.chronos2.model"
},
"base_model_name_or_path": "test/dummy-chronos2-model",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layer_replication": null,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16,
"lora_dropout": 0.0,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"rank_pattern": {},
"revision": null,
"target_modules": [
"self_attention.q",
"self_attention.k",
"self_attention.o",
"output_patch_embedding.output_layer",
"self_attention.v"
],
"task_type": null,
"use_dora": false,
"use_rslora": false
}
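Note on the adapter settings in this fixture: with `use_rslora` set to false, LoRA conventionally scales the low-rank update by `lora_alpha / r`, so `lora_alpha: 16` and `r: 8` give a factor of 2. The sketch below works through that arithmetic and the number of extra trainable parameters for one adapted linear layer. The helper names and the 512 x 512 layer shape are illustrative assumptions and are not read from the checkpoint.

```python
import math


def lora_scaling(lora_alpha, r, use_rslora=False):
    """Scale applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def lora_extra_params(in_features, out_features, r):
    """A has shape (r, in_features) and B has shape (out_features, r)."""
    return r * (in_features + out_features)


scaling = lora_scaling(lora_alpha=16, r=8)
extra = lora_extra_params(in_features=512, out_features=512, r=8)
```

For the assumed 512 x 512 layer, the adapter adds 8,192 parameters next to 262,144 frozen ones.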

Binary file not shown.


@ -3,6 +3,8 @@
from pathlib import Path
import numpy as np
import pandas as pd
import pytest
import torch
@ -12,7 +14,14 @@ from chronos import (
ChronosPipeline,
MeanScaleUniformBins,
)
from test.util import validate_tensor
from test.util import create_df, get_forecast_start_times, validate_tensor
DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos-model"
@pytest.fixture
def pipeline() -> ChronosPipeline:
return BaseChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu")
def test_base_chronos_pipeline_loads_from_huggingface():
@ -167,11 +176,7 @@ def test_tokenizer_random_data(use_eos_token: bool):
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
def test_pipeline_predict(model_dtype: torch.dtype, input_dtype: torch.dtype):
pipeline = ChronosPipeline.from_pretrained(
Path(__file__).parent / "dummy-chronos-model",
device_map="cpu",
torch_dtype=model_dtype,
)
pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
context = 10 * torch.rand(size=(4, 16)) + 10
context = context.to(dtype=input_dtype)
@ -238,11 +243,7 @@ def test_pipeline_predict_quantiles(
prediction_length: int,
quantile_levels: list[int],
):
pipeline = ChronosPipeline.from_pretrained(
Path(__file__).parent / "dummy-chronos-model",
device_map="cpu",
torch_dtype=model_dtype,
)
pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
context = 10 * torch.rand(size=(4, 16)) + 10
context = context.to(dtype=input_dtype)
@ -284,11 +285,7 @@ def test_pipeline_predict_quantiles(
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
pipeline = ChronosPipeline.from_pretrained(
Path(__file__).parent / "dummy-chronos-model",
device_map="cpu",
torch_dtype=model_dtype,
)
pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
d_model = pipeline.model.model.config.d_model
context = 10 * torch.rand(size=(4, 16)) + 10
context = context.to(dtype=input_dtype)
@ -312,6 +309,88 @@ def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
validate_tensor(scale, shape=(1,), dtype=torch.float32)
@pytest.mark.parametrize(
"context_setup, expected_rows",
[
# Targets only
({}, 6), # 2 series * 3 predictions
# Different context lengths
(
{"series_ids": ["X", "Y", "Z"], "n_points": [10, 17, 56], "target_cols": ["custom_target"]},
9,
), # 3 series * 3 predictions
],
)
@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
def test_predict_df_works_for_valid_inputs(pipeline, context_setup, expected_rows, freq):
prediction_length = 3
df = create_df(**context_setup, freq=freq)
forecast_start_times = get_forecast_start_times(df, freq)
series_ids = context_setup.get("series_ids", ["A", "B"])
target_columns = context_setup.get("target_cols", ["target"])
n_series = len(series_ids)
n_targets = len(target_columns)
result = pipeline.predict_df(df, target=target_columns[0], prediction_length=prediction_length)
assert len(result) == expected_rows
assert "item_id" in result.columns and np.all(
result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
)
assert "target_name" in result.columns and np.all(
result["target_name"].to_numpy() == np.tile(np.array(target_columns).repeat(prediction_length), n_series)
)
assert "timestamp" in result.columns and np.all(
result.groupby("item_id")["timestamp"].min().to_numpy() == pd.to_datetime(forecast_start_times).to_numpy()
)
assert "predictions" in result.columns
assert all(str(q) in result.columns for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
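Note on the row-order assertions in this test: the `repeat` and `tile` expressions encode a layout in which rows are grouped by series, then by target, then by forecast step. A small stdlib sketch of that layout follows; `expected_layout` is a hypothetical helper and the target names are examples.

```python
def expected_layout(series_ids, target_columns, prediction_length):
    """Return the expected (item_id, target_name) pair for every output row."""
    return [
        (series_id, target_name)
        for series_id in series_ids
        for target_name in target_columns
        for _ in range(prediction_length)
    ]


rows = expected_layout(["A", "B"], ["sales", "revenue"], prediction_length=3)
```

The row count is `n_series * n_targets * prediction_length`, which is the same product the Chronos-2 tests in this commit now compute instead of hard-coding `expected_rows`.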
def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
df = create_df()
# Make timestamps non-uniform for series A
df.loc[df["item_id"] == "A", "timestamp"] = [
"2023-01-01",
"2023-01-02",
"2023-01-04",
"2023-01-05",
"2023-01-06",
"2023-01-07",
"2023-01-08",
"2023-01-09",
"2023-01-10",
"2023-01-11",
]
with pytest.raises(ValueError, match="not infer frequency"):
pipeline.predict_df(df)
def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
df = pd.DataFrame(
{
"item_id": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
"timestamp": [
"2023-01-01",
"2023-01-02",
"2023-01-03",
"2023-01-04",
"2023-01-05",
"2023-01-01",
"2023-02-01",
"2023-03-01",
"2023-04-01",
"2023-05-01",
],
"target": [1.0] * 10,
}
)
with pytest.raises(ValueError, match="same frequency"):
pipeline.predict_df(df)
@pytest.mark.parametrize("n_tokens", [10, 1000, 10000])
def test_tokenizer_number_of_buckets(n_tokens):
config = ChronosConfig(


@ -13,8 +13,10 @@ import pytest
import torch
from chronos import BaseChronosPipeline, Chronos2Pipeline
from chronos.chronos2.dataset import convert_df_input_to_list_of_dicts_input
from test.util import validate_tensor
from chronos.chronos2.config import Chronos2CoreConfig
from chronos.chronos2.layers import MHA
from chronos.df_utils import convert_df_input_to_list_of_dicts_input
from test.util import create_df, create_future_df, get_forecast_start_times, validate_tensor, timeout_callback
DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos2-model"
@ -32,6 +34,14 @@ def test_base_chronos2_pipeline_loads_from_s3():
BaseChronosPipeline.from_pretrained("s3://autogluon/chronos-2", device_map="cpu")
def test_base_chronos2_pipeline_loads_from_hf():
BaseChronosPipeline.from_pretrained("amazon/chronos-2", device_map="cpu")
def test_chronos2_lora_pipeline_loads_from_disk():
Chronos2Pipeline.from_pretrained(Path(__file__).parent / "dummy-chronos2-lora", device_map="cpu")
@pytest.mark.parametrize(
"inputs, prediction_length, expected_output_shapes",
[
@ -317,13 +327,11 @@ def test_when_input_is_invalid_then_predict_raises_value_error(pipeline, inputs,
_ = pipeline.predict(inputs, prediction_length=10)
@pytest.mark.parametrize("torch_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
def test_pipeline_predict_can_handle_different_model_and_input_dtypes(
torch_dtype: torch.dtype, input_dtype: torch.dtype
):
def test_pipeline_predict_can_handle_different_model_and_input_dtypes(dtype: torch.dtype, input_dtype: torch.dtype):
pipeline = BaseChronosPipeline.from_pretrained(
Path(__file__).parent / "dummy-chronos2-model", device_map="cpu", torch_dtype=torch_dtype
Path(__file__).parent / "dummy-chronos2-model", device_map="cpu", torch_dtype=dtype
)
context = 10 * torch.rand(size=(4, 3, 16)) + 10
context = context.to(dtype=input_dtype)
@ -336,6 +344,35 @@ def test_pipeline_predict_can_handle_different_model_and_input_dtypes(
validate_tensor(quantiles_item, (3, expected_num_quantiles, 7), dtype=torch.float32)
@pytest.mark.parametrize(
"inputs, expected_output_shapes",
[
# NOTE: d_model for the dummy model is 6
# Homogeneous univariate task
(torch.rand(4, 1, 16), [(1, 3, 6)] * 4),
# Homogeneous multivariate task
(torch.rand(4, 3, 37), [(3, 5, 6)] * 4),
# Heterogeneous tasks with different history lengths
(
[torch.rand(100), torch.rand(2, 150), torch.rand(120)],
[(1, 12, 6), (2, 12, 6), (1, 12, 6)],
),
],
)
def test_when_input_is_valid_then_pipeline_can_embed(pipeline, inputs, expected_output_shapes):
embeds, loc_scales = pipeline.embed(inputs)
assert (
isinstance(embeds, list)
and len(embeds) == len(expected_output_shapes)
and len(loc_scales) == len(expected_output_shapes)
)
for embed, loc_scale, expected_shape in zip(embeds, loc_scales, expected_output_shapes):
validate_tensor(embed, expected_shape, dtype=torch.float32)
validate_tensor(loc_scale[0], (expected_shape[0], 1), dtype=torch.float32)
validate_tensor(loc_scale[1], (expected_shape[0], 1), dtype=torch.float32)
@pytest.mark.parametrize(
"task_kwargs",
[
@ -383,81 +420,54 @@ def test_pipeline_can_evaluate_on_dummy_fev_task(pipeline, task_kwargs):
assert isinstance(eval_summary["test_error"], float)
def create_df(series_ids=["A", "B"], n_points=[10, 10], target_cols=["target"], covariates=None, freq="h"):
"""Helper to create test context DataFrames."""
series_dfs = []
for series_id, length in zip(series_ids, n_points):
series_data = {"item_id": series_id, "timestamp": pd.date_range(end="2001-10-01", periods=length, freq=freq)}
for target_col in target_cols:
series_data[target_col] = np.random.randn(length)
if covariates:
for cov in covariates:
series_data[cov] = np.random.randn(length)
series_dfs.append(pd.DataFrame(series_data))
return pd.concat(series_dfs, ignore_index=True)
def create_future_df(forecast_start_times: list, series_ids=["A", "B"], n_points=[5, 5], covariates=None, freq="h"):
"""Helper to create test future DataFrames."""
series_dfs = []
for series_id, length, start in zip(series_ids, n_points, forecast_start_times):
series_data = {"item_id": series_id, "timestamp": pd.date_range(start=start, periods=length, freq=freq)}
if covariates:
for cov in covariates:
series_data[cov] = np.random.randn(length)
series_dfs.append(pd.DataFrame(series_data))
return pd.concat(series_dfs, ignore_index=True)
def get_forecast_start_times(df, freq="h"):
context_end_times = df.groupby("item_id")["timestamp"].max()
forecast_start_times = [pd.date_range(end_time, periods=2, freq=freq)[-1] for end_time in context_end_times]
return forecast_start_times
@pytest.mark.parametrize(
"context_setup, future_setup, expected_rows",
"context_setup, future_setup",
[
# Targets only
({}, None, 6), # 2 series * 3 predictions
({}, None),
# Multiple targets with different context lengths
(
{"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]},
None,
18,
), # 2 series * 3 targets * 3 predictions
({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
# With past covariates
({"covariates": ["cov1"]}, None, 6),
({"covariates": ["cov1"]}, None),
# With future covariates
({"covariates": ["cov1"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
# With past-only and future covariates
({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
# With past-only and future covariates and different series order
(
{"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
{
"series_ids": ["B", "C", "A", "Z"],
"covariates": ["cov1"],
"n_points": [3, 3, 3, 3],
},
12,
{"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
),
],
)
@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
def test_predict_df_works_for_valid_inputs(pipeline, context_setup, future_setup, expected_rows, freq):
prediction_length = 3
@pytest.mark.parametrize("prediction_length", [1, 4])
@pytest.mark.parametrize("validate_inputs", [True, False])
def test_predict_df_works_for_valid_inputs(
pipeline, context_setup, future_setup, freq, validate_inputs, prediction_length
):
df = create_df(**context_setup, freq=freq)
forecast_start_times = get_forecast_start_times(df, freq)
future_df = create_future_df(forecast_start_times, **future_setup, freq=freq) if future_setup else None
if future_setup:
series_ids = future_setup.get("series_ids", ["A", "B"])
future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
else:
future_df = None
series_ids = context_setup.get("series_ids", ["A", "B"])
target_columns = context_setup.get("target_cols", ["target"])
n_series = len(series_ids)
n_targets = len(target_columns)
result = pipeline.predict_df(df, future_df=future_df, target=target_columns, prediction_length=prediction_length)
result = pipeline.predict_df(
df,
future_df=future_df,
target=target_columns,
prediction_length=prediction_length,
validate_inputs=validate_inputs,
)
expected_rows = n_series * n_targets * prediction_length
assert len(result) == expected_rows
assert "item_id" in result.columns and np.all(
result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
@ -512,9 +522,10 @@ def test_predict_df_future_df_validation_errors(pipeline, future_data, error_mat
pipeline.predict_df(df, future_df=future_df)
def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
@pytest.mark.parametrize("validate_inputs", [True, False])
def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline, validate_inputs):
df = create_df()
# Make timestamps non-uniform for series A
# Make timestamps non-uniform for series A (first series)
df.loc[df["item_id"] == "A", "timestamp"] = [
"2023-01-01",
"2023-01-02",
@ -528,8 +539,8 @@ def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
"2023-01-11",
]
with pytest.raises(ValueError, match="not infer frequency"):
pipeline.predict_df(df)
with pytest.raises((ValueError, AssertionError), match="not infer frequency"):
pipeline.predict_df(df, validate_inputs=validate_inputs)
def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
@ -566,26 +577,80 @@ def test_predict_df_with_future_df_missing_series_raises_error(pipeline):
pipeline.predict_df(df, future_df=future_df)
def test_predict_df_with_future_df_with_different_lengths_raises_error(pipeline):
df = create_df(series_ids=["A", "B"], covariates=["cov1"])
future_df = create_future_df(
get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 7], covariates=["cov1"]
)
with pytest.raises(ValueError, match="all time series must have length"):
pipeline.predict_df(df, future_df=future_df, prediction_length=3)
def test_predict_df_with_future_df_with_different_freq_raises_error(pipeline):
df = create_df(series_ids=["A", "B"], covariates=["cov1"], freq="h")
future_df = create_future_df(
get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 3], covariates=["cov1"], freq="D"
)
with pytest.raises(ValueError, match="must have the same frequency as context"):
with pytest.raises(ValueError, match="future_df timestamps do not match"):
pipeline.predict_df(df, future_df=future_df, prediction_length=3)
def test_predict_df_with_future_df_with_different_lengths_raises_error(pipeline):
df = create_df(series_ids=["A", "B"], covariates=["cov1"])
future_df = create_future_df(
get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 7], covariates=["cov1"]
)
with pytest.raises(ValueError, match="future_df must contain prediction"):
pipeline.predict_df(df, future_df=future_df, prediction_length=3)
@pytest.mark.parametrize(
"context_setup, future_setup",
[
# Targets only
({}, None),
# Multiple targets with different context lengths
({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
# With past covariates
({"covariates": ["cov1"]}, None),
# With future covariates
({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
# With past-only and future covariates
({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
# With past-only and future covariates and different series order
(
{"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
{"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
),
],
)
@pytest.mark.parametrize("prediction_length", [1, 4])
def test_predict_df_outputs_different_results_with_cross_learning_enabled(
pipeline, context_setup, future_setup, prediction_length
):
freq = "h"
df = create_df(**context_setup, freq=freq)
forecast_start_times = get_forecast_start_times(df, freq)
if future_setup:
series_ids = future_setup.get("series_ids", ["A", "B"])
future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
else:
future_df = None
series_ids = context_setup.get("series_ids", ["A", "B"])
target_columns = context_setup.get("target_cols", ["target"])
result_with_cross_learning = pipeline.predict_df(
df,
future_df=future_df,
target=target_columns,
prediction_length=prediction_length,
cross_learning=True,
)
result_without_cross_learning = pipeline.predict_df(
df,
future_df=future_df,
target=target_columns,
prediction_length=prediction_length,
cross_learning=False,
)
assert not np.array_equal(result_with_cross_learning["predictions"], result_without_cross_learning["predictions"])
@pytest.mark.parametrize(
"inputs, prediction_length, expected_output_shapes",
[
@ -671,12 +736,20 @@ def test_predict_df_with_future_df_with_different_freq_raises_error(pipeline):
),
],
)
@pytest.mark.parametrize("finetune_mode", ["full", "lora"])
def test_when_input_is_valid_then_pipeline_can_be_finetuned(
pipeline, inputs, prediction_length, expected_output_shapes
pipeline, inputs, prediction_length, expected_output_shapes, finetune_mode
):
# Get outputs before fine-tuning
orig_outputs_before = pipeline.predict(inputs, prediction_length=prediction_length)
ft_pipeline = pipeline.fit(inputs, prediction_length=prediction_length, num_steps=5, min_past=1, batch_size=32)
ft_pipeline = pipeline.fit(
inputs,
prediction_length=prediction_length,
num_steps=5,
min_past=1,
batch_size=32,
finetune_mode=finetune_mode,
)
# Get outputs from fine-tuned pipeline
ft_outputs = ft_pipeline.predict(inputs, prediction_length=prediction_length)
# Get outputs from original pipeline after fine-tuning
@ -852,40 +925,36 @@ def test_when_input_time_series_are_too_short_then_finetuning_raises_error(pipel
@pytest.mark.parametrize(
    "context_setup, future_setup, expected_rows",
    "context_setup, future_setup",
    [
        # Targets only
        ({}, None, 6),  # 2 series * 3 predictions
        ({}, None),
        # Multiple targets with different context lengths
        (
            {"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]},
            None,
            18,
        ),  # 2 series * 3 targets * 3 predictions
        ({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
        # With past covariates
        ({"covariates": ["cov1"]}, None, 6),
        ({"covariates": ["cov1"]}, None),
        # With future covariates
        ({"covariates": ["cov1"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
        ({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates
        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates and different series order
        (
            {"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
            {
                "series_ids": ["B", "C", "A", "Z"],
                "covariates": ["cov1"],
                "n_points": [3, 3, 3, 3],
            },
            12,
            {"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
        ),
    ],
)
@pytest.mark.parametrize("freq", ["h", "D", "ME"])
def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future_setup, expected_rows, freq):
def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future_setup, freq):
    prediction_length = 3
    df = create_df(**context_setup, freq=freq)
    forecast_start_times = get_forecast_start_times(df, freq)
    future_df = create_future_df(forecast_start_times, **future_setup, freq=freq) if future_setup else None
    if future_setup:
        series_ids = future_setup.get("series_ids", ["A", "B"])
        future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
        future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
    else:
        future_df = None

    series_ids = context_setup.get("series_ids", ["A", "B"])
    target_columns = context_setup.get("target_cols", ["target"])
@ -918,6 +987,7 @@ def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future
    )

    # Check predictions from the fine-tuned model are valid
    expected_rows = n_series * n_targets * prediction_length
    assert len(result) == expected_rows
    assert "item_id" in result.columns and np.all(
        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
@ -936,3 +1006,164 @@ def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future
    # Check predictions from the fine-tuned model are different from the original predictions
    assert not np.allclose(orig_result_before["predictions"].to_numpy(), result["predictions"].to_numpy())


def test_when_predict_df_called_with_timeout_callback_then_timeout_error_is_raised(pipeline):
    num_series = 1000
    large_df = create_df(series_ids=[j for j in range(num_series)], n_points=[2048] * num_series)

    with pytest.raises(TimeoutError, match="time limit exceeded"):
        pipeline.predict_df(
            large_df,
            prediction_length=48,
            after_batch=timeout_callback(0.1),
        )


@pytest.mark.parametrize("attn_implementation", ["eager", "sdpa"])
def test_pipeline_works_with_different_attention_implementations(attn_implementation):
    """Test that the pipeline works with different attention implementations."""
    # Load the dummy model
    model_path = Path(__file__).parent / "dummy-chronos2-model"

    # Load with specified attention implementation
    pipeline = BaseChronosPipeline.from_pretrained(
        model_path, device_map="cpu", attn_implementation=attn_implementation
    )

    # Verify the config has the correct attention implementation
    assert pipeline.model.config._attn_implementation == attn_implementation

    # Test prediction with simple input
    inputs = torch.rand(2, 1, 16)
    prediction_length = 7

    outputs = pipeline.predict(inputs, prediction_length=prediction_length)

    # Check outputs are valid
    assert isinstance(outputs, list) and len(outputs) == 2
    for out in outputs:
        validate_tensor(out, (1, DEFAULT_MODEL_NUM_QUANTILES, 7), dtype=torch.float32)


@pytest.mark.parametrize("attn_implementation", ["eager", "sdpa"])
@pytest.mark.parametrize("output_attentions", [False, True])
def test_attention_implementations_with_output_attentions(attn_implementation, output_attentions):
    """Test that attention implementations handle output_attentions correctly."""
    # Create config with specified attention implementation
    config = Chronos2CoreConfig(
        d_model=128,
        d_kv=32,
        num_heads=4,
        dropout_rate=0.1,
        attn_implementation=attn_implementation,
    )

    # Create MHA layer
    mha = MHA(config, use_rope=True)
    mha.eval()

    # Create dummy inputs
    batch_size = 2
    seq_len = 10
    hidden_states = torch.randn(batch_size, seq_len, config.d_model)
    position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, -1)
    mask = torch.zeros(batch_size, config.num_heads, seq_len, seq_len)

    # Test forward pass
    output = mha(
        hidden_states=hidden_states,
        mask=mask,
        position_ids=position_ids,
        output_attentions=output_attentions,
    )

    # Check output shape
    assert output.hidden_states.shape == (batch_size, seq_len, config.d_model)

    # Check attention weights - should only be returned when output_attentions=True
    if output_attentions:
        assert output.attn_weights is not None
        assert output.attn_weights.shape == (batch_size, config.num_heads, seq_len, seq_len)
    else:
        # SDPA doesn't return weights
        if attn_implementation == "sdpa":
            assert output.attn_weights is None


def test_eager_and_sdpa_produce_identical_outputs(pipeline):
    """Test that eager and SDPA implementations produce identical outputs on full pipeline."""
    # Reload pipeline with SDPA
    model_path = Path(__file__).parent / "dummy-chronos2-model"
    pipeline_sdpa = BaseChronosPipeline.from_pretrained(
        model_path, device_map="cpu", attn_implementation="sdpa", torch_dtype=torch.float32
    )

    # Note: the original pipeline fixture uses the default attn_implementation, which should be sdpa
    # Force eager for comparison
    pipeline_eager = BaseChronosPipeline.from_pretrained(
        model_path, device_map="cpu", attn_implementation="eager", torch_dtype=torch.float32
    )

    # Test 1: Simple univariate input
    inputs_simple = torch.rand(2, 1, 16)
    prediction_length = 7

    with torch.no_grad():
        outputs_eager = pipeline_eager.predict(inputs_simple, prediction_length=prediction_length)
        outputs_sdpa = pipeline_sdpa.predict(inputs_simple, prediction_length=prediction_length)

    # Verify outputs match within numerical tolerance
    assert len(outputs_eager) == len(outputs_sdpa)
    for out_eager, out_sdpa in zip(outputs_eager, outputs_sdpa):
        # Should match exactly or very close (numerical precision)
        assert torch.allclose(out_eager, out_sdpa, atol=1e-5, rtol=1e-4)

    # Test 2: Multivariate inputs with covariates to test group attention
    inputs_grouped = [
        {
            "target": np.random.randn(2, 36),
            "past_covariates": {
                "temperature": np.random.randn(36),
                "weather_type": np.random.choice(["sunny", "cloudy", "rainy"], size=36),
            },
            "future_covariates": {
                "temperature": np.random.randn(prediction_length),
                "weather_type": np.random.choice(["sunny", "cloudy", "rainy"], size=prediction_length),
            },
        }
        for _ in range(5)
    ]

    with torch.no_grad():
        outputs_eager_grouped = pipeline_eager.predict(inputs_grouped, prediction_length=prediction_length)
        outputs_sdpa_grouped = pipeline_sdpa.predict(inputs_grouped, prediction_length=prediction_length)

    # Verify outputs match for grouped inputs
    assert len(outputs_eager_grouped) == len(outputs_sdpa_grouped)
    for out_eager, out_sdpa in zip(outputs_eager_grouped, outputs_sdpa_grouped):
        # Should match exactly or very close (numerical precision)
        assert torch.allclose(out_eager, out_sdpa, atol=1e-5, rtol=1e-4)


def test_pipeline_can_be_finetuned_with_preprocessed_hf_dataset(pipeline):
    """Test that fine-tuning works with preprocessed inputs from a HuggingFace Dataset."""
    from chronos.chronos2.dataset import prepare_inputs

    prediction_length = 8
    raw_inputs = [{"target": torch.rand(20)}, {"target": torch.rand(25)}, {"target": torch.rand(30)}]

    # Preprocess and convert to HF Dataset (simulating Arrow-based lazy loading)
    prepared_tasks = prepare_inputs(raw_inputs, prediction_length=prediction_length, min_past=1, mode="train")
    hf_dataset = datasets.Dataset.from_list(prepared_tasks).with_format("torch")

    # Fine-tune with preprocessed inputs
    ft_pipeline = pipeline.fit(
        hf_dataset, prediction_length=prediction_length, num_steps=5, min_past=1, batch_size=32, convert_inputs=False
    )

    # Verify fine-tuned model can predict
    ft_outputs = ft_pipeline.predict(raw_inputs, prediction_length=prediction_length)
    assert len(ft_outputs) == len(raw_inputs)
    for ft_out in ft_outputs:
        assert ft_out.shape == (1, DEFAULT_MODEL_NUM_QUANTILES, prediction_length)
        assert not torch.isnan(ft_out).any()
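
The row-layout checks in test_two_step_finetuning_with_df_input_works above rely on a fixed ordering of the prediction frame: rows are grouped by series, then by target, then by forecast step. The standalone sketch below reproduces only the NumPy expressions used in those assertions; the concrete series and target names are illustrative values taken from the parametrization, and nothing here calls the pipeline itself.

```python
import numpy as np

# Illustrative values borrowed from the test parametrization above.
series_ids = ["A", "B"]
target_columns = ["sales", "revenue", "profit"]
prediction_length = 3

n_series = len(series_ids)
n_targets = len(target_columns)

# One row per (series, target, forecast step).
expected_rows = n_series * n_targets * prediction_length

# Each series ID is repeated for all of its targets and steps.
expected_item_ids = np.array(series_ids).repeat(n_targets * prediction_length)

# Within a series, each target name is repeated for every step;
# that block is then tiled once per series.
expected_target_names = np.tile(np.array(target_columns).repeat(prediction_length), n_series)

print(expected_rows)
print(expected_item_ids.tolist())
print(expected_target_names.tolist())
```

Computing expected_rows from the inputs, as the updated test does, removes the need to keep a hand-written row count in sync with each parametrized case.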

View file

@ -5,12 +5,21 @@ from pathlib import Path
import datasets
import fev
import numpy as np
import pandas as pd
import pytest
import torch
from chronos import BaseChronosPipeline, ChronosBoltPipeline
from chronos.chronos_bolt import InstanceNorm, Patch
from test.util import validate_tensor
from test.util import create_df, get_forecast_start_times, validate_tensor

DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos-bolt-model"


@pytest.fixture
def pipeline() -> ChronosBoltPipeline:
    return BaseChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu")
def test_base_chronos_pipeline_loads_from_huggingface():
@ -20,11 +29,7 @@ def test_base_chronos_pipeline_loads_from_huggingface():
@pytest.mark.parametrize("torch_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
def test_pipeline_predict(torch_dtype: torch.dtype, input_dtype: torch.dtype):
    pipeline = ChronosBoltPipeline.from_pretrained(
        Path(__file__).parent / "dummy-chronos-bolt-model",
        device_map="cpu",
        torch_dtype=torch_dtype,
    )
    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=torch_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
    expected_num_quantiles = len(pipeline.quantiles)
@ -84,11 +89,7 @@ def test_pipeline_predict_quantiles(
    prediction_length: int,
    quantile_levels: list[float],
):
    pipeline = ChronosBoltPipeline.from_pretrained(
        Path(__file__).parent / "dummy-chronos-bolt-model",
        device_map="cpu",
        torch_dtype=torch_dtype,
    )
    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=torch_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
@ -127,11 +128,7 @@ def test_pipeline_predict_quantiles(
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
    pipeline = ChronosBoltPipeline.from_pretrained(
        Path(__file__).parent / "dummy-chronos-bolt-model",
        device_map="cpu",
        torch_dtype=model_dtype,
    )
    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
    d_model = pipeline.model.config.d_model
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
@ -160,6 +157,88 @@ def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
validate_tensor(loc_scale[1], shape=(1,), dtype=torch.float32)


@pytest.mark.parametrize(
    "context_setup, expected_rows",
    [
        # Targets only
        ({}, 6),  # 2 series * 3 predictions
        # Different context lengths
        (
            {"series_ids": ["X", "Y", "Z"], "n_points": [10, 17, 56], "target_cols": ["custom_target"]},
            9,
        ),  # 3 series * 3 predictions
    ],
)
@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
def test_predict_df_works_for_valid_inputs(pipeline, context_setup, expected_rows, freq):
    prediction_length = 3
    df = create_df(**context_setup, freq=freq)
    forecast_start_times = get_forecast_start_times(df, freq)

    series_ids = context_setup.get("series_ids", ["A", "B"])
    target_columns = context_setup.get("target_cols", ["target"])
    n_series = len(series_ids)
    n_targets = len(target_columns)
    result = pipeline.predict_df(df, target=target_columns[0], prediction_length=prediction_length)

    assert len(result) == expected_rows
    assert "item_id" in result.columns and np.all(
        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
    )
    assert "target_name" in result.columns and np.all(
        result["target_name"].to_numpy() == np.tile(np.array(target_columns).repeat(prediction_length), n_series)
    )
    assert "timestamp" in result.columns and np.all(
        result.groupby("item_id")["timestamp"].min().to_numpy() == pd.to_datetime(forecast_start_times).to_numpy()
    )
    assert "predictions" in result.columns
    assert all(str(q) in result.columns for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])


def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
    df = create_df()
    # Make timestamps non-uniform for series A
    df.loc[df["item_id"] == "A", "timestamp"] = [
        "2023-01-01",
        "2023-01-02",
        "2023-01-04",
        "2023-01-05",
        "2023-01-06",
        "2023-01-07",
        "2023-01-08",
        "2023-01-09",
        "2023-01-10",
        "2023-01-11",
    ]

    with pytest.raises(ValueError, match="not infer frequency"):
        pipeline.predict_df(df)


def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
    df = pd.DataFrame(
        {
            "item_id": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
            "timestamp": [
                "2023-01-01",
                "2023-01-02",
                "2023-01-03",
                "2023-01-04",
                "2023-01-05",
                "2023-01-01",
                "2023-02-01",
                "2023-03-01",
                "2023-04-01",
                "2023-05-01",
            ],
            "target": [1.0] * 10,
        }
    )

    with pytest.raises(ValueError, match="same frequency"):
        pipeline.predict_df(df)
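
The two error tests above build timestamps that either contain a gap or differ in spacing between series. The sketch below shows how such inputs look to pandas.infer_freq, on the assumption that frequency detection behaves similarly to it; the library's actual validation code is not shown in this diff, so treat this as an illustration of the data and not of the implementation.

```python
import pandas as pd

# Ten daily timestamps with no gaps.
uniform = pd.DatetimeIndex(pd.date_range("2023-01-01", periods=10, freq="D"))

# Same shape as the non-uniform test data above: 2023-01-03 is skipped.
non_uniform = pd.DatetimeIndex(
    [
        "2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05", "2023-01-06",
        "2023-01-07", "2023-01-08", "2023-01-09", "2023-01-10", "2023-01-11",
    ]
)

# Month-start timestamps, as used for series B in the inconsistent-frequency test.
monthly = pd.DatetimeIndex(["2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01", "2023-05-01"])

uniform_freq = pd.infer_freq(uniform)
non_uniform_freq = pd.infer_freq(non_uniform)  # no single frequency fits the gap
monthly_freq = pd.infer_freq(monthly)

print(uniform_freq, non_uniform_freq, monthly_freq)
```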
# The following tests have been taken from
# https://github.com/autogluon/autogluon/blob/f57beb26cb769c6e0d484a6af2b89eab8aee73a8/timeseries/tests/unittests/models/chronos/pipeline/test_chronos_bolt.py
# Author: Caner Turkmen <atturkm@amazon.com>

462
test/test_df_utils.py Normal file
View file

@ -0,0 +1,462 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

from unittest.mock import patch

import numpy as np
import pandas as pd
import pytest

from chronos.df_utils import (
    convert_df_input_to_list_of_dicts_input,
    validate_df_inputs,
)
from test.util import create_df, create_future_df, get_forecast_start_times

# Tests for validate_df_inputs function


@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
def test_validate_df_inputs_returns_correct_metadata_for_valid_inputs(freq):
    """Test that function returns validated dataframes, frequency, series lengths, and original order."""
    # Create test data with 2 series
    df = create_df(series_ids=["A", "B"], n_points=[10, 15], target_cols=["target"], freq=freq)

    # Call validate_df_inputs
    validated_df, validated_future_df, inferred_freq, series_lengths, original_order = validate_df_inputs(
        df=df,
        future_df=None,
        target_columns=["target"],
        prediction_length=5,
        id_column="item_id",
        timestamp_column="timestamp",
    )

    # Verify key return values
    assert validated_future_df is None
    assert inferred_freq is not None
    assert series_lengths == [10, 15]
    assert list(original_order) == ["A", "B"]
    # Verify dataframe is sorted
    assert validated_df["item_id"].iloc[0] == "A"
    assert validated_df["item_id"].iloc[10] == "B"


def test_validate_df_inputs_casts_mixed_dtypes_correctly():
    """Test that numeric columns are cast to float32 and categorical/string/object columns are cast to category."""
    # Create dataframe with mixed column types
    df = pd.DataFrame(
        {
            "item_id": ["A"] * 10,
            "timestamp": pd.date_range(end="2001-10-01", periods=10, freq="h"),
            "target": np.random.randn(10),  # numeric
            "numeric_cov": np.random.randint(0, 10, 10),  # integer numeric
            "string_cov": ["cat1"] * 5 + ["cat2"] * 5,  # string
            "bool_cov": [True, False] * 5,  # boolean
        }
    )

    # Call validate_df_inputs
    validated_df, _, _, _, _ = validate_df_inputs(
        df=df,
        future_df=None,
        target_columns=["target"],
        prediction_length=5,
    )

    # Verify dtypes after validation
    assert validated_df["target"].dtype == np.float32
    assert validated_df["numeric_cov"].dtype == np.float32
    assert validated_df["string_cov"].dtype.name == "category"
    assert validated_df["bool_cov"].dtype == np.float32  # booleans are cast to float32


def test_validate_df_inputs_raises_error_when_series_has_insufficient_data():
    """Test that ValueError is raised for series with < 3 data points."""
    # Create dataframe with one series having only 2 points
    df = create_df(series_ids=["A", "B"], n_points=[10, 2], target_cols=["target"], freq="h")

    # Verify error is raised with series ID in message
    with pytest.raises(ValueError, match=r"Every time series must have at least 3 data points.*series B"):
        validate_df_inputs(
            df=df,
            future_df=None,
            target_columns=["target"],
            prediction_length=5,
        )


def test_validate_df_inputs_raises_error_when_future_df_has_mismatched_series_ids():
    """Test that ValueError is raised when future_df has different series IDs than df."""
    # Create df with series A and B
    df = create_df(series_ids=["A", "B"], n_points=[10, 15], target_cols=["target"], freq="h")

    # Create future_df with only series A
    forecast_start_times = get_forecast_start_times(df, freq="h")
    future_df = create_future_df(
        forecast_start_times=[forecast_start_times[0]], series_ids=["A"], n_points=[5], covariates=None, freq="h"
    )

    # Verify appropriate error is raised
    with pytest.raises(ValueError, match=r"future_df must contain the same time series IDs as df"):
        validate_df_inputs(
            df=df,
            future_df=future_df,
            target_columns=["target"],
            prediction_length=5,
        )


def test_validate_df_inputs_raises_error_when_future_df_has_incorrect_lengths():
    """Test that ValueError is raised when future_df lengths don't match prediction_length."""
    # Create df with series A and B with a covariate
    df = create_df(series_ids=["A", "B"], n_points=[10, 13], target_cols=["target"], covariates=["cov1"], freq="h")

    # Create future_df with varying lengths per series (3 and 7 instead of 5)
    forecast_start_times = get_forecast_start_times(df, freq="h")
    future_df = create_future_df(
        forecast_start_times=forecast_start_times,
        series_ids=["A", "B"],
        n_points=[3, 7],  # incorrect lengths
        covariates=["cov1"],
        freq="h",
    )

    # Verify error message indicates which series have incorrect lengths
    with pytest.raises(
        ValueError, match=r"future_df must contain prediction_length=5 values for each series.*different lengths"
    ):
        validate_df_inputs(
            df=df,
            future_df=future_df,
            target_columns=["target"],
            prediction_length=5,
        )


# Tests for convert_df_input_to_list_of_dicts_input function


def test_convert_df_with_single_target_preserves_values():
    """Test conversion with single target column."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=None,
        target_columns=["target"],
        prediction_length=5,
    )

    # Verify output list has correct length (one per series)
    assert len(inputs) == 2

    # Verify target arrays have correct shape and values match input
    assert inputs[0]["target"].shape == (1, 10)  # (n_targets=1, n_timesteps=10)
    assert inputs[1]["target"].shape == (1, 12)  # (n_targets=1, n_timesteps=12)

    # Verify values are preserved
    df_sorted = df.sort_values(["item_id", "timestamp"])
    np.testing.assert_array_almost_equal(
        inputs[0]["target"][0], df_sorted[df_sorted["item_id"] == "A"]["target"].values
    )
    np.testing.assert_array_almost_equal(
        inputs[1]["target"][0], df_sorted[df_sorted["item_id"] == "B"]["target"].values
    )


def test_convert_df_with_multiple_targets_preserves_values_and_shape():
    """Test conversion with multiple target columns."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 14], target_cols=["target1", "target2"], freq="h")

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=None,
        target_columns=["target1", "target2"],
        prediction_length=5,
    )

    # Verify target arrays have shape (n_targets, n_timesteps)
    assert inputs[0]["target"].shape == (2, 10)
    assert inputs[1]["target"].shape == (2, 14)

    # Verify all target values are preserved for both series
    df_sorted = df.sort_values(["item_id", "timestamp"])
    for i, series_id in enumerate(["A", "B"]):
        series_data = df_sorted[df_sorted["item_id"] == series_id]
        np.testing.assert_array_almost_equal(inputs[i]["target"][0], series_data["target1"].values)
        np.testing.assert_array_almost_equal(inputs[i]["target"][1], series_data["target2"].values)


def test_convert_df_with_past_covariates_includes_them_in_output():
    """Test conversion with past covariates only."""
    df = create_df(
        series_ids=["A", "B"], n_points=[10, 16], target_cols=["target"], covariates=["cov1", "cov2"], freq="h"
    )

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=None,
        target_columns=["target"],
        prediction_length=5,
    )

    # Verify output includes past_covariates dictionary
    assert "past_covariates" in inputs[0]
    assert "cov1" in inputs[0]["past_covariates"]
    assert "cov2" in inputs[0]["past_covariates"]

    # Verify covariate values match input for both series
    assert inputs[0]["past_covariates"]["cov1"].shape == (10,)
    assert inputs[0]["past_covariates"]["cov2"].shape == (10,)
    assert inputs[1]["past_covariates"]["cov1"].shape == (16,)
    assert inputs[1]["past_covariates"]["cov2"].shape == (16,)

    # Verify no future_covariates key in output
    assert "future_covariates" not in inputs[0]


def test_convert_df_with_past_and_future_covariates_includes_both():
    """Test conversion with both past and future covariates."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 18], target_cols=["target"], covariates=["cov1"], freq="h")
    forecast_start_times = get_forecast_start_times(df, freq="h")
    future_df = create_future_df(
        forecast_start_times=forecast_start_times,
        series_ids=["A", "B"],
        n_points=[5, 5],
        covariates=["cov1"],
        freq="h",
    )

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=future_df,
        target_columns=["target"],
        prediction_length=5,
    )

    # Verify output includes both past_covariates and future_covariates dictionaries for both series
    assert "past_covariates" in inputs[0]
    assert "future_covariates" in inputs[0]
    assert "past_covariates" in inputs[1]
    assert "future_covariates" in inputs[1]

    # Verify all covariate values are preserved with correct shapes
    assert inputs[0]["past_covariates"]["cov1"].shape == (10,)
    assert inputs[0]["future_covariates"]["cov1"].shape == (5,)
    assert inputs[1]["past_covariates"]["cov1"].shape == (18,)
    assert inputs[1]["future_covariates"]["cov1"].shape == (5,)


@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
def test_convert_df_generates_prediction_timestamps_with_correct_frequency(freq):
    """Test that prediction timestamps follow the inferred frequency."""
    # Use multiple series with irregular lengths
    df = create_df(series_ids=["A", "B", "C"], n_points=[10, 15, 12], target_cols=["target"], freq=freq)

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=None,
        target_columns=["target"],
        prediction_length=5,
    )

    # Verify timestamps for all series
    for series_id in ["A", "B", "C"]:
        # Verify timestamps start after last context timestamp
        last_context_time = df[df["item_id"] == series_id]["timestamp"].max()
        first_pred_time = prediction_timestamps[series_id][0]
        assert first_pred_time > last_context_time

        # Verify timestamps are evenly spaced according to frequency
        pred_times = prediction_timestamps[series_id]
        assert len(pred_times) == 5
        inferred_freq = pd.infer_freq(pred_times)
        assert inferred_freq is not None


def test_convert_df_skips_validation_when_disabled():
    """Test that validate_inputs=False skips validation."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")

    # Mock validate_df_inputs to verify it's not called when validation is disabled
    with patch("chronos.df_utils.validate_df_inputs") as mock_validate:
        inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
            df=df,
            future_df=None,
            target_columns=["target"],
            prediction_length=5,
            validate_inputs=False,
        )

        # Verify validate_df_inputs was not called
        mock_validate.assert_not_called()

    # Verify conversion still works
    assert len(inputs) == 2


def test_convert_df_preserves_all_values_with_random_inputs():
    """Generate random dataframe and verify all values are preserved exactly."""
    # Generate random parameters
    n_series = np.random.randint(2, 5)
    n_targets = np.random.randint(1, 4)
    n_past_only_covariates = np.random.randint(1, 3)
    n_future_covariates = np.random.randint(1, 3)
    prediction_length = 5

    series_ids = [f"series_{i}" for i in range(n_series)]
    n_points = [np.random.randint(10, 20) for _ in range(n_series)]
    target_cols = [f"target_{i}" for i in range(n_targets)]
    past_only_covariates = [f"past_cov_{i}" for i in range(n_past_only_covariates)]
    future_covariates = [f"future_cov_{i}" for i in range(n_future_covariates)]
    all_covariates = past_only_covariates + future_covariates

    # Create dataframe with all covariates
    df = create_df(
        series_ids=series_ids, n_points=n_points, target_cols=target_cols, covariates=all_covariates, freq="h"
    )

    # Create future_df with only future covariates (not past-only ones)
    forecast_start_times = get_forecast_start_times(df, freq="h")
    future_df = create_future_df(
        forecast_start_times=forecast_start_times,
        series_ids=series_ids,
        n_points=[prediction_length] * n_series,
        covariates=future_covariates,
        freq="h",
    )

    # Convert to list-of-dicts format
    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=future_df,
        target_columns=target_cols,
        prediction_length=prediction_length,
    )

    # Verify all target values are preserved exactly
    df_sorted = df.sort_values(["item_id", "timestamp"])
    for i, series_id in enumerate(series_ids):
        series_data = df_sorted[df_sorted["item_id"] == series_id]
        assert inputs[i]["target"].shape == (n_targets, n_points[i])
        for j, target_col in enumerate(target_cols):
            np.testing.assert_array_almost_equal(inputs[i]["target"][j], series_data[target_col].values)

    # Verify all past covariate values are preserved (both past-only and future covariates)
    for i, series_id in enumerate(series_ids):
        series_data = df_sorted[df_sorted["item_id"] == series_id]
        assert "past_covariates" in inputs[i]
        for cov in all_covariates:
            np.testing.assert_array_almost_equal(inputs[i]["past_covariates"][cov], series_data[cov].values)

    # Verify only future covariates are in future_covariates (not past-only ones)
    future_df_sorted = future_df.sort_values(["item_id", "timestamp"])
    for i, series_id in enumerate(series_ids):
        series_future_data = future_df_sorted[future_df_sorted["item_id"] == series_id]
        assert "future_covariates" in inputs[i]
        # Only future covariates should be present
        assert set(inputs[i]["future_covariates"].keys()) == set(future_covariates)
        for cov in future_covariates:
            np.testing.assert_array_almost_equal(inputs[i]["future_covariates"][cov], series_future_data[cov].values)

    # Verify output structure is correct
    assert len(inputs) == n_series
    assert list(original_order) == series_ids
    assert len(prediction_timestamps) == n_series


def test_convert_df_with_freq_and_validate_inputs_raises_error():
    """Test that providing freq with validate_inputs=True raises ValueError."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")

    with pytest.raises(ValueError, match="freq can only be provided when validate_inputs=False"):
        convert_df_input_to_list_of_dicts_input(
            df=df,
            future_df=None,
            target_columns=["target"],
            prediction_length=5,
            freq="h",
            validate_inputs=True,
        )


@pytest.mark.parametrize("use_future_df", [True, False])
def test_convert_df_with_freq_and_validate_inputs_false(use_future_df):
    """Test that freq works with validate_inputs=False."""
    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], covariates=["cov1"], freq="h")
    prediction_length = 5

    future_df = None
    if use_future_df:
        forecast_start_times = get_forecast_start_times(df, freq="h")
        future_df = create_future_df(
            forecast_start_times=forecast_start_times,
            series_ids=["A", "B"],
            n_points=[prediction_length, prediction_length],
            covariates=["cov1"],
            freq="h",
        )

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=future_df,
        target_columns=["target"],
        prediction_length=prediction_length,
        freq="h",
        validate_inputs=False,
    )

    assert len(inputs) == 2
    assert len(prediction_timestamps) == 2
    for series_id in ["A", "B"]:
        assert len(prediction_timestamps[series_id]) == prediction_length


@pytest.mark.parametrize("use_future_df", [True, False])
def test_convert_df_with_mismatched_freq_uses_user_provided_freq(use_future_df):
    """Test that user-provided freq overrides data frequency when validate_inputs=False."""
    # Create data with hourly frequency
    data_freq = "h"
    df = create_df(
        series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], covariates=["cov1"], freq=data_freq
    )
    prediction_length = 5

    # User provides daily frequency (different from data)
    user_freq = "D"

    future_df = None
    if use_future_df:
        # Create future_df with hourly frequency (matching data, not user freq)
        forecast_start_times = get_forecast_start_times(df, freq=data_freq)
        future_df = create_future_df(
            forecast_start_times=forecast_start_times,
            series_ids=["A", "B"],
            n_points=[prediction_length, prediction_length],
            covariates=["cov1"],
            freq=data_freq,
        )

    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
        df=df,
        future_df=future_df,
        target_columns=["target"],
        prediction_length=prediction_length,
        freq=user_freq,
        validate_inputs=False,
    )

    # Prediction should work
    assert len(inputs) == 2
    assert len(prediction_timestamps) == 2

    # Forecast timestamps should use user-provided freq (daily), not data freq (hourly)
    for series_id in ["A", "B"]:
        pred_ts = prediction_timestamps[series_id]
        assert len(pred_ts) == prediction_length
        # Verify the frequency matches user-provided freq
        inferred_freq = pd.infer_freq(pred_ts)
        assert inferred_freq == user_freq
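
The last test above expects forecast timestamps to follow the user-supplied frequency even when the context data is hourly. One way such timestamps could be generated is sketched below; make_prediction_timestamps is a hypothetical helper written for this illustration and is not taken from chronos.df_utils, whose implementation is not part of this diff. It assumes the last context timestamp already lies on the requested frequency grid.

```python
import pandas as pd


def make_prediction_timestamps(last_context_time, prediction_length, freq):
    """Hypothetical helper: timestamps for the forecast horizon at a given frequency."""
    # Generate one extra point starting at the last context timestamp,
    # then drop it so the horizon begins strictly after the context.
    extended = pd.date_range(start=last_context_time, periods=prediction_length + 1, freq=freq)
    return extended[1:]


# Hourly context ending at midnight, but a daily frequency requested by the user.
last_context_time = pd.Timestamp("2001-10-01")
daily_timestamps = make_prediction_timestamps(last_context_time, prediction_length=5, freq="D")

print(daily_timestamps)
print(pd.infer_freq(daily_timestamps))
```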

View file

@ -1,5 +1,8 @@
from typing import Optional, Tuple
import time
from typing import Callable, Optional, Tuple
import numpy as np
import pandas as pd
import torch
@ -9,3 +12,47 @@ def validate_tensor(a: torch.Tensor, shape: Tuple[int, ...], dtype: Optional[tor
    if dtype is not None:
        assert a.dtype == dtype


def create_df(series_ids=["A", "B"], n_points=[10, 10], target_cols=["target"], covariates=None, freq="h"):
    """Helper to create test context DataFrames."""
    series_dfs = []
    for series_id, length in zip(series_ids, n_points):
        series_data = {"item_id": series_id, "timestamp": pd.date_range(end="2001-10-01", periods=length, freq=freq)}
        for target_col in target_cols:
            series_data[target_col] = np.random.randn(length)
        if covariates:
            for cov in covariates:
                series_data[cov] = np.random.randn(length)
        series_dfs.append(pd.DataFrame(series_data))
    return pd.concat(series_dfs, ignore_index=True)


def create_future_df(forecast_start_times: list, series_ids=["A", "B"], n_points=[5, 5], covariates=None, freq="h"):
    """Helper to create test future DataFrames."""
    series_dfs = []
    for series_id, length, start in zip(series_ids, n_points, forecast_start_times):
        series_data = {"item_id": series_id, "timestamp": pd.date_range(start=start, periods=length, freq=freq)}
        if covariates:
            for cov in covariates:
                series_data[cov] = np.random.randn(length)
        series_dfs.append(pd.DataFrame(series_data))
    return pd.concat(series_dfs, ignore_index=True)


def get_forecast_start_times(df, freq="h"):
    context_end_times = df.groupby("item_id")["timestamp"].max()
    forecast_start_times = [pd.date_range(end_time, periods=2, freq=freq)[-1] for end_time in context_end_times]
    return forecast_start_times


def timeout_callback(seconds: float | None) -> Callable:
    """Return a callback object that raises an exception if time limit is exceeded."""
    start_time = time.monotonic()

    def callback() -> None:
        if seconds is not None and time.monotonic() - start_time > seconds:
            raise TimeoutError("time limit exceeded")

    return callback
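
The timeout_callback helper above is passed as after_batch in the predict_df timeout test. The sketch below shows the same closure driving a plain loop; run_batches is a hypothetical stand-in for a batched prediction loop and is not part of the library, and the sleep simply simulates per-batch work.

```python
import time
from typing import Callable, Optional


def timeout_callback(seconds: Optional[float]) -> Callable[[], None]:
    """Return a callback that raises TimeoutError once the time limit has passed."""
    start_time = time.monotonic()  # the clock starts when the callback is created

    def callback() -> None:
        if seconds is not None and time.monotonic() - start_time > seconds:
            raise TimeoutError("time limit exceeded")

    return callback


def run_batches(num_batches: int, after_batch: Callable[[], None]) -> int:
    """Hypothetical batched loop that invokes after_batch once per batch."""
    completed = 0
    for _ in range(num_batches):
        time.sleep(0.01)  # simulated work for one batch
        completed += 1
        after_batch()
    return completed


# With no limit, every batch runs.
completed_without_limit = run_batches(3, timeout_callback(None))

# With a short limit, the loop is interrupted partway through.
try:
    run_batches(100, timeout_callback(0.02))
    timed_out = False
except TimeoutError:
    timed_out = True
```

Because the check only runs between batches, a single long batch can overrun the limit; the callback bounds total time only to the granularity of one batch.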