Merge branch 'main' into deepwiki-badge

2026-05-22 17:21:01 +00:00 · 2026-03-10 09:14:59 -05:00
commit bb8ec56d3b
parent de163a7886 f951d9aefa
33 changed files with 4038 additions and 6781 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1 @@
+*.ipynb linguist-language=Python
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -30,4 +30,4 @@ CUDA version:
 PyTorch version: 
 HuggingFace transformers version:
 HuggingFace accelerate version:
-
+Pandas version:
--- a/.github/workflows/eval-model.yml
+++ b/.github/workflows/eval-model.yml
@@ -36,13 +36,13 @@ jobs:
        run: pip install ".[dev]" -f https://download.pytorch.org/whl/cpu/torch_stable.html

      - name: Run Eval Script for Chronos-2
-        run: python scripts/evaluation/evaluate.py chronos-2 ci/evaluate/backtest_config.yaml $CHRONOS_2_RESULTS_CSV --model-id=s3://autogluon/chronos-2 --device=cpu --torch-dtype=float32
+        run: python scripts/evaluation/evaluate.py chronos-2 ci/evaluate/backtest_config.yaml $CHRONOS_2_RESULTS_CSV --model-id=amazon/chronos-2 --device=cpu --torch-dtype=float32

      - name: Print Chronos-2 CSV
        run: cat $CHRONOS_2_RESULTS_CSV

      - name: Run Eval Script for Chronos-Bolt
        run: python scripts/evaluation/evaluate.py chronos-bolt ci/evaluate/backtest_config.yaml $CHRONOS_BOLT_RESULTS_CSV --model-id=amazon/chronos-bolt-small --device=cpu --torch-dtype=float32
-      
+
      - name: Print Chronos-Bolt CSV
        run: cat $CHRONOS_BOLT_RESULTS_CSV
--- a/.gitignore
+++ b/.gitignore
@@ -160,4 +160,9 @@ cython_debug/
 #.idea/

 # macOS stuff
-.DS_store
+.DS_store
+
+chronos-2-finetuned
+
+# Kiro IDE
+.kiro
--- a/README.md
+++ b/README.md
@@ -1,7 +1,3 @@
-<div align="center">
-<img src="https://raw.githubusercontent.com/amazon-science/chronos-forecasting/main/figures/chronos-logo.png" width="60%">
-</div>
-
 <div align="center">

 # Chronos: Pretrained Models for Time Series Forecasting
@@ -11,7 +7,7 @@
 [![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20HF-Datasets-FFD21E)](https://huggingface.co/datasets/autogluon/chronos_datasets)
 [![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20HF-Models-FFD21E)](https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444)
 [![fev](https://img.shields.io/static/v1?label=fev&message=Benchmark&color=B31B1B&logo=github)](https://github.com/autogluon/fev)
-[![aws](https://img.shields.io/static/v1?label=SageMaker&message=Deploy&color=FF9900&logo=amazon-web-services)](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb)
+[![aws](https://img.shields.io/static/v1?label=SageMaker&message=Deploy&color=FF9900&logo=amazon-web-services)](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb)
 [![faq](https://img.shields.io/badge/FAQ-Questions%3F-blue)](https://github.com/amazon-science/chronos-forecasting/issues?q=is%3Aissue+label%3AFAQ)
 [![License: Apache-2.0](https://img.shields.io/badge/License-Apache--2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/amazon-science/chronos-forecasting)
@@ -20,8 +16,8 @@


 ## 🚀 News
- **20 Oct 2025**: 🚀 [Chronos-2](https://arxiv.org/abs/2510.15821) released. It offers _zero-shot_ support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 achieves the best performance on fev-bench, GIFT-Eval and Chronos Benchmark II amongst pretrained models. Check out [this notebook](notebooks/chronos-2-quickstart.ipynb) to get started with Chronos-2.
- **14 Feb 2025**: 🚀 Chronos-Bolt is now available on Amazon SageMaker JumpStart! Check out the [tutorial notebook](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb) to learn how to deploy Chronos endpoints for production use in 3 lines of code.
+- **30 Dec 2025**: ☁️ Deploy Chronos-2 to AWS with Amazon SageMaker: new guide covers real-time inference (GPU/CPU), serverless endpoints with automatic scaling, and batch transform for large-scale forecasting. See the [deployment tutorial](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb).
+- **20 Oct 2025**: 🚀 [Chronos-2](https://huggingface.co/amazon/chronos-2) released. It offers _zero-shot_ support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 achieves the best performance on fev-bench, GIFT-Eval and Chronos Benchmark II amongst pretrained models. Check out [this notebook](notebooks/chronos-2-quickstart.ipynb) to get started with Chronos-2.
 - **12 Dec 2024**: 📊 We released [`fev`](https://github.com/autogluon/fev), a lightweight package for benchmarking time series forecasting models based on the [Hugging Face `datasets`](https://huggingface.co/docs/datasets/en/index) library.
 - **26 Nov 2024**: ⚡️ Chronos-Bolt models released [on HuggingFace](https://huggingface.co/collections/amazon/chronos-models-65f1791d630a8d57cb718444). Chronos-Bolt models are more accurate (5% lower error), up to 250x faster and 20x more memory efficient than the original Chronos models of the same size!
 - **13 Mar 2024**: 🚀 Chronos [paper](https://arxiv.org/abs/2403.07815) and inference code released.
@@ -40,7 +36,9 @@ This package provides an interface to the Chronos family of **pretrained time se

 | Model ID                                                               | Parameters |
 | ---------------------------------------------------------------------- | ---------- |
-| [`s3://autogluon/chronos-2`](https://arxiv.org/abs/2510.15821)   | 120M         |
+| [`amazon/chronos-2`](https://huggingface.co/amazon/chronos-2)   | 120M         |
+| [`autogluon/chronos-2-synth`](https://huggingface.co/autogluon/chronos-2-synth)   | 120M         |
+| [`autogluon/chronos-2-small`](https://huggingface.co/autogluon/chronos-2-small)   | 28M         |
 | [`amazon/chronos-bolt-tiny`](https://huggingface.co/amazon/chronos-bolt-tiny)   | 9M         |
 | [`amazon/chronos-bolt-mini`](https://huggingface.co/amazon/chronos-bolt-mini)   | 21M        |
 | [`amazon/chronos-bolt-small`](https://huggingface.co/amazon/chronos-bolt-small) | 48M        |
@@ -49,7 +47,7 @@ This package provides an interface to the Chronos family of **pretrained time se
 | [`amazon/chronos-t5-mini`](https://huggingface.co/amazon/chronos-t5-mini)   | 20M        |
 | [`amazon/chronos-t5-small`](https://huggingface.co/amazon/chronos-t5-small) | 46M        |
 | [`amazon/chronos-t5-base`](https://huggingface.co/amazon/chronos-t5-base)   | 200M       |
-| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M       | 
+| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M       |

 </div>

@@ -61,6 +59,10 @@ To perform inference with Chronos, the easiest way is to install this package th
 pip install chronos-forecasting
 ```

+> [!TIP]
+> For reliable production use, we recommend using Chronos-2 models through [Amazon SageMaker JumpStart](https://aws.amazon.com/sagemaker/ai/jumpstart/). Check out [this tutorial](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb) to learn how to deploy Chronos-2 inference endpoints to AWS with just a few lines of code.
+
+
 ### Forecasting

 A minimal example showing how to perform forecasting using Chronos-2:
@@ -69,7 +71,7 @@ A minimal example showing how to perform forecasting using Chronos-2:
 import pandas as pd  # requires: pip install 'pandas[pyarrow]'
 from chronos import Chronos2Pipeline

-pipeline = Chronos2Pipeline.from_pretrained("s3://autogluon/chronos-2", device_map="cuda")
+pipeline = Chronos2Pipeline.from_pretrained("amazon/chronos-2", device_map="cuda")

 # Load historical target values and past values of covariates
 context_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/train.parquet")
@@ -116,8 +118,15 @@ plt.legend()
 ## Example Notebooks

 - [Chronos-2 Quick Start](notebooks/chronos-2-quickstart.ipynb)
- [Deploy Chronos-Bolt on Amazon SageMaker](notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb)
- Deploy Chronos-2 on Amazon SageMaker (coming soon!)
+  &nbsp;
+  <a href="https://studiolab.sagemaker.aws/import/github/amazon-science/chronos-forecasting/blob/main/notebooks/chronos-2-quickstart.ipynb">
+    <img src="https://studiolab.sagemaker.aws/studiolab.svg" alt="Open In SageMaker Studio Lab" height="18" align="absmiddle">
+  </a>
+  &nbsp;
+  <a href="https://colab.research.google.com/github/amazon-science/chronos-forecasting/blob/main/notebooks/chronos-2-quickstart.ipynb">
+    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" height="18" align="absmiddle">
+  </a>
+- [Deploy Chronos-2 on Amazon SageMaker](notebooks/deploy-chronos-to-amazon-sagemaker.ipynb)

 ## 📝 Citation

--- a/figures/chronos-logo.png
+++ b/figures/chronos-logo.png
--- a/figures/main-figure.png
+++ b/figures/main-figure.png
--- a/figures/zero_shot-agg_scaled_score.svg
+++ b/figures/zero_shot-agg_scaled_score.svg
--- a/notebooks/chronos-2-quickstart.ipynb
+++ b/notebooks/chronos-2-quickstart.ipynb
--- a/notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb
+++ b/notebooks/deploy-chronos-bolt-to-amazon-sagemaker.ipynb
--- a/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
+++ b/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -14,13 +14,12 @@ readme = "README.md"
 license = { file = "LICENSE" }
 requires-python = ">=3.10"
 dependencies = [
-  "torch>=2.0,<3",
-  "transformers>=4.49,<5",
+  "torch>=2.2,<3",
+  "transformers>=4.41,<5",
  "accelerate>=0.34,<2",
  "numpy>=1.21,<3",
  "einops>=0.7.0,<1",
  "scikit-learn>=1.6.0,<2",
-  "boto3",
 ]
 classifiers = [
  "Programming Language :: Python :: 3",
@@ -40,7 +39,19 @@ packages = ["src/chronos"]
 path = "src/chronos/__about__.py"

 [project.optional-dependencies]
-test = ["pytest~=8.0", "numpy>=1.21,<3", "fev>=0.6.1", "pandas>=2.0,<2.4"]
+extras = [
+  "boto3>=1.10,<2",
+  "peft>=0.13.0,<0.18",
+  "fev>=0.6.1",
+  "pandas[pyarrow]>=2.0,<2.4",
+]
+test = [
+  "pytest~=8.0",
+  "boto3>=1.10,<2",
+  "peft>=0.13.0,<1",
+  "fev>=0.6.1",
+  "pandas[pyarrow]>=2.0,<2.4",
+]
 typecheck = ["mypy~=1.9"]
 dev = [
  "gluonts[pro]~=0.16",
--- a/scripts/evaluation/evaluate.py
+++ b/scripts/evaluation/evaluate.py
@@ -295,7 +295,7 @@ def chronos_2(
    device: str = "cuda",
    torch_dtype: str = "float32",
    batch_size: int = 32,
-    predict_batches_jointly: bool = False,
+    cross_learning: bool = False,
 ):
    """Evaluate Chronos-2 models.

@@ -316,7 +316,7 @@ def chronos_2(
    batch_size : int, optional, default = 32
        Batch size for inference. For Chronos-Bolt models, significantly larger
        batch sizes can be used
-    predict_batches_jointly: bool, optional, default = False
+    cross_learning: bool, optional, default = False
        If True, cross-learning is enabled and the model makes joint predictions for all
        items in the batch
    """
@@ -335,7 +335,7 @@ def chronos_2(
        metrics_path=metrics_path,
        model_id=model_id,
        batch_size=batch_size,
-        predict_batches_jointly=predict_batches_jointly,
+        cross_learning=cross_learning,
    )


--- a/scripts/training/train.py
+++ b/scripts/training/train.py
@@ -663,7 +663,6 @@ def main(
        lr_scheduler_type=lr_scheduler_type,
        warmup_ratio=warmup_ratio,
        optim=optim,
-        logging_dir=str(output_dir / "logs"),
        logging_strategy="steps",
        logging_steps=log_steps,
        save_strategy="steps",
--- a/src/chronos/__about__.py
+++ b/src/chronos/__about__.py
@@ -1 +1 @@
-__version__ = "2.0.0"
+__version__ = "2.2.2"
--- a/src/chronos/base.py
+++ b/src/chronos/base.py
@@ -17,8 +17,10 @@ import torch
 if TYPE_CHECKING:
    import datasets
    import fev
+    import pandas as pd
    from transformers import PreTrainedModel

+
 from .utils import left_pad_and_stack_1D


@@ -53,6 +55,14 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        # for easy access to the inner HF-style model
        self.inner_model = inner_model

+    @property
+    def model_context_length(self) -> int:
+        raise NotImplementedError()
+
+    @property
+    def model_prediction_length(self) -> int:
+        raise NotImplementedError()
+
    def _prepare_and_validate_context(self, context: Union[torch.Tensor, List[torch.Tensor]]):
        if isinstance(context, list):
            context = left_pad_and_stack_1D(context)
@@ -122,6 +132,123 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        """
        raise NotImplementedError()

+    def predict_df(
+        self,
+        df: "pd.DataFrame",
+        *,
+        id_column: str = "item_id",
+        timestamp_column: str = "timestamp",
+        target: str = "target",
+        prediction_length: int | None = None,
+        quantile_levels: list[float] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
+        validate_inputs: bool = True,
+        freq: str | None = None,
+        **predict_kwargs,
+    ) -> "pd.DataFrame":
+        """
+        Perform forecasting on time series data in a long-format pandas DataFrame.
+
+        Parameters
+        ----------
+        df
+            Time series data in long format with an id column, a timestamp, and one target column.
+            Any other columns, if present, will be ignored
+        id_column
+            The name of the column which contains the unique time series identifiers, by default "item_id"
+        timestamp_column
+            The name of the column which contains timestamps, by default "timestamp"
+            All time series in the dataframe must have regular timestamps with the same frequency (no gaps)
+        target
+            The name of the column which contains the target variables to be forecasted, by default "target"
+        prediction_length
+            Number of steps to predict for each time series
+        quantile_levels
+            Quantile levels to compute
+        validate_inputs
+            [ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
+            validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
+            must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
+            has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
+            timestamps are regularly spaced (e.g., with hourly frequency).
+        freq
+            Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used when
+            validate_inputs=False. When provided, skips frequency inference from the data.
+        **predict_kwargs
+            Additional arguments passed to predict_quantiles
+
+        Returns
+        -------
+        The forecasts dataframe generated by the model with the following columns
+        - `id_column`: The time series ID
+        - `timestamp_column`: Future timestamps
+        - "target_name": The name of the target column
+        - "predictions": The point predictions generated by the model
+        - One column for predictions at each quantile level in `quantile_levels`
+        """
+        try:
+            import pandas as pd
+
+            from .df_utils import convert_df_input_to_list_of_dicts_input
+        except ImportError:
+            raise ImportError("pandas is required for predict_df. Please install it with `pip install pandas`.")
+
+        if not isinstance(target, str):
+            raise ValueError(
+                f"Expected `target` to be str, but found {type(target)}. {self.__class__.__name__} only supports univariate forecasting."
+            )
+
+        if prediction_length is None:
+            prediction_length = self.model_prediction_length
+
+        inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+            df=df,
+            future_df=None,
+            id_column=id_column,
+            timestamp_column=timestamp_column,
+            target_columns=[target],
+            prediction_length=prediction_length,
+            freq=freq,
+            validate_inputs=validate_inputs,
+        )
+
+        # NOTE: any covariates, if present, are ignored here
+        context = [torch.tensor(item["target"]).squeeze(0) for item in inputs]  # squeeze the extra variate dim
+
+        # Generate forecasts
+        quantiles, mean = self.predict_quantiles(
+            inputs=context,
+            prediction_length=prediction_length,
+            quantile_levels=quantile_levels,
+            limit_prediction_length=False,
+            **predict_kwargs,
+        )
+
+        quantiles_np = quantiles.numpy()  # [n_series, horizon, num_quantiles]
+        mean_np = mean.numpy()  # [n_series, horizon]
+
+        series_ids = list(prediction_timestamps.keys())
+        future_ts = list(prediction_timestamps.values())
+
+        data = {
+            id_column: np.repeat(series_ids, prediction_length),
+            timestamp_column: np.concatenate(future_ts),
+            "target_name": target,
+            "predictions": mean_np.ravel(),
+        }
+
+        quantiles_flat = quantiles_np.reshape(-1, len(quantile_levels))
+        for q_idx, q_level in enumerate(quantile_levels):
+            data[str(q_level)] = quantiles_flat[:, q_idx]
+
+        predictions_df = pd.DataFrame(data)
+        # If validate_inputs=False, the df is used as-is without sorting by item_id, no reordering required
+        if validate_inputs:
+            predictions_df.set_index(id_column, inplace=True)
+            predictions_df = predictions_df.loc[original_order]
+            predictions_df.reset_index(inplace=True)
+
+        return predictions_df
+
    def predict_fev(
        self, task: "fev.Task", batch_size: int = 32, **kwargs
    ) -> tuple[list["datasets.DatasetDict"], float]:
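Reviewer note on the `predict_df` hunk above: the method flattens per-series forecasts into long-format columns with `np.repeat`, `np.concatenate`, `ravel` and `reshape`. The sketch below mirrors that layout using plain Python lists instead of NumPy arrays, so the row order is easy to check by hand. The function name `flatten_forecasts` is hypothetical and not part of chronos; the column names follow the defaults shown in the diff.

```python
from itertools import chain


def flatten_forecasts(series_ids, future_timestamps, mean, quantiles, quantile_levels):
    """Flatten per-series forecasts into long-format columns (illustrative only).

    mean:      nested list of shape [n_series][horizon]
    quantiles: nested list of shape [n_series][horizon][num_quantiles]
    """
    horizon = len(mean[0])
    columns = {
        # each series ID repeated once per forecast step (the np.repeat step)
        "item_id": [sid for sid in series_ids for _ in range(horizon)],
        # per-series timestamp lists joined end to end (the np.concatenate step)
        "timestamp": list(chain.from_iterable(future_timestamps)),
        # row-major flattening of the point forecasts (the ravel step)
        "predictions": list(chain.from_iterable(mean)),
    }
    # one column per quantile level, named by the level's string form
    for q_idx, q_level in enumerate(quantile_levels):
        columns[str(q_level)] = [step[q_idx] for series in quantiles for step in series]
    return columns


cols = flatten_forecasts(
    series_ids=["a", "b"],
    future_timestamps=[[1, 2], [1, 2]],
    mean=[[10.0, 11.0], [20.0, 21.0]],
    quantiles=[[[9.0, 11.0], [10.0, 12.0]], [[19.0, 21.0], [20.0, 22.0]]],
    quantile_levels=[0.1, 0.9],
)
```

Every column has `n_series * horizon` entries, which is why the diff can pass the dict directly to `pd.DataFrame`.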
--- a/src/chronos/boto_utils.py
+++ b/src/chronos/boto_utils.py
@@ -6,6 +6,7 @@
 import logging
 import os
 import re
+import warnings
 from pathlib import Path

 import boto3
@@ -110,6 +111,12 @@ def cache_model_from_s3(

    # Use CloudFront CDN for faster, cached downloads if available
    if cloudfront_url:
+        warnings.warn(
+            f"Loading {s3_uri} from CloudFront is deprecated and will be removed in a future version. "
+            f'Please specify a HuggingFace model_id instead. For example: Chronos2Pipeline.from_pretrained("amazon/chronos-2")',
+            category=FutureWarning,
+            stacklevel=3,
+        )
        try:
            download_model_files_from_cloudfront(
                cloudfront_url=cloudfront_url,
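Reviewer note on the `boto_utils.py` hunk above: it adds a `FutureWarning` to the CloudFront download path. The following is a minimal sketch of how such a warning reaches a caller and how it can be captured in a test. `load_legacy` and `capture_future_warnings` are hypothetical stand-ins, not chronos functions, and `stacklevel=2` fits this toy call chain rather than the library's `stacklevel=3`.

```python
import warnings


def load_legacy(uri: str) -> str:
    """Hypothetical stand-in for a loader that has a deprecated code path."""
    warnings.warn(
        f"Loading {uri} from CloudFront is deprecated and will be removed in a future version.",
        category=FutureWarning,
        stacklevel=2,  # attribute the warning to the caller of load_legacy
    )
    return uri


def capture_future_warnings(uri: str) -> list[str]:
    """Call the loader and return the messages of any FutureWarning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no filter hides the warning
        load_legacy(uri)
    return [str(w.message) for w in caught if issubclass(w.category, FutureWarning)]


messages = capture_future_warnings("s3://autogluon/chronos-2")
```

Unlike `DeprecationWarning`, `FutureWarning` is shown to end users by default, which suits a change that affects how models are loaded.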
--- a/src/chronos/chronos.py
+++ b/src/chronos/chronos.py
@@ -377,6 +377,14 @@ class ChronosPipeline(BaseChronosPipeline):
        self.tokenizer = tokenizer
        self.model = model

+    @property
+    def model_context_length(self) -> int:
+        return self.model.config.context_length
+
+    @property
+    def model_prediction_length(self) -> int:
+        return self.model.config.prediction_length
+
    def _prepare_and_validate_context(self, context: Union[torch.Tensor, List[torch.Tensor]]):
        if isinstance(context, list):
            context = left_pad_and_stack_1D(context)
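Reviewer note on the `model_context_length` and `model_prediction_length` properties: the base class raises `NotImplementedError`, each pipeline overrides the properties from its config, and `predict_df` falls back to `model_prediction_length` when `prediction_length` is `None`. Below is a self-contained sketch of that pattern. `ToyConfig`, `BasePipeline`, `ToyPipeline` and `resolve_prediction_length` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyConfig:
    context_length: int
    prediction_length: int


class BasePipeline:
    @property
    def model_context_length(self) -> int:
        raise NotImplementedError()

    @property
    def model_prediction_length(self) -> int:
        raise NotImplementedError()

    def resolve_prediction_length(self, prediction_length: Optional[int] = None) -> int:
        # fall back to the model default when the caller does not specify a horizon
        if prediction_length is None:
            return self.model_prediction_length
        return prediction_length


class ToyPipeline(BasePipeline):
    def __init__(self, config: ToyConfig):
        self.config = config

    @property
    def model_context_length(self) -> int:
        return self.config.context_length

    @property
    def model_prediction_length(self) -> int:
        return self.config.prediction_length


pipeline = ToyPipeline(ToyConfig(context_length=512, prediction_length=64))
default_horizon = pipeline.resolve_prediction_length()
explicit_horizon = pipeline.resolve_prediction_length(24)
```

A subclass that forgets to override the property fails only when the fallback is actually used, not at construction time.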
--- a/src/chronos/chronos2/config.py
+++ b/src/chronos/chronos2/config.py
@@ -4,7 +4,7 @@
 # Authors: Abdul Fatir Ansari <ansarnd@amazon.com>

 from dataclasses import dataclass
-from typing import List
+from typing import List, Literal

 from transformers.configuration_utils import PretrainedConfig

@@ -39,6 +39,8 @@ class Chronos2CoreConfig(PretrainedConfig):
        Token ID for padding/missing value token, by default 0
    rope_theta
        The base theta for rotary position embedding (RoPE), by default 10000.0
+    attn_implementation
+        The attention implementation to use. Options: "eager" or "sdpa", by default None (uses "sdpa")
    """

    model_type = "t5"
@@ -63,6 +65,7 @@ class Chronos2CoreConfig(PretrainedConfig):
        vocab_size: int = 2,
        pad_token_id: int = 0,
        rope_theta: float = 10000.0,
+        attn_implementation: Literal["eager", "sdpa"] | None = None,
        **kwargs,
    ):
        self.vocab_size = vocab_size
@@ -83,11 +86,17 @@ class Chronos2CoreConfig(PretrainedConfig):

        assert not self.is_gated_act, "gated activation is not supported"

+        # Attention implementation - default to "sdpa" if not specified
+        attn_implementation = attn_implementation or "sdpa"
+        assert attn_implementation in ["eager", "sdpa"], f"attn_implementation {attn_implementation} not supported"
+
        # unused
        kwargs.pop("is_encoder_decoder", None)
        kwargs.pop("eos_token_id", None)

-        super().__init__(pad_token_id=pad_token_id, is_encoder_decoder=False, **kwargs)
+        super().__init__(
+            pad_token_id=pad_token_id, is_encoder_decoder=False, attn_implementation=attn_implementation, **kwargs
+        )


@dataclass
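Reviewer note on the `attn_implementation` handling in `Chronos2CoreConfig`: the hunk defaults `None` to `"sdpa"` and then asserts membership in `["eager", "sdpa"]`. The sketch below isolates that default-then-validate step. `resolve_attn_implementation` is a hypothetical helper, not a chronos or transformers function, and it raises `ValueError` where the diff uses `assert`, as one possible variant that still fires when Python runs with `-O`.

```python
from typing import Optional

SUPPORTED_ATTN_IMPLEMENTATIONS = ("eager", "sdpa")


def resolve_attn_implementation(attn_implementation: Optional[str] = None) -> str:
    """Return the attention implementation to use, defaulting to "sdpa"."""
    resolved = attn_implementation or "sdpa"
    if resolved not in SUPPORTED_ATTN_IMPLEMENTATIONS:
        raise ValueError(f"attn_implementation {resolved} not supported")
    return resolved


default_impl = resolve_attn_implementation()
eager_impl = resolve_attn_implementation("eager")
```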
--- a/src/chronos/chronos2/dataset.py
+++ b/src/chronos/chronos2/dataset.py
@@ -5,7 +5,7 @@

 import math
 from enum import Enum
-from typing import TYPE_CHECKING, Iterator, Mapping, Sequence, TypeAlias, cast
+from typing import TYPE_CHECKING, Any, Iterable, Iterator, Mapping, Sequence, TypeAlias, TypedDict, cast

 import numpy as np
 import torch
@@ -15,12 +15,21 @@ from torch.utils.data import IterableDataset
 if TYPE_CHECKING:
    import datasets
    import fev
-    import pandas as pd


 TensorOrArray: TypeAlias = torch.Tensor | np.ndarray


+class PreparedInput(TypedDict):
+    """A preprocessed time series input ready for model training/inference."""
+
+    context: torch.Tensor  # (n_variates, history_length), float32
+    future_covariates: torch.Tensor  # (n_variates, prediction_length), float32
+    n_targets: int
+    n_covariates: int
+    n_future_covariates: int
+
+
 def left_pad_and_cat_2D(tensors: list[torch.Tensor]) -> torch.Tensor:
    """
    Left pads tensors in the list to the length of the longest tensor along the second axis, then concats
@@ -38,14 +47,14 @@ def left_pad_and_cat_2D(tensors: list[torch.Tensor]) -> torch.Tensor:
    return torch.cat(padded, dim=0)


-def validate_and_prepare_single_dict_task(
-    task: Mapping[str, TensorOrArray | Mapping[str, TensorOrArray]], idx: int, prediction_length: int
-) -> tuple[torch.Tensor, torch.Tensor, int, int, int]:
-    """Validates and prepares a single dictionary task for Chronos2Model.
+def validate_and_prepare_single_dict_input(
+    raw_input: Mapping[str, TensorOrArray | Mapping[str, TensorOrArray]], idx: int, prediction_length: int
+) -> PreparedInput:
+    """Validates and prepares a single dictionary input for Chronos2Model.

    Parameters
    ----------
-    task
+    raw_input
        A dictionary representing a time series that contains:
        - `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,) or (n_variates, history_length).
        Forecasts will be generated for items in `target`.
@@ -56,27 +65,27 @@ def validate_and_prepare_single_dict_task(
        covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `prediction_length`. All keys in
        `future_covariates` must be a subset of the keys in `past_covariates`.
    idx
-        Index of this task in the list of tasks, used for error messages
+        Index of this input in the list of inputs, used for error messages
    prediction_length
        Number of future time steps to predict, used to validate future covariates

    Returns
    ------
-    A tuple containing:
-    - task_context_tensor: Concatenated tensor of target and past covariates of shape (group_size, history_length),
-        the first `task_n_targets` items along the first axis contain the target variables and the remaining items contain past-only covariates
+    A PreparedInput containing:
+    - context: Concatenated tensor of target and past covariates of shape (group_size, history_length),
+        the first `n_targets` items along the first axis contain the target variables and the remaining items contain past-only covariates
        and past values of known future covariates.
-    - task_future_covariates_tensor: Tensor of future covariates of shape (group_size, prediction_length). The last `task_n_future_covariates`
+    - future_covariates: Tensor of future covariates of shape (group_size, prediction_length). The last `n_future_covariates`
        items along the first axis contain future covariates. All the remaining elements corresponding to target and past-only covariates are NaNs.
-    - task_n_targets: Number of target variables
-    - task_n_covariates: Total number of covariates (sum of past-only and known future covariates)
-    - task_n_future_covariates: Number of known future covariates
+    - n_targets: Number of target variables
+    - n_covariates: Total number of covariates (sum of past-only and known future covariates)
+    - n_future_covariates: Number of known future covariates
    """

    allowed_keys = {"target", "past_covariates", "future_covariates"}

    # validate keys
-    keys = set(task.keys())
+    keys = set(raw_input.keys())
    if not keys.issubset(allowed_keys):
        raise ValueError(
            f"Found invalid keys in element at index {idx}. Allowed keys are {allowed_keys}, but found {keys}"
@@ -85,38 +94,58 @@ def validate_and_prepare_single_dict_task(
        raise ValueError(f"Element at index {idx} does not contain the required key 'target'")

    # validate target
-    task_target = task["target"]
-    if isinstance(task_target, np.ndarray):
-        task_target = torch.from_numpy(task_target)
-    assert isinstance(task_target, torch.Tensor)
-    if task_target.ndim > 2:
+    target = raw_input["target"]
+    if isinstance(target, np.ndarray):
+        target = torch.from_numpy(target)
+    assert isinstance(target, torch.Tensor)
+    if target.ndim > 2:
        raise ValueError(
            "When the input is a list of dicts, the `target` should either be 1-d with shape (history_length,) "
-            f" or 2-d with shape (n_variates, history_length). Found element at index {idx} with shape {tuple(task_target.shape)}."
+            f" or 2-d with shape (n_variates, history_length). Found element at index {idx} with shape {tuple(target.shape)}."
        )
-    history_length = task_target.shape[-1]
-    task_target = task_target.view(-1, history_length)
+    history_length = target.shape[-1]
+    target = target.view(-1, history_length)

    # validate past_covariates
    cat_encoders: dict = {}
-    task_past_covariates = task.get("past_covariates", {})
-    if not isinstance(task_past_covariates, dict):
+    past_covariates = raw_input.get("past_covariates", {})
+    if not isinstance(past_covariates, dict):
        raise ValueError(
            f"Found invalid type for `past_covariates` in element at index {idx}. "
-            f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(task_past_covariates)}'
+            f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(past_covariates)}'
        )
-    task_covariates_keys = sorted(task_past_covariates.keys())
-    task_past_covariates_list: list[torch.Tensor] = []
-    for key in task_covariates_keys:
-        tensor = task_past_covariates[key]
+
+    # gather keys and ensure known-future keys come last to match downstream assumptions
+    covariates_keys = sorted(past_covariates.keys())
+
+    future_covariates = raw_input.get("future_covariates", {})
+    if not isinstance(future_covariates, dict):
+        raise ValueError(
+            f"Found invalid type for `future_covariates` in element at index {idx}. "
+            f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(future_covariates)}'
+        )
+    future_covariates_keys = sorted(future_covariates.keys())
+    if not set(future_covariates_keys).issubset(covariates_keys):
+        raise ValueError(
+            f"Expected keys in `future_covariates` to be a subset of `past_covariates` {covariates_keys}, "
+            f"but found {future_covariates_keys} in element at index {idx}"
+        )
+
+    # create ordered keys: past-only first, then known-future (so known-future are the last rows)
+    past_only_keys = [k for k in covariates_keys if k not in future_covariates_keys]
+    ordered_covariate_keys = past_only_keys + future_covariates_keys
+
+    past_covariates_list: list[torch.Tensor] = []
+    for key in ordered_covariate_keys:
+        tensor = past_covariates[key]
        if isinstance(tensor, np.ndarray):
            # apply encoding to categorical variates
            if not np.issubdtype(tensor.dtype, np.number):
                # target encoding, if the target is 1-d
-                if task_target.shape[0] == 1:
+                if target.shape[0] == 1:
                    cat_encoder = TargetEncoder(target_type="continuous", smooth=1.0)
                    X = tensor.astype(str).reshape(-1, 1)
-                    y = task_target.view(-1).numpy()
+                    y = target.view(-1).numpy()
                    mask = np.isfinite(y)
                    X = X[mask]
                    y = y[mask]
@@ -134,29 +163,18 @@ def validate_and_prepare_single_dict_task(
                f"Individual `past_covariates` must be 1-d with length equal to the length of `target` (= {history_length}), "
                f"found: {key} with shape {tuple(tensor.shape)} in element at index {idx}"
            )
-        task_past_covariates_list.append(tensor)
-    task_past_covariates_tensor = (
-        torch.stack(task_past_covariates_list, dim=0)
-        if task_past_covariates_list
-        else torch.zeros((0, history_length), device=task_target.device)
+        past_covariates_list.append(tensor)
+    past_covariates_tensor = (
+        torch.stack(past_covariates_list, dim=0)
+        if past_covariates_list
+        else torch.zeros((0, history_length), device=target.device)
    )
-    # validate future_covariates
-    task_future_covariates = task.get("future_covariates", {})
-    if not isinstance(task_future_covariates, dict):
-        raise ValueError(
-            f"Found invalid type for `future_covariates` in element at index {idx}. "
-            f'Expected dict with {{"feat_1": tensor_1, "feat_2": tensor_2, ...}}, but found {type(task_future_covariates)}'
-        )
-    task_future_covariates_keys = sorted(task_future_covariates.keys())
-    if not set(task_future_covariates_keys).issubset(task_covariates_keys):
-        raise ValueError(
-            f"Expected keys in `future_covariates` to be a subset of `past_covariates` {task_covariates_keys}, "
-            f"but found {task_future_covariates_keys} in element at index {idx}"
-        )
-    task_future_covariates_list: list[torch.Tensor] = []
-    for key in task_covariates_keys:
+
+    # validate future_covariates (build rows in the same ordered_covariate_keys order)
+    future_covariates_list: list[torch.Tensor] = []
+    for key in ordered_covariate_keys:
        # future values of past-only covariates are filled with NaNs
-        tensor = task_future_covariates.get(key, torch.full((prediction_length,), fill_value=torch.nan))
+        tensor = future_covariates.get(key, torch.full((prediction_length,), fill_value=torch.nan))
        if isinstance(tensor, np.ndarray):
            # apply encoding to categorical variates
            if not np.issubdtype(tensor.dtype, np.number):
@@ -169,34 +187,118 @@ def validate_and_prepare_single_dict_task(
                f"Individual `future_covariates` must be 1-d with length equal to the {prediction_length=}, "
                f"found: {key} with shape {tuple(tensor.shape)} in element at index {idx}"
            )
-        task_future_covariates_list.append(tensor)
-    task_future_covariates_tensor = (
-        torch.stack(task_future_covariates_list, dim=0)
-        if task_future_covariates_list
-        else torch.zeros((0, prediction_length), device=task_target.device)
+        future_covariates_list.append(tensor)
+    future_covariates_tensor = (
+        torch.stack(future_covariates_list, dim=0)
+        if future_covariates_list
+        else torch.zeros((0, prediction_length), device=target.device)
    )
    # future values of target series are filled with NaNs
-    task_future_covariates_target_padding = torch.full(
-        (task_target.shape[0], prediction_length), fill_value=torch.nan, device=task_target.device
+    future_covariates_target_padding = torch.full(
+        (target.shape[0], prediction_length), fill_value=torch.nan, device=target.device
    )

-    task_context_tensor = torch.cat([task_target, task_past_covariates_tensor], dim=0).to(dtype=torch.float32)
-    task_future_covariates_tensor = torch.cat(
-        [task_future_covariates_target_padding, task_future_covariates_tensor], dim=0
+    context_tensor = torch.cat([target, past_covariates_tensor], dim=0).to(dtype=torch.float32)
+    future_covariates_tensor = torch.cat(
+        [future_covariates_target_padding, future_covariates_tensor], dim=0
    ).to(dtype=torch.float32)
-    task_n_targets = task_target.shape[0]
-    task_n_covariates = task_past_covariates_tensor.shape[0]
-    task_n_future_covariates = len(task_future_covariates_list)
+    n_targets = target.shape[0]
+    n_covariates = past_covariates_tensor.shape[0]
+    # number of known-future covariates
+    n_future_covariates = len(future_covariates_keys)

-    return (
-        task_context_tensor,
-        task_future_covariates_tensor,
-        task_n_targets,
-        task_n_covariates,
-        task_n_future_covariates,
+    return PreparedInput(
+        context=context_tensor,
+        future_covariates=future_covariates_tensor,
+        n_targets=n_targets,
+        n_covariates=n_covariates,
+        n_future_covariates=n_future_covariates,
    )


+def prepare_inputs(
+    raw_inputs: Iterable[Mapping[str, Any]],
+    prediction_length: int,
+    min_past: int = 1,
+    mode: "DatasetMode | str" = "train",
+) -> list[PreparedInput]:
+    """Prepare multiple time series inputs for training/inference.
+
+    This function handles mode-specific preprocessing (e.g., filtering short series)
+    and calls validate_and_prepare_single_dict_input for each input.
+    """
+    inputs: list[PreparedInput] = []
+
+    for idx, raw_input in enumerate(raw_inputs):
+        # For non-TEST modes, fix future_covariates (replace None/empty with NaN arrays)
+        if mode != DatasetMode.TEST:
+            raw_future_covariates = raw_input.get("future_covariates", {})
+            if raw_future_covariates:
+                raw_future_covariates = cast(dict[str, TensorOrArray | None], raw_future_covariates)
+                fixed_future_covariates = {}
+                for key, value in raw_future_covariates.items():
+                    fixed_future_covariates[key] = (
+                        np.full(prediction_length, np.nan) if value is None or len(value) == 0 else value
+                    )
+                raw_input = {**raw_input, "future_covariates": fixed_future_covariates}
+
+        raw_input = cast(dict[str, TensorOrArray | Mapping[str, TensorOrArray]], raw_input)
+        prepared = validate_and_prepare_single_dict_input(raw_input, idx, prediction_length)
+
+        # Filter by minimum length (except in TEST mode)
+        if mode != DatasetMode.TEST and prepared["context"].shape[-1] < min_past + prediction_length:
+            continue
+
+        inputs.append(prepared)
+
+    if len(inputs) == 0:
+        raise ValueError(
+            "The dataset is empty after filtering based on the length of the time series (length >= min_past + prediction_length). "
+            "Please provide longer time series or reduce `min_past` or `prediction_length`."
+        )
+
+    return inputs
+
+
+def validate_prepared_schema(prepared_input: Any) -> None:
+    """Validate that an input matches the PreparedInput schema."""
+    if not isinstance(prepared_input, Mapping):
+        raise TypeError(
+            f"Expected input to be dict-like, got {type(prepared_input).__name__}. "
+            "Set convert_inputs=True when calling fit() to preprocess raw inputs."
+        )
+
+    required_keys = {"context", "future_covariates", "n_targets", "n_covariates", "n_future_covariates"}
+    missing = required_keys - set(prepared_input.keys())
+    if missing:
+        raise TypeError(
+            f"Input is missing required keys: {missing}. Set convert_inputs=True when calling fit() to preprocess raw inputs."
+        )
+
+    context = prepared_input["context"]
+    if not isinstance(context, torch.Tensor) or context.ndim != 2:
+        raise TypeError(
+            f"Expected 'context' to be 2-d torch.Tensor, got {type(context).__name__} "
+            f"with shape {getattr(context, 'shape', 'N/A')}. "
+            "Set convert_inputs=True when calling fit() to preprocess raw inputs."
+        )
+
+    future_covariates = prepared_input["future_covariates"]
+    if not isinstance(future_covariates, torch.Tensor) or future_covariates.ndim != 2:
+        raise TypeError(
+            f"Expected 'future_covariates' to be 2-d torch.Tensor, got {type(future_covariates).__name__} "
+            f"with shape {getattr(future_covariates, 'shape', 'N/A')}. "
+            "Set convert_inputs=True when calling fit() to preprocess raw inputs."
+        )
+
+    if context.shape[0] != future_covariates.shape[0]:
+        raise ValueError(
+            f"Expected 'context' and 'future_covariates' to have the same first dimension, "
+            f"got {context.shape[0]} and {future_covariates.shape[0]}. "
+            "Set convert_inputs=True when calling fit() to preprocess raw inputs."
+        )
+
+
 def convert_list_of_tensors_input_to_list_of_dicts_input(
    list_of_tensors: Sequence[TensorOrArray],
 ) -> list[dict[str, torch.Tensor]]:
@ -265,308 +367,6 @@ def convert_tensor_input_to_list_of_dicts_input(tensor: TensorOrArray) -> list[d
    return output


-def _validate_df_types_and_cast(
-    df: "pd.DataFrame",
-    future_df: "pd.DataFrame | None",
-    target_columns: list[str],
-    id_column: str = "item_id",
-    timestamp_column: str = "timestamp",
-) -> tuple["pd.DataFrame", "pd.DataFrame | None"]:
-    import pandas as pd
-
-    astype_dict = {}
-    future_astype_dict = {}
-    for col in df.columns.drop([id_column, timestamp_column]):
-        col_dtype = df[col].dtype
-        if col in target_columns and not pd.api.types.is_numeric_dtype(df[col]):
-            raise ValueError(f"All target columns must be numeric but got {col=} with dtype={col_dtype}")
-
-        if (
-            pd.api.types.is_object_dtype(df[col])
-            or pd.api.types.is_string_dtype(df[col])
-            or isinstance(col_dtype, pd.CategoricalDtype)
-        ):
-            astype_dict[col] = "category"
-        elif pd.api.types.is_numeric_dtype(df[col]) or pd.api.types.is_bool_dtype(df[col]):
-            astype_dict[col] = "float32"
-        else:
-            raise ValueError(
-                f"All columns must contain numeric, object, category, string, or bool dtype but got {col=} with dtype={col_dtype}"
-            )
-
-        if future_df is not None and col in future_df.columns:
-            if future_df[col].dtype != col_dtype:
-                raise ValueError(
-                    f"Column {col} in future_df has dtype {future_df[col].dtype} but column in df has dtype {col_dtype}"
-                )
-            future_astype_dict[col] = astype_dict[col]
-
-    df = df.astype(astype_dict, copy=True)
-    if future_df is not None:
-        future_df = future_df.astype(future_astype_dict, copy=True)
-
-    return df, future_df
-
-
-def validate_df_inputs(
-    df: "pd.DataFrame",
-    future_df: "pd.DataFrame | None",
-    target_columns: list[str],
-    prediction_length: int,
-    id_column: str = "item_id",
-    timestamp_column: str = "timestamp",
-) -> tuple["pd.DataFrame", "pd.DataFrame | None", "pd.Timedelta", list[int], list[int] | None, np.ndarray]:
-    """
-    Validates and prepares dataframe inputs passed to `Chronos2Pipeline.predict_df`.
-
-    Parameters
-    ----------
-    df
-        Input dataframe containing time series data with columns:
-        - id_column: Identifier for each time series
-        - timestamp_column: Timestamps for each observation
-        - target_columns: One or more target variables to forecast
-        - Additional columns are treated as covariates
-    future_df
-        Optional dataframe containing future covariate values with columns:
-        - id_column: Identifier for each time series
-        - timestamp_column: Future timestamps
-        - Subset of covariate columns from df
-    target_columns
-        Names of target columns to forecast
-    prediction_length
-        Number of future time steps to predict
-    id_column
-        Name of column containing time series identifiers
-    timestamp_column
-        Name of column containing timestamps
-
-    Returns
-    -------
-    A tuple containing:
-    - Validated and sorted input dataframe
-    - Validated and sorted future dataframe (if provided)
-    - Inferred frequency of the time series
-    - List of series lengths from input dataframe
-    - List of series lengths from future dataframe (if provided)
-    - Original order of time series IDs
-
-    Raises
-    ------
-    ValueError
-        If validation fails for:
-        - Missing required columns
-        - Invalid data types
-        - Inconsistent frequencies
-        - Insufficient data points
-        - Mismatched series between df and future_df
-        - Invalid future_df lengths
-    """
-
-    import pandas as pd
-
-    required_cols = [id_column, timestamp_column] + target_columns
-    missing_cols = [col for col in required_cols if col not in df.columns]
-    if missing_cols:
-        raise ValueError(f"df does not contain all expected columns. Missing columns: {missing_cols}")
-
-    if future_df is not None:
-        future_required_cols = [id_column, timestamp_column]
-        missing_future_cols = [col for col in future_required_cols if col not in future_df.columns]
-        targets_in_future = [col for col in future_df.columns if col in target_columns]
-        extra_future_cols = [col for col in future_df.columns if col not in df.columns]
-        if missing_future_cols:
-            raise ValueError(
-                f"future_df does not contain all expected columns. Missing columns: {missing_future_cols}"
-            )
-        if targets_in_future:
-            raise ValueError(
-                f"future_df cannot contain target columns. Target columns found in future_df: {targets_in_future}"
-            )
-        if extra_future_cols:
-            raise ValueError(f"future_df cannot contain columns not present in df. Extra columns: {extra_future_cols}")
-
-    df, future_df = _validate_df_types_and_cast(
-        df, future_df, id_column=id_column, timestamp_column=timestamp_column, target_columns=target_columns
-    )
-
-    # Get the original order of time series IDs
-    original_order = df[id_column].unique()
-
-    # Sort and prepare df
-    df[timestamp_column] = pd.to_datetime(df[timestamp_column])
-    df = df.sort_values([id_column, timestamp_column])
-
-    # Get series lengths
-    series_lengths = df[id_column].value_counts(sort=False).to_list()
-
-    def validate_freq(timestamps: pd.Series, series_id: str):
-        freq = pd.infer_freq(timestamps)
-        if not freq:
-            raise ValueError(f"Could not infer frequency for series {series_id}")
-        return freq
-
-    # Validate each series
-    all_freqs = []
-    start_idx = 0
-    for length in series_lengths:
-        if length < 3:
-            series_id = df.iloc[start_idx][id_column]
-            raise ValueError(
-                f"Every time series must have at least 3 data points, found {length=} for series {series_id}"
-            )
-
-        series_data = df.iloc[start_idx : start_idx + length]
-        timestamps = series_data[timestamp_column]
-        series_id = series_data.iloc[0][id_column]
-        all_freqs.append(validate_freq(timestamps, series_id))
-        start_idx += length
-
-    if len(set(all_freqs)) > 1:
-        raise ValueError("All time series must have the same frequency")
-
-    inferred_freq = all_freqs[0]
-
-    # Sort future_df if provided and validate its series lengths
-    future_series_lengths = None
-    if future_df is not None:
-        future_df[timestamp_column] = pd.to_datetime(future_df[timestamp_column])
-        future_df = future_df.sort_values([id_column, timestamp_column])
-
-        # Validate that future_df contains all series from df
-        context_ids = set(df[id_column].unique())
-        future_ids = set(future_df[id_column].unique())
-        if context_ids != future_ids:
-            raise ValueError("future_df must contain the same time series IDs as df")
-
-        future_series_lengths = future_df[id_column].value_counts(sort=False).to_list()
-
-        # Validate future series lengths match prediction_length
-        future_start_idx = 0
-        for future_length in future_series_lengths:
-            future_series_data = future_df.iloc[future_start_idx : future_start_idx + future_length]
-            future_timestamps = future_series_data[timestamp_column]
-            future_series_id = future_series_data.iloc[0][id_column]
-            if future_length != prediction_length:
-                raise ValueError(
-                    f"Future covariates all time series must have length {prediction_length}, got {future_length} for series {future_series_id}"
-                )
-            if future_length < 3 or inferred_freq != validate_freq(future_timestamps, future_series_id):
-                raise ValueError(
-                    f"Future covariates must have the same frequency as context, found series {future_series_id} with a different frequency"
-                )
-            future_start_idx += future_length
-
-        assert len(series_lengths) == len(future_series_lengths)
-
-    return df, future_df, inferred_freq, series_lengths, future_series_lengths, original_order
-
-
-def convert_df_input_to_list_of_dicts_input(
-    df: "pd.DataFrame",
-    future_df: "pd.DataFrame | None",
-    target_columns: list[str],
-    prediction_length: int,
-    id_column: str = "item_id",
-    timestamp_column: str = "timestamp",
-) -> tuple[list[dict[str, np.ndarray | dict[str, np.ndarray]]], np.ndarray, dict[str, "pd.DatetimeIndex"]]:
-    """
-    Convert from dataframe input format to a list of dictionaries input format.
-
-    Parameters
-    ----------
-    df
-        Input dataframe containing time series data with columns:
-        - id_column: Identifier for each time series
-        - timestamp_column: Timestamps for each observation
-        - target_columns: One or more target variables to forecast
-        - Additional columns are treated as covariates
-    future_df
-        Optional dataframe containing future covariate values with columns:
-        - id_column: Identifier for each time series
-        - timestamp_column: Future timestamps
-        - Subset of covariate columns from df
-    target_columns
-        Names of target columns to forecast
-    prediction_length
-        Number of future time steps to predict
-    id_column
-        Name of column containing time series identifiers
-    timestamp_column
-        Name of column containing timestamps
-
-    Returns
-    -------
-    A tuple containing:
-    - List of dictionaries in the format expected by `Chronos2Pipeline.predict`
-    - Original order of time series IDs
-    - Dictionary mapping series IDs to future time index
-    """
-
-    import pandas as pd
-
-    df, future_df, freq, series_lengths, future_series_lengths, original_order = validate_df_inputs(
-        df,
-        future_df=future_df,
-        id_column=id_column,
-        timestamp_column=timestamp_column,
-        target_columns=target_columns,
-        prediction_length=prediction_length,
-    )
-
-    # Convert to list of dicts format
-    inputs: list[dict[str, np.ndarray | dict[str, np.ndarray]]] = []
-    prediction_timestamps: dict[str, pd.DatetimeIndex] = {}
-    start_idx: int = 0
-    future_start_idx: int = 0
-
-    for i, length in enumerate(series_lengths):
-        series_data = df.iloc[start_idx : start_idx + length]
-        # Extract target(s)
-        target_data = series_data[target_columns].to_numpy().T  # Shape: (n_targets, history_length)
-        task: dict[str, np.ndarray | dict[str, np.ndarray]] = {"target": target_data}
-
-        # Generate future timestamps
-        series_id = series_data.iloc[0][id_column]
-        last_timestamp = series_data[timestamp_column].iloc[-1]
-        future_ts = pd.date_range(start=last_timestamp, periods=prediction_length + 1, freq=freq)[1:]
-        prediction_timestamps[series_id] = future_ts
-
-        # Handle covariates if present
-        covariate_cols = [
-            col for col in series_data.columns if col not in [id_column, timestamp_column] + target_columns
-        ]
-
-        if covariate_cols:
-            past_covariates = {col: series_data[col].to_numpy() for col in covariate_cols}
-            task["past_covariates"] = past_covariates
-
-            # Handle future covariates
-            if future_df is not None:
-                assert future_series_lengths is not None
-                future_length = future_series_lengths[i]
-                future_data = future_df.iloc[future_start_idx : future_start_idx + future_length]
-                assert future_data[timestamp_column].iloc[0] == future_ts[0], (
-                    f"the first timestamp in future_df must be the first forecast timestamp, found mismatch "
-                    f"({future_data[timestamp_column].iloc[0]} != {future_ts[0]}) in series {series_id}"
-                )
-
-                if len(future_data) > 0:
-                    future_covariates = {
-                        col: future_data[col].to_numpy() for col in covariate_cols if col in future_data.columns
-                    }
-                    if future_covariates:
-                        task["future_covariates"] = future_covariates
-                future_start_idx += future_length
-
-        inputs.append(task)
-        start_idx += length
-
-    assert len(inputs) == len(series_lengths)
-
-    return inputs, original_order, prediction_timestamps
-
-
 def _cast_fev_features(
    past_data: "datasets.Dataset",
    future_data: "datasets.Dataset",
@ -676,49 +476,65 @@ class Chronos2Dataset(IterableDataset):
    Arguments
    ----------
    inputs
-        Time series data. Must be a list of dictionaries where each dictionary may have the following keys.
-        - `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,) or (n_variates, history_length).
-        Forecasts will be generated for items in `target`.
-        - `past_covariates` (optional): a dict of past-only covariates or past values of known future covariates. The keys of the dict
-        must be names of the covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `history_length`
-        of `target`.
-        - `future_covariates` (optional): a dict of future values of known future covariates. The keys of the dict must be names of the
-        covariates and values must be 1-d `torch.Tensor` or `np.ndarray` with length equal to the `prediction_length`. All keys in
-        `future_covariates` must be a subset of the keys in `past_covariates`.
-        Note: when the mode is set to TRAIN, the values inside `future_covariates` are not technically used for training the model;
-        however, this key is used to infer which covariates are known into the future. Therefore, if your task contains known future covariates,
-        make sure that this key exists in `inputs`. The values of individual future covariates may be set to `None` or an empty array.
+        Time series data. Can be either:
+
+        1. Raw inputs (when `convert_inputs=True`, default): A sequence of dictionaries where each
+           dictionary may have the following keys:
+           - `target` (required): a 1-d or 2-d `torch.Tensor` or `np.ndarray` of shape (history_length,)
+             or (n_variates, history_length). Forecasts will be generated for items in `target`.
+           - `past_covariates` (optional): a dict of past-only covariates or past values of known future
+             covariates.
+           - `future_covariates` (optional): a dict of future values of known future covariates.
+
+        2. Pre-processed inputs (when `convert_inputs=False`): A sequence of `PreparedInput` dicts with keys:
+           `context`, `future_covariates`, `n_targets`, `n_covariates`, `n_future_covariates`.
+           Use `prepare_inputs()` to create pre-processed inputs.
    context_length
        The maximum context length used for training or inference
    prediction_length
        The prediction horizon
    batch_size
-        The batch size for training the model. Note that the batch size here means the number of time series, including target(s) and
-        covariates, that are input into the model. If your data has multiple target and/or covariates, the effective number of time series
-        tasks in a batch will be lower than this value.
+        The batch size for training the model. Note that the batch size here means the number of time series,
+        including target(s) and covariates, that are input into the model.
    output_patch_size
-        The output patch size of the model. This is used to compute the number of patches needed to cover `prediction_length`
+        The output patch size of the model. This is used to compute the number of patches needed to cover
+        `prediction_length`
    min_past
-        The minimum number of time steps the context must have during training. All time series shorter than `min_past + prediction_length`
-        are filtered out, by default 1
+        The minimum number of time steps the context must have during training. All time series shorter than
+        `min_past + prediction_length` are filtered out, by default 1
    mode
        `DatasetMode` governing whether to generate training, validation or test samples, by default "train"
+    convert_inputs
+        If True (default), preprocess raw inputs. If False, inputs are expected to be already preprocessed.
    """

    def __init__(
        self,
-        inputs: Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
+        inputs: TensorOrArray | Sequence[TensorOrArray] | Sequence[Mapping[str, Any]] | Sequence[PreparedInput],
        context_length: int,
        prediction_length: int,
        batch_size: int,
        output_patch_size: int,
        min_past: int = 1,
        mode: str | DatasetMode = DatasetMode.TRAIN,
+        convert_inputs: bool = True,
    ) -> None:
        super().__init__()
        assert mode in {DatasetMode.TRAIN, DatasetMode.VALIDATION, DatasetMode.TEST}, f"Invalid mode: {mode}"

-        self.tasks = Chronos2Dataset._prepare_tasks(inputs, prediction_length, min_past, mode)
+        self.inputs: Sequence[PreparedInput]
+        if convert_inputs:
+            if isinstance(inputs, (torch.Tensor, np.ndarray)):
+                inputs = convert_tensor_input_to_list_of_dicts_input(inputs)
+            elif (
+                isinstance(inputs, Sequence) and len(inputs) > 0 and isinstance(inputs[0], (torch.Tensor, np.ndarray))
+            ):
+                inputs = convert_list_of_tensors_input_to_list_of_dicts_input(cast(Sequence[TensorOrArray], inputs))
+            self.inputs = prepare_inputs(cast(Iterable[Mapping[str, Any]], inputs), prediction_length, min_past, mode)
+        else:
+            validate_prepared_schema(inputs[0])
+            self.inputs = cast(Sequence[PreparedInput], inputs)
+
        self.context_length = context_length
        self.prediction_length = prediction_length
        self.batch_size = batch_size
@ -726,53 +542,16 @@ class Chronos2Dataset(IterableDataset):
        self.min_past = min_past
        self.mode = mode

-    @staticmethod
-    def _prepare_tasks(
-        inputs: Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
-        prediction_length: int,
-        min_past: int,
-        mode: str | DatasetMode,
-    ):
-        tasks = []
-        for idx, raw_task in enumerate(inputs):
-            if mode != DatasetMode.TEST:
-                raw_future_covariates = raw_task.get("future_covariates", {})
-                raw_future_covariates = cast(dict[str, TensorOrArray | None], raw_future_covariates)
-                if raw_future_covariates:
-                    fixed_future_covariates = {}
-                    for key, value in raw_future_covariates.items():
-                        fixed_future_covariates[key] = (
-                            np.full(prediction_length, np.nan) if value is None or len(value) == 0 else value
-                        )
-                    raw_task = {**raw_task, "future_covariates": fixed_future_covariates}
+    def _construct_slice(self, input_idx: int) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, int]:
+        prepared = self.inputs[input_idx]
+        past_tensor = prepared["context"].clone()  # shape: (n_targets + n_covariates, history_length)
+        future_tensor = prepared["future_covariates"].clone()
+        n_targets = int(prepared["n_targets"])
+        n_covariates = int(prepared["n_covariates"])
+        n_future_covariates = int(prepared["n_future_covariates"])
+        n_past_only_covariates = n_covariates - n_future_covariates

-            raw_task = cast(dict[str, TensorOrArray | Mapping[str, TensorOrArray]], raw_task)
-            # convert to a format compatible with model's forward
-            task = validate_and_prepare_single_dict_task(raw_task, idx, prediction_length)
-
-            if mode != DatasetMode.TEST and task[0].shape[-1] < min_past + prediction_length:
-                # filter tasks based on min_past + prediction_length
-                continue
-            tasks.append(task)
-
-        if len(tasks) == 0:
-            raise ValueError(
-                "The dataset is empty after filtering based on the length of the time series (length >= min_past + prediction_length). "
-                "Please provide longer time series or reduce `min_past` or `prediction_length`. "
-            )
-        return tasks
-
-    def _construct_slice(self, task_idx: int) -> tuple[torch.Tensor, torch.Tensor | None, torch.Tensor, int]:
-        (
-            task_past_tensor,  # shape:  (task_n_targets + task_n_covariates, history_length)
-            task_future_tensor,
-            task_n_targets,
-            task_n_covariates,
-            task_n_future_covariates,
-        ) = self.tasks[task_idx]
-        task_n_past_only_covariates = task_n_covariates - task_n_future_covariates
-
-        full_length = task_past_tensor.shape[-1]
+        full_length = past_tensor.shape[-1]

        if self.mode == DatasetMode.TRAIN:
            # slice a random subsequence from the full series
@ -786,72 +565,74 @@ class Chronos2Dataset(IterableDataset):

        if slice_idx >= self.context_length:
            # slice series, if it is longer than context_length
-            task_context = task_past_tensor[:, slice_idx - self.context_length : slice_idx]
+            context = past_tensor[:, slice_idx - self.context_length : slice_idx]
        else:
-            task_context = task_past_tensor[:, :slice_idx]
+            context = past_tensor[:, :slice_idx]

-        # In the TEST mode, we have no target available and the task_future_covariates can be directly used
-        # In the TRAIN and VALIDATION modes, the target and task_future_covariates need to be constructed from
-        # the task_context_tensor by slicing the appropriate indices which we do below
+        # In TEST mode, no future target is available and the prepared future_covariates can be used directly.
+        # In TRAIN and VALIDATION modes, the target and future_covariates are constructed from
+        # past_tensor by slicing the appropriate indices, which we do below
        if self.mode in [DatasetMode.TRAIN, DatasetMode.VALIDATION]:
-            # the first task_n_targets elements in task_context_tensor are the targets
-            task_future_target = task_past_tensor[:, slice_idx : slice_idx + self.prediction_length]
+            # the first n_targets rows in past_tensor are the targets
+            future_target = past_tensor[:, slice_idx : slice_idx + self.prediction_length].clone()
+            # mask out all rows corresponding to covariates
+            future_target[n_targets:] = torch.nan

-            if task_n_future_covariates > 0:
-                # the last task_n_future_covariates elements in task_context_tensor are the known covariates
-                task_future_covariates = task_past_tensor[
-                    -task_n_future_covariates:, slice_idx : slice_idx + self.prediction_length
+            if n_future_covariates > 0:
+                # the last n_future_covariates rows in past_tensor are the known covariates
+                future_covariates = past_tensor[
+                    -n_future_covariates:, slice_idx : slice_idx + self.prediction_length
                ]
            else:
                # zero-length tensor for easy concatenation later
-                task_future_covariates = torch.zeros((0, self.prediction_length))
+                future_covariates = torch.zeros((0, self.prediction_length))

-            # the leading task_n_targets + task_n_past_only_covariates elements are masked because the target(s)
+            # the leading n_targets + n_past_only_covariates elements are masked because the target(s)
            # and past-only covariates are not known into the future
-            task_future_covariates_padding = torch.full(
-                (task_n_targets + task_n_past_only_covariates, self.prediction_length),
+            future_covariates_padding = torch.full(
+                (n_targets + n_past_only_covariates, self.prediction_length),
                fill_value=torch.nan,
            )
-            task_future_covariates = torch.cat([task_future_covariates_padding, task_future_covariates], dim=0)
+            future_covariates = torch.cat([future_covariates_padding, future_covariates], dim=0)
        else:
-            task_future_target = None
-            task_future_covariates = task_future_tensor
+            future_target = None
+            future_covariates = future_tensor

-        # task_context: (task_n_targets + task_n_covariates, min(context_length, history_length))
-        # task_future_target: (task_n_targets + task_n_covariates, prediction_length), the future values of known future covariates
+        # context: (n_targets + n_covariates, min(context_length, history_length))
+        # future_target: (n_targets + n_covariates, prediction_length), the future values of known future covariates
        # are ignored during loss computation
-        # task_future_covariates: (task_n_targets + task_n_past_only_covariates + task_n_future_covariates, prediction_length),
+        # future_covariates: (n_targets + n_past_only_covariates + n_future_covariates, prediction_length),
        # the entries corresponding to targets and past-only covariates are NaNs

-        return task_context, task_future_target, task_future_covariates, task_n_targets
+        return context, future_target, future_covariates, n_targets

-    def _build_batch(self, task_indices: list[int]) -> dict[str, torch.Tensor | int | list[tuple[int, int]] | None]:
-        """Build a batch from given task indices."""
-        batch_context_tensor_list = []
-        batch_future_target_tensor_list = []
-        batch_future_covariates_tensor_list = []
+    def _build_batch(self, input_indices: list[int]) -> dict[str, torch.Tensor | int | list[tuple[int, int]] | None]:
+        """Build a batch from given input indices."""
+        batch_context_list = []
+        batch_future_target_list = []
+        batch_future_covariates_list = []
        batch_group_ids_list = []
        target_idx_ranges: list[tuple[int, int]] = []

        target_start_idx = 0
-        for group_id, task_idx in enumerate(task_indices):
-            task_context, task_future_target, task_future_covariates, task_n_targets = self._construct_slice(task_idx)
+        for group_id, input_idx in enumerate(input_indices):
+            context, future_target, future_covariates, n_targets = self._construct_slice(input_idx)

-            group_size = task_context.shape[0]
-            task_group_ids = torch.full((group_size,), fill_value=group_id)
-            batch_context_tensor_list.append(task_context)
-            batch_future_target_tensor_list.append(task_future_target)
-            batch_future_covariates_tensor_list.append(task_future_covariates)
-            batch_group_ids_list.append(task_group_ids)
-            target_idx_ranges.append((target_start_idx, target_start_idx + task_n_targets))
+            group_size = context.shape[0]
+            group_ids = torch.full((group_size,), fill_value=group_id)
+            batch_context_list.append(context)
+            batch_future_target_list.append(future_target)
+            batch_future_covariates_list.append(future_covariates)
+            batch_group_ids_list.append(group_ids)
+            target_idx_ranges.append((target_start_idx, target_start_idx + n_targets))
            target_start_idx += group_size

        return {
-            "context": left_pad_and_cat_2D(batch_context_tensor_list),
+            "context": left_pad_and_cat_2D(batch_context_list),
            "future_target": None
            if self.mode == DatasetMode.TEST
-            else torch.cat(cast(list[torch.Tensor], batch_future_target_tensor_list), dim=0),
-            "future_covariates": torch.cat(batch_future_covariates_tensor_list, dim=0),
+            else torch.cat(cast(list[torch.Tensor], batch_future_target_list), dim=0),
+            "future_covariates": torch.cat(batch_future_covariates_list, dim=0),
            "group_ids": torch.cat(batch_group_ids_list, dim=0),
            "num_output_patches": self.num_output_patches,
            "target_idx_ranges": target_idx_ranges,
@ -860,27 +641,27 @@ class Chronos2Dataset(IterableDataset):
    def _generate_train_batches(self):
        while True:
            current_batch_size = 0
-            task_indices = []
+            input_indices = []

            while current_batch_size < self.batch_size:
-                task_idx = np.random.randint(len(self.tasks))
-                task_indices.append(task_idx)
-                current_batch_size += self.tasks[task_idx][0].shape[0]
+                input_idx = np.random.randint(len(self.inputs))
+                input_indices.append(input_idx)
+                current_batch_size += self.inputs[input_idx]["context"].shape[0]

-            yield self._build_batch(task_indices)
+            yield self._build_batch(input_indices)

    def _generate_sequential_batches(self):
-        task_idx = 0
-        while task_idx < len(self.tasks):
+        input_idx = 0
+        while input_idx < len(self.inputs):
            current_batch_size = 0
-            task_indices = []
+            input_indices = []

-            while task_idx < len(self.tasks) and current_batch_size < self.batch_size:
-                task_indices.append(task_idx)
-                current_batch_size += self.tasks[task_idx][0].shape[0]
-                task_idx += 1
+            while input_idx < len(self.inputs) and current_batch_size < self.batch_size:
+                input_indices.append(input_idx)
+                current_batch_size += self.inputs[input_idx]["context"].shape[0]
+                input_idx += 1

-            yield self._build_batch(task_indices)
+            yield self._build_batch(input_indices)

    def __iter__(self) -> Iterator:
        """
@ -907,39 +688,3 @@ class Chronos2Dataset(IterableDataset):
                yield batch
        else:
            yield from self._generate_sequential_batches()
-
-    @classmethod
-    def convert_inputs(
-        cls,
-        inputs: TensorOrArray
-        | Sequence[TensorOrArray]
-        | Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]],
-        context_length: int,
-        prediction_length: int,
-        batch_size: int,
-        output_patch_size: int,
-        min_past: int = 1,
-        mode: str | DatasetMode = DatasetMode.TRAIN,
-    ) -> "Chronos2Dataset":
-        """Convert from different input formats to a Chronos2Dataset."""
-        if isinstance(inputs, (torch.Tensor, np.ndarray)):
-            inputs = convert_tensor_input_to_list_of_dicts_input(inputs)
-        elif isinstance(inputs, list) and all([isinstance(x, (torch.Tensor, np.ndarray)) for x in inputs]):
-            inputs = cast(list[TensorOrArray], inputs)
-            inputs = convert_list_of_tensors_input_to_list_of_dicts_input(inputs)
-        elif isinstance(inputs, list) and all([isinstance(x, dict) for x in inputs]):
-            pass
-        else:
-            raise ValueError("Unexpected inputs format")
-
-        inputs = cast(list[dict[str, TensorOrArray | dict[str, TensorOrArray]]], inputs)
-
-        return cls(
-            inputs,
-            context_length=context_length,
-            prediction_length=prediction_length,
-            batch_size=batch_size,
-            output_patch_size=output_patch_size,
-            min_past=min_past,
-            mode=mode,
-        )
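
Note on the batching logic in `_generate_sequential_batches` and `_build_batch` above: batches are sized by the total number of rows (targets plus covariates), not by the number of inputs, so a batch can exceed `batch_size` by up to one group. The following is a minimal pure-Python sketch of that behavior. The helper names `sequential_batches` and `target_idx_ranges` and the plain-list inputs are hypothetical stand-ins for illustration, not part of the library.

```python
def sequential_batches(group_sizes, batch_size):
    """Greedily group input indices until the row count reaches batch_size.

    group_sizes[i] plays the role of self.inputs[i]["context"].shape[0]
    (an assumption made for this sketch).
    """
    batches = []
    idx = 0
    while idx < len(group_sizes):
        rows = 0
        indices = []
        # The size check happens before adding, so the last group may overshoot.
        while idx < len(group_sizes) and rows < batch_size:
            indices.append(idx)
            rows += group_sizes[idx]
            idx += 1
        batches.append(indices)
    return batches


def target_idx_ranges(group_sizes, n_targets):
    """Row ranges of the target series inside the concatenated batch."""
    ranges = []
    start = 0
    for size, n in zip(group_sizes, n_targets):
        # Targets occupy the leading rows of each group.
        ranges.append((start, start + n))
        start += size
    return ranges


batches = sequential_batches([3, 2, 4, 1], batch_size=4)
ranges = target_idx_ranges([3, 2], n_targets=[1, 2])
```

In this example the first batch holds two groups with 5 rows in total even though `batch_size` is 4, which mirrors the "effective batch size" caveat in the pipeline docstrings.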
--- a/src/chronos/chronos2/layers.py
+++ b/src/chronos/chronos2/layers.py
@ -155,6 +155,7 @@ class MHA(nn.Module):
        self.n_heads: int = config.num_heads
        self.dropout: float = config.dropout_rate
        self.inner_dim: int = self.n_heads * self.kv_proj_dim
+        self.config = config

        self.q = nn.Linear(self.d_model, self.inner_dim, bias=False)
        self.k = nn.Linear(self.d_model, self.inner_dim, bias=False)
@ -165,6 +166,64 @@ class MHA(nn.Module):
        if use_rope:
            self.rope_embed = RoPE(dim=self.kv_proj_dim, base=config.rope_theta)

+    def _eager_attention(
+        self,
+        query_states: torch.Tensor,
+        key_states: torch.Tensor,
+        value_states: torch.Tensor,
+        mask: torch.Tensor,
+    ) -> tuple[torch.Tensor, torch.Tensor]:
+        """Eager attention implementation using manual matmul.
+
+        Args:
+            query_states: [batch, n_heads, seq_len, kv_proj_dim]
+            key_states: [batch, n_heads, seq_len, kv_proj_dim]
+            value_states: [batch, n_heads, seq_len, kv_proj_dim]
+            mask: [batch, n_heads, q_len, kv_len]
+
+        Returns:
+            attn_output: [batch, n_heads, seq_len, kv_proj_dim]
+            attn_weights: [batch, n_heads, q_len, kv_len]
+        """
+        # Compute attention weights (no scaling - this is the original Chronos-2 implementation)
+        scores = torch.matmul(query_states, key_states.transpose(3, 2))  # "bnqd,bnkd->bnqk"
+        scores += mask
+        attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(scores)
+        attn_weights = nn.functional.dropout(attn_weights, p=self.dropout, training=self.training)
+        attn_output = torch.matmul(attn_weights, value_states)
+
+        return attn_output, attn_weights
+
+    def _sdpa_attention(
+        self,
+        query_states: torch.Tensor,
+        key_states: torch.Tensor,
+        value_states: torch.Tensor,
+        mask: torch.Tensor,
+    ) -> tuple[torch.Tensor, None]:
+        """SDPA attention implementation using torch.nn.functional.scaled_dot_product_attention.
+
+        Args:
+            query_states: [batch, n_heads, seq_len, kv_proj_dim]
+            key_states: [batch, n_heads, seq_len, kv_proj_dim]
+            value_states: [batch, n_heads, seq_len, kv_proj_dim]
+            mask: [batch, n_heads, q_len, kv_len] - additive mask (0 for valid, -inf for invalid)
+
+        Returns:
+            attn_output: [batch, n_heads, seq_len, kv_proj_dim]
+            attn_weights: None (SDPA doesn't return weights)
+        """
+        attn_output = nn.functional.scaled_dot_product_attention(
+            query_states,
+            key_states,
+            value_states,
+            attn_mask=mask,
+            dropout_p=self.dropout if self.training else 0.0,
+            scale=1.0,  # Match eager implementation (no scaling)
+        )
+
+        return attn_output, None
+
    def forward(
        self,
        hidden_states: torch.Tensor,
@ -190,6 +249,11 @@ class MHA(nn.Module):
        if self.use_rope:
            assert position_ids is not None, "position_ids must be provided when self.use_rope=True"

+        # Force eager attention if output_attentions is True (only eager returns weights)
+        attn_implementation = self.config._attn_implementation
+        if output_attentions:
+            attn_implementation = "eager"
+
        seq_length = hidden_states.shape[1]

        def shape(states: torch.Tensor) -> torch.Tensor:
@ -215,12 +279,10 @@ class MHA(nn.Module):
                cos, sin = self.rope_embed(value_states, position_ids)
                query_states, key_states = RoPE.apply_rotary_pos_emb(query_states, key_states, cos, sin)

-        # Compute attention weights
-        scores = torch.matmul(query_states, key_states.transpose(3, 2))  # "bnqd,bnkd->bnqk"
-        scores += mask
-        attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(scores)
-        attn_weights = nn.functional.dropout(attn_weights, p=self.dropout, training=self.training)
-        attn_output = torch.matmul(attn_weights, value_states)
+        if attn_implementation == "sdpa":
+            attn_output, attn_weights = self._sdpa_attention(query_states, key_states, value_states, mask)
+        else:  # eager
+            attn_output, attn_weights = self._eager_attention(query_states, key_states, value_states, mask)

        # Project attention output
        attn_output = unshape(attn_output)
--- a/src/chronos/chronos2/model.py
+++ b/src/chronos/chronos2/model.py
@ -199,6 +199,7 @@ class Chronos2Model(PreTrainedModel):
    config_class = Chronos2CoreConfig  # type: ignore[assignment]
    _supports_long_horizon: bool = True
    _supports_future_covariates: bool = True
+    _supports_sdpa: bool = True

    def __init__(self, config: Chronos2CoreConfig):
        assert hasattr(config, "chronos_config"), "Not a valid Chronos config"
@ -546,6 +547,74 @@ class Chronos2Model(PreTrainedModel):

        return loss

+    def encode(
+        self,
+        context: torch.Tensor,
+        context_mask: torch.Tensor | None = None,
+        group_ids: torch.Tensor | None = None,
+        future_covariates: torch.Tensor | None = None,
+        future_covariates_mask: torch.Tensor | None = None,
+        num_output_patches: int = 1,
+        future_target: torch.Tensor | None = None,
+        future_target_mask: torch.Tensor | None = None,
+        output_attentions: bool = False,
+    ):
+        self._validate_input(
+            context=context,
+            context_mask=context_mask,
+            future_covariates=future_covariates,
+            future_covariates_mask=future_covariates_mask,
+            group_ids=group_ids,
+            num_output_patches=num_output_patches,
+            future_target=future_target,
+            future_target_mask=future_target_mask,
+        )
+
+        batch_size = context.shape[0]
+        patched_context, attention_mask, loc_scale = self._prepare_patched_context(
+            context=context, context_mask=context_mask
+        )
+        num_context_patches = attention_mask.shape[-1]
+
+        # get input embeddings of shape (batch, num_context_patches, d_model)
+        input_embeds: torch.Tensor = self.input_patch_embedding(patched_context)
+        # append [REG] special token embedding, if needed
+        if self.chronos_config.use_reg_token:
+            reg_input_ids = torch.full((batch_size, 1), self.config.reg_token_id, device=input_embeds.device)
+            reg_embeds = self.shared(reg_input_ids)
+            input_embeds = torch.cat([input_embeds, reg_embeds], dim=-2)
+            attention_mask = torch.cat(
+                [attention_mask.to(self.dtype), torch.ones_like(reg_input_ids).to(self.dtype)], dim=-1
+            )
+
+        patched_future, patched_future_covariates_mask = self._prepare_patched_future(
+            future_covariates=future_covariates,
+            future_covariates_mask=future_covariates_mask,
+            loc_scale=loc_scale,
+            num_output_patches=num_output_patches,
+            batch_size=batch_size,
+        )
+        future_attention_mask = torch.ones(batch_size, num_output_patches, dtype=self.dtype, device=self.device)
+
+        # get future embeddings of shape (batch, num_output_patches, d_model)
+        future_embeds: torch.Tensor = self.input_patch_embedding(patched_future)
+
+        # concatenate context and future embeddings and masks
+        input_embeds = torch.cat([input_embeds, future_embeds], dim=-2)
+        attention_mask = torch.cat([attention_mask, future_attention_mask], dim=-1)
+
+        if group_ids is None:
+            # by default, each time series is treated independently, i.e., no mixing across the batch
+            group_ids = torch.arange(batch_size, dtype=torch.long, device=self.device)
+
+        encoder_outputs: Chronos2EncoderOutput = self.encoder(
+            attention_mask=attention_mask,
+            inputs_embeds=input_embeds,
+            group_ids=group_ids,
+            output_attentions=output_attentions,
+        )
+        return encoder_outputs, loc_scale, patched_future_covariates_mask, num_context_patches
+
    def forward(
        self,
        context: torch.Tensor,
@ -624,63 +693,19 @@ class Chronos2Model(PreTrainedModel):
        - enc_time_self_attn_weights: Time self attention weights, if output_attentions=True
        - enc_group_self_attn_weights: Group self attention weights, if output_attentions=True
        """
-
-        self._validate_input(
+        batch_size = context.shape[0]
+        encoder_outputs, loc_scale, patched_future_covariates_mask, num_context_patches = self.encode(
            context=context,
            context_mask=context_mask,
+            group_ids=group_ids,
            future_covariates=future_covariates,
            future_covariates_mask=future_covariates_mask,
-            group_ids=group_ids,
            num_output_patches=num_output_patches,
            future_target=future_target,
            future_target_mask=future_target_mask,
-        )
-
-        batch_size = context.shape[0]
-        patched_context, attention_mask, loc_scale = self._prepare_patched_context(
-            context=context, context_mask=context_mask
-        )
-        num_context_patches = attention_mask.shape[-1]
-
-        # get input embeddings of shape (batch, num_context_patches, d_model)
-        input_embeds: torch.Tensor = self.input_patch_embedding(patched_context)
-        # append [REG] special token embedding, if needed
-        if self.chronos_config.use_reg_token:
-            reg_input_ids = torch.full((batch_size, 1), self.config.reg_token_id, device=input_embeds.device)
-            reg_embeds = self.shared(reg_input_ids)
-            input_embeds = torch.cat([input_embeds, reg_embeds], dim=-2)
-            attention_mask = torch.cat(
-                [attention_mask.to(self.dtype), torch.ones_like(reg_input_ids).to(self.dtype)], dim=-1
-            )
-
-        patched_future, patched_future_covariates_mask = self._prepare_patched_future(
-            future_covariates=future_covariates,
-            future_covariates_mask=future_covariates_mask,
-            loc_scale=loc_scale,
-            num_output_patches=num_output_patches,
-            batch_size=batch_size,
-        )
-        future_attention_mask = torch.ones(batch_size, num_output_patches, dtype=self.dtype, device=self.device)
-
-        # get future embeddings of shape (batch, num_output_patches, d_model)
-        future_embeds: torch.Tensor = self.input_patch_embedding(patched_future)
-
-        # concatenate context and future embeddings and masks
-        input_embeds = torch.cat([input_embeds, future_embeds], dim=-2)
-        attention_mask = torch.cat([attention_mask, future_attention_mask], dim=-1)
-
-        if group_ids is None:
-            # by default, each time series is treated independently, i.e., no mixing across the batch
-            group_ids = torch.arange(batch_size, dtype=torch.long, device=self.device)
-
-        encoder_outputs: Chronos2EncoderOutput = self.encoder(
-            attention_mask=attention_mask,
-            inputs_embeds=input_embeds,
-            group_ids=group_ids,
            output_attentions=output_attentions,
        )
        hidden_states: torch.Tensor = encoder_outputs[0]
-
        assert hidden_states.shape == (batch_size, num_context_patches + 1 + num_output_patches, self.model_dim)

        # slice the last num_output_patches hidden states to be input into the output_patch_embedding
--- a/src/chronos/chronos2/pipeline.py
+++ b/src/chronos/chronos2/pipeline.py
@ -9,29 +9,29 @@ import time
 import warnings
 from copy import deepcopy
 from pathlib import Path
-from typing import TYPE_CHECKING, Any, Mapping, Sequence
+from typing import TYPE_CHECKING, Callable, Literal, Mapping, Sequence

 import numpy as np
 import torch
 from einops import rearrange, repeat
 from torch.utils.data import DataLoader
 from transformers import AutoConfig
+from transformers.utils.import_utils import is_peft_available
+from transformers.utils.peft_utils import find_adapter_config_file

 import chronos.chronos2
 from chronos.base import BaseChronosPipeline, ForecastType
 from chronos.chronos2 import Chronos2Model
-from chronos.chronos2.dataset import (
-    Chronos2Dataset,
-    DatasetMode,
-    TensorOrArray,
-    convert_df_input_to_list_of_dicts_input,
-)
+from chronos.chronos2.dataset import Chronos2Dataset, DatasetMode, TensorOrArray
+from chronos.df_utils import convert_df_input_to_list_of_dicts_input
 from chronos.utils import interpolate_quantiles, weighted_quantile

 if TYPE_CHECKING:
    import datasets
    import fev
    import pandas as pd
+    from peft import LoraConfig
+    from transformers.trainer_callback import TrainerCallback

 logger = logging.getLogger(__name__)

@ -103,13 +103,19 @@ class Chronos2Pipeline(BaseChronosPipeline):
        | Sequence[TensorOrArray]
        | Sequence[Mapping[str, TensorOrArray | Mapping[str, TensorOrArray | None]]]
        | None = None,
+        finetune_mode: Literal["full", "lora"] = "full",
+        lora_config: "LoraConfig | dict | None" = None,
        context_length: int | None = None,
-        learning_rate: float = 1e-5,
+        learning_rate: float = 1e-6,
        num_steps: int = 1000,
        batch_size: int = 256,
        output_dir: Path | str | None = None,
        min_past: int | None = None,
        finetuned_ckpt_name: str = "finetuned-ckpt",
+        callbacks: list["TrainerCallback"] | None = None,
+        remove_printer_callback: bool = False,
+        disable_data_parallel: bool = True,
+        convert_inputs: bool = True,
        **extra_trainer_kwargs,
    ) -> "Chronos2Pipeline":
        """
@ -127,10 +133,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
        validation_inputs
            The time series used for validation and model selection. The format of `validation_inputs` is exactly the same as `inputs`, by default None which
            means that no validation is performed. Note that enabling validation may slow down fine-tuning for large datasets.
+        finetune_mode
+            One of "full" (performs full fine-tuning) or "lora" (performs Low Rank Adaptation (LoRA) fine-tuning), by default "full"
+        lora_config
+            The configuration to use for LoRA fine-tuning when finetune_mode="lora". Can be a `LoraConfig` object or a dict which is used to initialize `LoraConfig`.
+            When unspecified and finetune_mode="lora", a default configuration is used
        context_length
            The maximum context length used during fine-tuning, by default set to the model's default context length
        learning_rate
-            The learning rate for the optimizer, by default 1e-5
+            The learning rate for the optimizer, by default 1e-6.
+            When finetune_mode="lora", we recommend using a higher learning rate, such as 1e-5
        num_steps
            The number of steps to fine-tune for, by default 1000
        batch_size
@ -144,6 +156,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
            are filtered out, by default set equal to prediction_length
        finetuned_ckpt_name
            The name of the directory inside `output_dir` in which the final fine-tuned checkpoint will be saved, by default "finetuned-ckpt"
+        callbacks
+            A list of `TrainerCallback`s which will be forwarded to the HuggingFace `Trainer`
+        remove_printer_callback
+            If True, all instances of `PrinterCallback` are removed from callbacks
+        disable_data_parallel
+            If True, ensures that DataParallel is disabled and training happens on a single GPU
+        convert_inputs
+            If True (default), preprocess raw inputs (convert tensors, encode categoricals, validate).
+            If False, inputs are expected to be already preprocessed using `chronos.chronos2.dataset.prepare_inputs`.
+            This allows for efficient training on large datasets that don't fit in memory.
        **extra_trainer_kwargs
            Extra kwargs are directly forwarded to `TrainingArguments`

@ -153,28 +175,66 @@ class Chronos2Pipeline(BaseChronosPipeline):
        """

        import torch.cuda
+        from transformers.trainer_callback import PrinterCallback
        from transformers.training_args import TrainingArguments

+        if finetune_mode == "lora":
+            if is_peft_available():
+                from peft import LoraConfig, get_peft_model
+            else:
+                warnings.warn(
+                    "`peft` is required for `finetune_mode='lora'`. Please install it with `pip install peft`. Falling back to `finetune_mode='full'`."
+                )
+                finetune_mode = "full"
+                lora_config = None
+
        from chronos.chronos2.trainer import Chronos2Trainer, EvaluateAndSaveFinalStepCallback

-        warnings.warn(
-            "Fine-tuning support is experimental and may be changed in future versions.",
-            category=FutureWarning,
-            stacklevel=2,
-        )
+        assert finetune_mode in ["full", "lora"], f"finetune_mode must be one of ['full', 'lora'], got {finetune_mode}"
+
+        if finetune_mode == "full" and lora_config is not None:
+            raise ValueError(
+                "lora_config should not be specified when `finetune_mode='full'`. To enable LoRA, set `finetune_mode='lora'`."
+            )

        # Create a copy of the model to avoid modifying the original
        config = deepcopy(self.model.config)
        model = Chronos2Model(config).to(self.model.device)  # type: ignore
        model.load_state_dict(self.model.state_dict())

+        if finetune_mode == "lora":
+            if lora_config is None:
+                lora_config = LoraConfig(
+                    r=8,
+                    lora_alpha=16,
+                    target_modules=[
+                        "self_attention.q",
+                        "self_attention.v",
+                        "self_attention.k",
+                        "self_attention.o",
+                        "output_patch_embedding.output_layer",
+                    ],
+                )
+            elif isinstance(lora_config, dict):
+                lora_config = LoraConfig(**lora_config)
+            else:
+                assert isinstance(lora_config, LoraConfig), (
+                    f"lora_config must be an instance of LoraConfig or a dict, got {type(lora_config)}"
+                )
+
+            model = get_peft_model(model, lora_config)
+            n_trainable_params, n_params = model.get_nb_trainable_parameters()
+            logger.info(
+                f"Using LoRA. Number of trainable parameters: {n_trainable_params}, total parameters: {n_params}."
+            )
+
        if context_length is None:
            context_length = self.model_context_length

        if min_past is None:
            min_past = prediction_length

-        train_dataset = Chronos2Dataset.convert_inputs(
+        train_dataset = Chronos2Dataset(
            inputs=inputs,
            context_length=context_length,
            prediction_length=prediction_length,
@ -182,6 +242,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
            output_patch_size=self.model_output_patch_size,
            min_past=min_past,
            mode=DatasetMode.TRAIN,
+            convert_inputs=convert_inputs,
        )

        if output_dir is None:
@ -211,14 +272,13 @@ class Chronos2Pipeline(BaseChronosPipeline):
            lr_scheduler_type="linear",
            warmup_ratio=0.0,
            optim="adamw_torch_fused",
-            logging_dir=str(output_dir / "logs"),
            logging_strategy="steps",
            logging_steps=100,
            disable_tqdm=False,
            report_to="none",
            max_steps=num_steps,
            gradient_accumulation_steps=1,
-            dataloader_num_workers=1,
+            dataloader_num_workers=0,
            tf32=has_sm80 and not use_cpu,
            bf16=has_sm80 and not use_cpu,
            save_only_model=True,
@ -234,16 +294,16 @@ class Chronos2Pipeline(BaseChronosPipeline):
        )

        eval_dataset = None
-        callbacks = []
+        callbacks = callbacks or []
        if validation_inputs is not None:
-            # construct validation dataset
-            eval_dataset = Chronos2Dataset.convert_inputs(
+            eval_dataset = Chronos2Dataset(
                inputs=validation_inputs,
                context_length=context_length,
                prediction_length=prediction_length,
                batch_size=batch_size,
                output_patch_size=self.model_output_patch_size,
                mode=DatasetMode.VALIDATION,
+                convert_inputs=convert_inputs,
            )

            # set validation parameters
@ -268,6 +328,11 @@ class Chronos2Pipeline(BaseChronosPipeline):

        training_args = TrainingArguments(**training_kwargs)

+        if disable_data_parallel and not use_cpu:
+            # This is a hack to disable the default `transformers` behavior of using DataParallel
+            training_args._n_gpu = 1
+            assert training_args.n_gpu == 1  # Ensure that the hack worked
+
        trainer = Chronos2Trainer(
            model=model,
            args=training_args,
@ -275,12 +340,19 @@ class Chronos2Pipeline(BaseChronosPipeline):
            eval_dataset=eval_dataset,
            callbacks=callbacks,
        )
+
+        if remove_printer_callback:
+            trainer.pop_callback(PrinterCallback)
+
        trainer.train()

-        # update max_output_patches, if the model was fine-tuned with longer prediction_length
+        # update context_length and max_output_patches, if the model was fine-tuned with larger values
+        model.chronos_config.context_length = max(model.chronos_config.context_length, context_length)
        model.chronos_config.max_output_patches = max(
            model.chronos_config.max_output_patches, math.ceil(prediction_length / self.model_output_patch_size)
        )
+        # update chronos_config in model's config, so it is saved correctly
+        model.config.chronos_config = model.chronos_config.__dict__

        # Create a new pipeline with the fine-tuned model
        finetuned_pipeline = Chronos2Pipeline(model=model)
@ -389,7 +461,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
        prediction_length: int | None = None,
        batch_size: int = 256,
        context_length: int | None = None,
-        predict_batches_jointly: bool = False,
+        cross_learning: bool = False,
        limit_prediction_length: bool = False,
        **kwargs,
    ) -> list[torch.Tensor]:
@ -475,7 +547,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
            will be lower than this value, by default 256
        context_length
            The maximum context length used during for inference, by default set to the model's default context length
-        predict_batches_jointly
+        cross_learning
            If True, cross-learning is enabled, i.e., all the tasks in `inputs` will be predicted jointly and the model will share information across all inputs, by default False
            The following must be noted when using cross-learning:
            - Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
@ -495,6 +567,14 @@ class Chronos2Pipeline(BaseChronosPipeline):
        if prediction_length is None:
            prediction_length = model_prediction_length

+        if kwargs.get("predict_batches_jointly") is not None:
+            warnings.warn(
+                "The `predict_batches_jointly` argument is deprecated and will be removed in a future version. "
+                "Please use `cross_learning=True` to enable the cross-learning mode.",
+                category=FutureWarning,
+                stacklevel=2,
+            )
+            cross_learning = kwargs.pop("predict_batches_jointly")
        # The maximum number of output patches to generate in a single forward pass before the long-horizon heuristic kicks in. Note: A value larger
        # than the model's default max_output_patches may lead to degradation in forecast accuracy, defaults to a model-specific value
        max_output_patches = kwargs.pop("max_output_patches", self.max_output_patches)
@ -503,6 +583,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
        # effective batch size increases by a factor of `len(unrolled_quantiles)` when making long-horizon predictions,
        # by default [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
        unrolled_quantiles = kwargs.pop("unrolled_quantiles", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
+        # A callback which is called after each batch has been processed
+        after_batch_callback: Callable = kwargs.pop("after_batch", lambda: None)

        if len(kwargs) > 0:
            raise TypeError(f"Unexpected keyword arguments: {list(kwargs.keys())}.")
@ -534,8 +616,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
            )
            context_length = self.model_context_length

-        test_dataset = Chronos2Dataset.convert_inputs(
-            inputs=inputs,
+        test_dataset = Chronos2Dataset(
+            inputs,
            context_length=context_length,
            prediction_length=prediction_length,
            batch_size=batch_size,
@ -543,7 +625,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
            mode=DatasetMode.TEST,
        )
        test_loader = DataLoader(
-            test_dataset, batch_size=None, num_workers=1, pin_memory=True, shuffle=False, drop_last=False
+            test_dataset, batch_size=None, pin_memory=self.model.device.type == "cuda", shuffle=False, drop_last=False
        )

        all_predictions: list[torch.Tensor] = []
@ -554,7 +636,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
            batch_future_covariates = batch["future_covariates"]
            batch_target_idx_ranges = batch["target_idx_ranges"]

-            if predict_batches_jointly:
+            if cross_learning:
                batch_group_ids = torch.zeros_like(batch_group_ids)

            batch_prediction = self._predict_batch(
@ -567,6 +649,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
                target_idx_ranges=batch_target_idx_ranges,
            )
            all_predictions.extend(batch_prediction)
+            after_batch_callback()

        return all_predictions

@ -745,6 +828,10 @@ class Chronos2Pipeline(BaseChronosPipeline):
        prediction_length: int | None = None,
        quantile_levels: list[float] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        batch_size: int = 256,
+        context_length: int | None = None,
+        cross_learning: bool = False,
+        validate_inputs: bool = True,
+        freq: str | None = None,
        **predict_kwargs,
    ) -> "pd.DataFrame":
        """
@ -774,6 +861,24 @@ class Chronos2Pipeline(BaseChronosPipeline):
            The batch size used for prediction. Note that the batch size here means the number of time series, including target(s) and covariates,
            which are input into the model. If your data has multiple target and/or covariates, the effective number of time series tasks in a batch
            will be lower than this value, by default 256
+        context_length
+            The maximum context length used during inference, by default set to the model's default context length
+        cross_learning
+            If True, cross-learning is enabled, i.e., all the tasks in `inputs` will be predicted jointly and the model will share information across all inputs, by default False
+            The following must be noted when using cross-learning:
+            - Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
+            - Results become dependent on batch size. Very large batch sizes may not provide benefits as they deviate from the maximum group size used during pretraining.
+            For optimal results, consider using a batch size around 100 (as used in the Chronos-2 technical report).
+            - Cross-learning is most helpful when individual time series have limited historical context, as the model can leverage patterns from related series in the batch.
+        validate_inputs
+            [ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
+            validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
+            must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
+            has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
+            timestamps are regularly spaced (e.g., with hourly frequency).
+        freq
+            Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used when
+            validate_inputs=False. When provided, skips frequency inference from the data.
        **predict_kwargs
            Additional arguments passed to predict_quantiles

@ -804,6 +909,8 @@ class Chronos2Pipeline(BaseChronosPipeline):
            timestamp_column=timestamp_column,
            target_columns=target,
            prediction_length=prediction_length,
+            freq=freq,
+            validate_inputs=validate_inputs,
        )

        # Generate forecasts
@ -813,33 +920,37 @@ class Chronos2Pipeline(BaseChronosPipeline):
            quantile_levels=quantile_levels,
            limit_prediction_length=False,
            batch_size=batch_size,
+            context_length=context_length,
+            cross_learning=cross_learning,
            **predict_kwargs,
        )
        # since predict_df tasks are homogenous by input design, we can safely stack the list of tensors into a single tensor
        quantiles_np = torch.stack(quantiles).numpy()  # [n_tasks, n_variates, horizon, num_quantiles]
        mean_np = torch.stack(mean).numpy()  # [n_tasks, n_variates, horizon]

-        results_dfs = []
-        for i, (series_id, future_ts) in enumerate(prediction_timestamps.items()):
-            q_pred = quantiles_np[i]  # (n_variates, prediction_length, len(quantile_levels))
-            point_pred = mean_np[i]  # (n_variates, prediction_length)
+        n_tasks = len(prediction_timestamps)
+        n_variates = len(target)

-            for target_idx, target_col in enumerate(target):
-                series_forecast_data: dict[str | tuple[str, str], Any] = {
-                    id_column: series_id,
-                    timestamp_column: future_ts,
-                    "target_name": target_col,
-                }
-                series_forecast_data["predictions"] = point_pred[target_idx]
-                for q_idx, q_level in enumerate(quantile_levels):
-                    series_forecast_data[str(q_level)] = q_pred[target_idx, :, q_idx]
+        series_ids = list(prediction_timestamps.keys())
+        future_ts = list(prediction_timestamps.values())

-                results_dfs.append(pd.DataFrame(series_forecast_data))
+        data = {
+            id_column: np.repeat(series_ids, n_variates * prediction_length),
+            timestamp_column: np.concatenate([np.tile(ts, n_variates) for ts in future_ts]),
+            "target_name": np.tile(np.repeat(target, prediction_length), n_tasks),
+            "predictions": mean_np.ravel(),
+        }

-        predictions_df = pd.concat(results_dfs, ignore_index=True)
-        predictions_df.set_index(id_column, inplace=True)
-        predictions_df = predictions_df.loc[original_order]
-        predictions_df.reset_index(inplace=True)
+        quantiles_flat = quantiles_np.reshape(-1, len(quantile_levels))
+        for q_idx, q_level in enumerate(quantile_levels):
+            data[str(q_level)] = quantiles_flat[:, q_idx]
+
+        predictions_df = pd.DataFrame(data)
+        # If validate_inputs=False, the df is used as-is without sorting by item_id, so no reordering is required
+        if validate_inputs:
+            predictions_df.set_index(id_column, inplace=True)
+            predictions_df = predictions_df.loc[original_order]
+            predictions_df.reset_index(inplace=True)

        return predictions_df
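
Review note: the vectorized assembly above relies on `mean_np.ravel()` (row-major over `[n_tasks, n_variates, horizon]`) lining up with the `np.repeat` / `np.tile` label columns. The sketch below checks that alignment on toy data. The series IDs, target names, and horizon are made up for illustration, and the timestamp column and pandas are left out to keep it small.

```python
import numpy as np

# Toy stand-ins (assumptions): two series, two targets, horizon of three.
series_ids = ["A", "B"]
target = ["x", "y"]
prediction_length = 3
n_tasks, n_variates = len(series_ids), len(target)

# Fake point forecasts with shape [n_tasks, n_variates, horizon].
mean_np = np.arange(n_tasks * n_variates * prediction_length).reshape(
    n_tasks, n_variates, prediction_length
)

# Same label construction as in the diff.
id_col = np.repeat(series_ids, n_variates * prediction_length)
name_col = np.tile(np.repeat(target, prediction_length), n_tasks)
pred_col = mean_np.ravel()  # row-major: the horizon index varies fastest

# Every flattened row must carry the labels of the cell it came from.
for i in range(n_tasks):
    for j in range(n_variates):
        for t in range(prediction_length):
            row = (i * n_variates + j) * prediction_length + t
            assert id_col[row] == series_ids[i]
            assert name_col[row] == target[j]
            assert pred_col[row] == mean_np[i, j, t]
```

The quantile columns follow the same row order because `quantiles_np.reshape(-1, len(quantile_levels))` flattens only the leading three axes.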

@ -974,11 +1085,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
            finetune_kwargs["prediction_length"] = first_window.horizon
            finetune_kwargs["batch_size"] = finetune_kwargs.get("batch_size", batch_size)

-            try:
-                pipeline = self.fit(inputs=inputs, **finetune_kwargs)
-            except Exception as e:
-                msg = f"Finetuning failed with error: {e}. Continuing with the pretrained model."
-                warnings.warn(msg, category=UserWarning, stacklevel=2)
+            pipeline = self.fit(inputs=inputs, **finetune_kwargs)

        predictions_per_window = []
        inference_time_s = 0.0
@ -995,6 +1102,86 @@ class Chronos2Pipeline(BaseChronosPipeline):

        return predictions_per_window, inference_time_s

+    @torch.no_grad()
+    def embed(
+        self, inputs: TensorOrArray | Sequence[TensorOrArray], batch_size: int = 256, context_length: int | None = None
+    ) -> tuple[list[torch.Tensor], list[tuple[torch.Tensor, torch.Tensor]]]:
+        """
+        Get encoder embeddings for the given time series.
+
+        Parameters
+        ----------
+        inputs
+            The time series to get embeddings for, can be one of:
+            - A 3-dimensional `torch.Tensor` or `np.ndarray` of shape (batch, n_variates, history_length). When `n_variates > 1`, information
+            will be shared among the different variates of each time series in the batch.
+            - A list of `torch.Tensor` or `np.ndarray` where each element can either be 1-dimensional of shape (history_length,)
+            or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements; left-padding
+            will be applied, if needed.
+        batch_size
+            The batch size used for generating embeddings. Note that the batch size here means the total number of time series which are input into the model.
+            If your data has multiple variates, the effective number of time series tasks in a batch will be lower than this value, by default 256
+        context_length
+            The maximum context length used during inference, by default set to the model's default context length
+
+        Returns
+        -------
+        embeddings
+            a list of `torch.Tensor` where each element has shape (n_variates, num_patches + 2, d_model) and the number of elements is equal to the number
+            of target time series (univariate or multivariate) in the `inputs`. The extra +2 is due to embeddings of the [REG] token and a masked output patch token.
+        loc_scale
+            a list of tuples with the mean and standard deviation of each time series.
+        """
+        if context_length is None:
+            context_length = self.model_context_length
+
+        if context_length > self.model_context_length:
+            warnings.warn(
+                f"The specified context_length {context_length} is greater than the model's default context length {self.model_context_length}. "
+                f"Resetting context_length to {self.model_context_length}."
+            )
+            context_length = self.model_context_length
+
+        test_dataset = Chronos2Dataset(
+            inputs,
+            context_length=context_length,
+            prediction_length=0,
+            batch_size=batch_size,
+            output_patch_size=self.model_output_patch_size,
+            mode=DatasetMode.TEST,
+        )
+        test_loader = DataLoader(
+            test_dataset,
+            batch_size=None,
+            num_workers=0,
+            pin_memory=self.model.device.type == "cuda",
+            shuffle=False,
+            drop_last=False,
+        )
+        all_embeds: list[torch.Tensor] = []
+        all_loc_scales: list[tuple[torch.Tensor, torch.Tensor]] = []
+        for batch in test_loader:
+            assert batch["future_target"] is None
+            batch_context = batch["context"]
+            batch_group_ids = batch["group_ids"]
+            batch_target_idx_ranges = batch["target_idx_ranges"]
+
+            encoder_outputs, (locs, scales), *_ = self.model.encode(
+                context=batch_context.to(device=self.model.device, dtype=torch.float32),
+                group_ids=batch_group_ids.to(self.model.device),
+            )
+            batch_embeds = [encoder_outputs[0][start:end].cpu() for (start, end) in batch_target_idx_ranges]
+            batch_loc_scales = list(
+                zip(
+                    [locs[start:end].cpu() for (start, end) in batch_target_idx_ranges],
+                    [scales[start:end].cpu() for (start, end) in batch_target_idx_ranges],
+                )
+            )
+            all_embeds.extend(batch_embeds)
+            all_loc_scales.extend(batch_loc_scales)
+
+        return all_embeds, all_loc_scales
+
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        """
@ -1002,9 +1189,25 @@ class Chronos2Pipeline(BaseChronosPipeline):
        Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
        """

+        # Check if the model is on S3 and cache it locally first
+        # NOTE: Only base models (not LoRA adapters) are supported via S3
        if str(pretrained_model_name_or_path).startswith("s3://"):
            return BaseChronosPipeline.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)

+        # Check if the hub model_id or local path is a LoRA adapter
+        if find_adapter_config_file(pretrained_model_name_or_path) is not None:
+            if not is_peft_available():
+                raise ImportError(
+                    f"The model at {pretrained_model_name_or_path} is a `peft` adapter, but `peft` is not available. "
+                    f"Please install `peft` with `pip install peft` to use this model. "
+                )
+            from peft import AutoPeftModel
+
+            model = AutoPeftModel.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
+            model = model.merge_and_unload()
+            return cls(model=model)
+
+        # Handle the case for the base model
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
        assert hasattr(config, "chronos_config"), "Not a Chronos config file"

--- a/src/chronos/chronos2/trainer.py
+++ b/src/chronos/chronos2/trainer.py
@ -3,6 +3,7 @@

 # Authors: Abdul Fatir Ansari <ansarnd@amazon.com>

+import warnings
 from typing import TYPE_CHECKING, cast

 from torch.utils.data import DataLoader, Dataset
@ -48,11 +49,16 @@ class Chronos2Trainer(Trainer):

        train_dataset = cast("Chronos2Dataset", self.train_dataset)

-        assert train_dataset.batch_size == self.args.train_batch_size, (
-            f"The batch_size of the train_dataset ({train_dataset.batch_size}) does not match the batch_size  "
-            f"in TrainingArguments ({self.args.train_batch_size}). If you're using a machine with multiple GPUs, "
-            f"ensure that only a single GPU is visible by setting the CUDA_VISIBLE_DEVICES environment variable."
-        )
+        if self.args.train_batch_size > train_dataset.batch_size:
+            warnings.warn(
+                f"The batch_size of the train_dataset ({train_dataset.batch_size}) does not match the batch_size "
+                f"in TrainingArguments ({self.args.train_batch_size}). On machines with multiple GPUs, this may indicate "
+                f"that multiple GPUs are visible and transformers is using DataParallel for training by default. "
+                f"This may lead to unnecessary slowdown and unexpected behavior. We strongly recommend setting the CUDA_VISIBLE_DEVICES "
+                f"environment variable to ensure that only a single GPU is visible.",
+                category=UserWarning,
+                stacklevel=3,
+            )

        dataloader_params = {
            # Disable automatic batching as we handle batching ourselves
@ -74,11 +80,16 @@ class Chronos2Trainer(Trainer):

        eval_dataset = cast("Chronos2Dataset", self.eval_dataset)

-        assert eval_dataset.batch_size == self.args.eval_batch_size, (
-            f"The batch_size of the eval_dataset ({eval_dataset.batch_size}) does not match the batch_size  "
-            f"in TrainingArguments ({self.args.eval_batch_size}). If you're using a machine with multiple GPUs, "
-            f"ensure that only a single GPU is visible by setting the CUDA_VISIBLE_DEVICES environment variable."
-        )
+        if self.args.eval_batch_size > eval_dataset.batch_size:
+            warnings.warn(
+                f"The batch_size of the eval_dataset ({eval_dataset.batch_size}) does not match the batch_size "
+                f"in TrainingArguments ({self.args.eval_batch_size}). On machines with multiple GPUs, this may indicate "
+                f"that multiple GPUs are visible and transformers is using DataParallel for training by default. "
+                f"This may lead to unnecessary slowdown and unexpected behavior. We strongly recommend setting the CUDA_VISIBLE_DEVICES "
+                f"environment variable to ensure that only a single GPU is visible.",
+                category=UserWarning,
+                stacklevel=3,
+            )

        dataloader_params = {
            # Disable automatic batching as we handle batching ourselves
--- a/src/chronos/chronos_bolt.py
+++ b/src/chronos/chronos_bolt.py
@ -408,6 +408,14 @@ class ChronosBoltPipeline(BaseChronosPipeline):
        super().__init__(inner_model=model)  # type: ignore
        self.model = model

+    @property
+    def model_context_length(self) -> int:
+        return self.model.chronos_config.context_length
+
+    @property
+    def model_prediction_length(self) -> int:
+        return self.model.chronos_config.prediction_length
+
    @property
    def quantiles(self) -> List[float]:
        return self.model.config.chronos_config["quantiles"]
@ -487,14 +495,12 @@ class ChronosBoltPipeline(BaseChronosPipeline):
        """
        context_tensor = self._prepare_and_validate_context(context=inputs)

-        model_context_length: int = self.model.config.chronos_config["context_length"]
-        model_prediction_length: int = self.model.config.chronos_config["prediction_length"]
        if prediction_length is None:
-            prediction_length = model_prediction_length
+            prediction_length = self.model_prediction_length

-        if prediction_length > model_prediction_length:
+        if prediction_length > self.model_prediction_length:
            msg = (
-                f"We recommend keeping prediction length <= {model_prediction_length}. "
+                f"We recommend keeping prediction length <= {self.model_prediction_length}. "
                "The quality of longer predictions may degrade since the model is not optimized for it. "
            )
            if limit_prediction_length:
@ -507,32 +513,46 @@ class ChronosBoltPipeline(BaseChronosPipeline):

        # We truncate the context here because otherwise batches with very long
        # context could take up large amounts of GPU memory unnecessarily.
-        if context_tensor.shape[-1] > model_context_length:
-            context_tensor = context_tensor[..., -model_context_length:]
+        if context_tensor.shape[-1] > self.model_context_length:
+            context_tensor = context_tensor[..., -self.model_context_length :]

-        # TODO: We unroll the forecast of Chronos Bolt greedily with the full forecast
-        # horizon that the model was trained with (i.e., 64). This results in variance collapsing
-        # every 64 steps.
-        context_tensor = context_tensor.to(
-            device=self.model.device,
-            dtype=torch.float32,
-        )
-        while remaining > 0:
-            with torch.no_grad():
-                prediction = self.model(
-                    context=context_tensor,
-                ).quantile_preds.to(context_tensor)
+        context_tensor = context_tensor.to(device=self.model.device, dtype=torch.float32)
+        # First block prediction
+        with torch.no_grad():
+            prediction: torch.Tensor = self.model(context=context_tensor).quantile_preds.to(context_tensor)

            predictions.append(prediction)
            remaining -= prediction.shape[-1]

-            if remaining <= 0:
-                break
+        # NOTE: The following heuristic for better prediction intervals with long-horizon forecasts
+        # uses all quantiles generated by the model for the first `model_prediction_length` steps,
+        # concatenating each quantile with the context and generating the next `model_prediction_length` steps.
+        # The `num_quantiles * num_quantiles` "samples" thus generated are then reduced to `num_quantiles`
+        # by computing empirical quantiles. Note that this option scales the batch size by `num_quantiles`
+        # when the `prediction_length` is greater than `model_prediction_length`.

-            central_idx = torch.abs(torch.tensor(self.quantiles) - 0.5).argmin()
-            central_prediction = prediction[:, central_idx]
+        if remaining > 0:
+            # Expand the context along quantile axis
+            context_tensor = context_tensor.unsqueeze(1).repeat(1, len(self.quantiles), 1)

-            context_tensor = torch.cat([context_tensor, central_prediction], dim=-1)
+        quantile_tensor = torch.tensor(self.quantiles, device=context_tensor.device)
+        while remaining > 0:
+            # Append the prediction to context
+            context_tensor = torch.cat([context_tensor, prediction], dim=-1)[..., -self.model_context_length :]
+            (batch_size, n_quantiles, context_length) = context_tensor.shape
+
+            with torch.no_grad():
+                # Reshape (batch, n_quantiles, context_length) -> (batch * n_quantiles, context_length)
+                prediction = self.model(
+                    context=context_tensor.reshape(batch_size * n_quantiles, context_length)
+                ).quantile_preds.to(context_tensor)
+            # Reshape predictions from (batch * n_quantiles, n_quantiles, model_prediction_length) to (batch, n_quantiles * n_quantiles, model_prediction_length)
+            prediction = prediction.reshape(batch_size, n_quantiles * n_quantiles, -1)
+            # Reduce `n_quantiles * n_quantiles` to n_quantiles and transpose back to (batch_size, n_quantiles, model_prediction_length)
+            prediction = torch.quantile(prediction, q=quantile_tensor, dim=1).transpose(0, 1)
+
+            predictions.append(prediction)
+            remaining -= prediction.shape[-1]

        return torch.cat(predictions, dim=-1)[..., :prediction_length].to(dtype=torch.float32, device="cpu")
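
Review note: the long-horizon heuristic above continues each quantile path as its own context, then reduces the resulting `n_quantiles * n_quantiles` candidates back to `n_quantiles`. The sketch below reproduces only the reshape and reduce steps in NumPy. `toy_model`, `QUANTILES`, `BLOCK`, and `max_context` are hypothetical stand-ins, not part of Chronos-Bolt, and `np.quantile` replaces `torch.quantile`.

```python
import numpy as np

QUANTILES = np.array([0.1, 0.5, 0.9])  # assumed quantile levels
BLOCK = 4  # stand-in for model_prediction_length


def toy_model(context: np.ndarray) -> np.ndarray:
    """Hypothetical model: (n, ctx_len) -> (n, n_quantiles, BLOCK)."""
    last = context[:, -1][:, None, None]       # (n, 1, 1)
    spread = (QUANTILES - 0.5)[None, :, None]  # (1, Q, 1)
    shape = (context.shape[0], len(QUANTILES), BLOCK)
    return np.broadcast_to(last + spread, shape).copy()


def unroll(context: np.ndarray, prediction_length: int, max_context: int = 16) -> np.ndarray:
    prediction = toy_model(context)  # first block: (batch, Q, BLOCK)
    predictions = [prediction]
    remaining = prediction_length - prediction.shape[-1]

    if remaining > 0:
        # Expand the context along a new quantile axis: (batch, Q, ctx_len)
        context = np.repeat(context[:, None, :], len(QUANTILES), axis=1)

    while remaining > 0:
        # Append each quantile path to its own copy of the context.
        context = np.concatenate([context, prediction], axis=-1)[..., -max_context:]
        batch, n_q, ctx_len = context.shape
        # (batch, Q, ctx_len) -> (batch * Q, ctx_len): batch grows by a factor of Q
        prediction = toy_model(context.reshape(batch * n_q, ctx_len))
        # (batch * Q, Q, BLOCK) -> (batch, Q * Q, BLOCK)
        prediction = prediction.reshape(batch, n_q * n_q, -1)
        # Empirical quantiles over the Q * Q candidates, then back to (batch, Q, BLOCK)
        prediction = np.quantile(prediction, QUANTILES, axis=1).transpose(1, 0, 2)
        predictions.append(prediction)
        remaining -= prediction.shape[-1]

    return np.concatenate(predictions, axis=-1)[..., :prediction_length]


forecast = unroll(np.arange(16.0).reshape(2, 8), prediction_length=10)
```

Unlike the previous greedy unrolling, which appended only the median path, later blocks here depend on the spread of earlier ones, so the intervals are no longer reset at each block boundary.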

--- a/src/chronos/df_utils.py
+++ b/src/chronos/df_utils.py
@ -0,0 +1,341 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+# Authors: Abdul Fatir Ansari <ansarnd@amazon.com>
+
+
+import warnings
+from typing import TYPE_CHECKING
+
+import numpy as np
+
+if TYPE_CHECKING:
+    import pandas as pd
+
+
+def _validate_df_types_and_cast(
+    df: "pd.DataFrame",
+    future_df: "pd.DataFrame | None",
+    target_columns: list[str],
+    id_column: str = "item_id",
+    timestamp_column: str = "timestamp",
+) -> tuple["pd.DataFrame", "pd.DataFrame | None"]:
+    import pandas as pd
+
+    astype_dict = {}
+    future_astype_dict = {}
+    for col in df.columns.drop([id_column, timestamp_column]):
+        col_dtype = df[col].dtype
+        if col in target_columns and not pd.api.types.is_numeric_dtype(df[col]):
+            raise ValueError(f"All target columns must be numeric but got {col=} with dtype={col_dtype}")
+
+        if (
+            pd.api.types.is_object_dtype(df[col])
+            or pd.api.types.is_string_dtype(df[col])
+            or isinstance(col_dtype, pd.CategoricalDtype)
+        ):
+            astype_dict[col] = "category"
+        elif pd.api.types.is_numeric_dtype(df[col]) or pd.api.types.is_bool_dtype(df[col]):
+            astype_dict[col] = "float32"
+        else:
+            raise ValueError(
+                f"All columns must contain numeric, object, category, string, or bool dtype but got {col=} with dtype={col_dtype}"
+            )
+
+        if future_df is not None and col in future_df.columns:
+            if future_df[col].dtype != col_dtype:
+                raise ValueError(
+                    f"Column {col} in future_df has dtype {future_df[col].dtype} but column in df has dtype {col_dtype}"
+                )
+            future_astype_dict[col] = astype_dict[col]
+
+    df = df.astype(astype_dict, copy=True)
+    if future_df is not None:
+        future_df = future_df.astype(future_astype_dict, copy=True)
+
+    return df, future_df
+
+
+def validate_df_inputs(
+    df: "pd.DataFrame",
+    future_df: "pd.DataFrame | None",
+    target_columns: list[str],
+    prediction_length: int,
+    id_column: str = "item_id",
+    timestamp_column: str = "timestamp",
+) -> tuple["pd.DataFrame", "pd.DataFrame | None", str, list[int], np.ndarray]:
+    """
+    Validates and prepares dataframe inputs
+
+    Parameters
+    ----------
+    df
+        Input dataframe containing time series data with columns:
+        - id_column: Identifier for each time series
+        - timestamp_column: Timestamps for each observation
+        - target_columns: One or more target variables to forecast
+        - Additional columns are treated as covariates
+    future_df
+        Optional dataframe containing future covariate values with columns:
+        - id_column: Identifier for each time series
+        - timestamp_column: Future timestamps
+        - Subset of covariate columns from df
+    target_columns
+        Names of target columns to forecast
+    prediction_length
+        Number of future time steps to predict
+    id_column
+        Name of column containing time series identifiers
+    timestamp_column
+        Name of column containing timestamps
+
+    Returns
+    -------
+    A tuple containing:
+    - Validated and sorted input dataframe
+    - Validated and sorted future dataframe (if provided)
+    - Inferred frequency of the time series
+    - List of series lengths from input dataframe
+    - Original order of time series IDs
+
+    Raises
+    ------
+    ValueError
+        If validation fails for:
+        - Missing required columns
+        - Invalid data types
+        - Inconsistent frequencies
+        - Insufficient data points
+        - Mismatched series between df and future_df
+        - Invalid future_df lengths
+    """
+
+    import pandas as pd
+
+    required_cols = [id_column, timestamp_column] + target_columns
+    missing_cols = [col for col in required_cols if col not in df.columns]
+    if missing_cols:
+        raise ValueError(f"df does not contain all expected columns. Missing columns: {missing_cols}")
+
+    if future_df is not None:
+        future_required_cols = [id_column, timestamp_column]
+        missing_future_cols = [col for col in future_required_cols if col not in future_df.columns]
+        targets_in_future = [col for col in future_df.columns if col in target_columns]
+        extra_future_cols = [col for col in future_df.columns if col not in df.columns]
+        if missing_future_cols:
+            raise ValueError(
+                f"future_df does not contain all expected columns. Missing columns: {missing_future_cols}"
+            )
+        if targets_in_future:
+            raise ValueError(
+                f"future_df cannot contain target columns. Target columns found in future_df: {targets_in_future}"
+            )
+        if extra_future_cols:
+            raise ValueError(f"future_df cannot contain columns not present in df. Extra columns: {extra_future_cols}")
+
+    df, future_df = _validate_df_types_and_cast(
+        df, future_df, id_column=id_column, timestamp_column=timestamp_column, target_columns=target_columns
+    )
+
+    # Get the original order of time series IDs
+    original_order = df[id_column].unique()
+
+    # Sort and prepare df
+    df[timestamp_column] = pd.to_datetime(df[timestamp_column])
+    df = df.sort_values([id_column, timestamp_column])
+
+    # Get series lengths
+    series_lengths = df[id_column].value_counts(sort=False).to_list()
+
+    def validate_freq(timestamps: pd.DatetimeIndex, series_id: str):
+        freq = pd.infer_freq(timestamps)
+        if not freq:
+            raise ValueError(f"Could not infer frequency for series {series_id}")
+        return freq
+
+    # Validate each series
+    all_freqs = []
+    start_idx = 0
+    timestamp_index = pd.DatetimeIndex(df[timestamp_column])
+    for length in series_lengths:
+        if length < 3:
+            series_id = df[id_column].iloc[start_idx]
+            raise ValueError(
+                f"Every time series must have at least 3 data points, found {length=} for series {series_id}"
+            )
+        timestamps = timestamp_index[start_idx : start_idx + length]
+        series_id = df[id_column].iloc[start_idx]
+        all_freqs.append(validate_freq(timestamps, series_id))
+        start_idx += length
+
+    if len(set(all_freqs)) > 1:
+        raise ValueError("All time series must have the same frequency")
+
+    inferred_freq = all_freqs[0]
+
+    # Sort future_df if provided and validate its series lengths
+    future_series_lengths = None
+    if future_df is not None:
+        future_df[timestamp_column] = pd.to_datetime(future_df[timestamp_column])
+        future_df = future_df.sort_values([id_column, timestamp_column])
+
+        # Validate that future_df contains all series from df
+        context_ids = set(df[id_column].unique())
+        future_ids = set(future_df[id_column].unique())
+        if context_ids != future_ids:
+            raise ValueError("future_df must contain the same time series IDs as df")
+
+        future_series_lengths = future_df[id_column].value_counts(sort=False)
+        if (future_series_lengths != prediction_length).any():
+            invalid_series = future_series_lengths[future_series_lengths != prediction_length]
+            raise ValueError(
+                f"future_df must contain {prediction_length=} values for each series, "
+                f"but found series with different lengths: {invalid_series.to_dict()}"
+            )
+
+    return df, future_df, inferred_freq, series_lengths, original_order
+
+
+def convert_df_input_to_list_of_dicts_input(
+    df: "pd.DataFrame",
+    future_df: "pd.DataFrame | None",
+    target_columns: list[str],
+    prediction_length: int,
+    id_column: str = "item_id",
+    timestamp_column: str = "timestamp",
+    validate_inputs: bool = True,
+    freq: str | None = None,
+) -> tuple[list[dict[str, np.ndarray | dict[str, np.ndarray]]], np.ndarray, dict[str, "pd.DatetimeIndex"]]:
+    """
+    Convert from dataframe input format to a list of dictionaries input format.
+
+    Parameters
+    ----------
+    df
+        Input dataframe containing time series data with columns:
+        - id_column: Identifier for each time series
+        - timestamp_column: Timestamps for each observation
+        - target_columns: One or more target variables to forecast
+        - Additional columns are treated as covariates
+    future_df
+        Optional dataframe containing future covariate values with columns:
+        - id_column: Identifier for each time series
+        - timestamp_column: Future timestamps
+        - Subset of covariate columns from df
+    target_columns
+        Names of target columns to forecast
+    prediction_length
+        Number of future time steps to predict
+    id_column
+        Name of column containing time series identifiers
+    timestamp_column
+        Name of column containing timestamps
+    validate_inputs
+        [ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
+        validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
+        must ensure: (1) all dataframes are sorted by (id_column, timestamp_column); (2) future_df (if provided)
+        has the same item IDs as df with exactly prediction_length rows of future timestamps per item; (3) all
+        timestamps are regularly spaced (e.g., with hourly frequency).
+    freq
+        Frequency string for timestamp generation (e.g., "h", "D", "W"). Can only be used
+        when validate_inputs=False. When provided, skips frequency inference from the data.
+
+    Returns
+    -------
+    A tuple containing:
+    - Time series converted to list of dictionaries format
+    - Original order of time series IDs
+    - Dictionary mapping series IDs to future time index
+    """
+
+    import pandas as pd
+
+    if freq is not None and validate_inputs:
+        raise ValueError(
+            "freq can only be provided when validate_inputs=False. "
+            "When using freq with validate_inputs=False, you must ensure: "
+            "(1) all dataframes are sorted by (id_column, timestamp_column);  "
+            "(2) future_df (if provided) has the same item IDs as df with exactly "
+            "prediction_length rows of future timestamps per item; "
+            "(3) all timestamps are regularly spaced."
+        )
+
+    if validate_inputs:
+        df, future_df, freq, series_lengths, original_order = validate_df_inputs(
+            df,
+            future_df=future_df,
+            id_column=id_column,
+            timestamp_column=timestamp_column,
+            target_columns=target_columns,
+            prediction_length=prediction_length,
+        )
+    else:
+        # Get the original order of time series IDs
+        original_order = df[id_column].unique()
+
+        # Get series lengths
+        series_lengths = df[id_column].value_counts(sort=False).to_list()
+
+        # If freq is not provided, infer from the first series with >= 3 points
+        if freq is None:
+            timestamp_index = pd.DatetimeIndex(df[timestamp_column])
+            start_idx = 0
+            for length in series_lengths:
+                if length < 3:
+                    start_idx += length
+                    continue
+                timestamps = timestamp_index[start_idx : start_idx + length]
+                freq = pd.infer_freq(timestamps)
+                break
+
+            assert freq is not None, "validate_inputs is False, but could not infer frequency from the dataframe"
+
+    # Convert to list of dicts format
+    inputs: list[dict[str, np.ndarray | dict[str, np.ndarray]]] = []
+    prediction_timestamps: dict[str, pd.DatetimeIndex] = {}
+
+    indptr = np.concatenate([[0], np.cumsum(series_lengths)]).astype("int64")
+    target_array = df[target_columns].to_numpy().T  # Shape: (n_targets, len(df))
+    last_ts = pd.DatetimeIndex(df[timestamp_column].iloc[indptr[1:] - 1])  # Shape: (n_series,)
+    offset = pd.tseries.frequencies.to_offset(freq)
+    with warnings.catch_warnings():
+        # Silence PerformanceWarning for non-vectorized offsets https://github.com/pandas-dev/pandas/blob/95624ca2e99b0/pandas/core/arrays/datetimes.py#L822
+        warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
+        # Generate all prediction timestamps at once by stacking offsets into shape (n_series * prediction_length,)
+        prediction_timestamps_array = pd.DatetimeIndex(
+            np.dstack([last_ts + step * offset for step in range(1, prediction_length + 1)]).ravel()
+        )
+
+    past_covariates_dict = {
+        col: df[col].to_numpy() for col in df.columns if col not in [id_column, timestamp_column] + target_columns
+    }
+    future_covariates_dict = {}
+    if future_df is not None:
+        for col in future_df.columns.drop([id_column, timestamp_column]):
+            future_covariates_dict[col] = future_df[col].to_numpy()
+        if validate_inputs:
+            if (pd.DatetimeIndex(future_df[timestamp_column]) != pd.DatetimeIndex(prediction_timestamps_array)).any():
+                raise ValueError(
+                    "future_df timestamps do not match the expected prediction timestamps. "
+                    "You can disable this check by setting `validate_inputs=False`"
+                )
+
+    for i in range(len(series_lengths)):
+        start_idx, end_idx = indptr[i], indptr[i + 1]
+        future_start_idx, future_end_idx = i * prediction_length, (i + 1) * prediction_length
+
+        series_id = df[id_column].iloc[start_idx]
+        prediction_timestamps[series_id] = prediction_timestamps_array[future_start_idx:future_end_idx]
+        task: dict[str, np.ndarray | dict[str, np.ndarray]] = {"target": target_array[:, start_idx:end_idx]}
+
+        if len(past_covariates_dict) > 0:
+            task["past_covariates"] = {col: values[start_idx:end_idx] for col, values in past_covariates_dict.items()}
+            if len(future_covariates_dict) > 0:
+                task["future_covariates"] = {
+                    col: values[future_start_idx:future_end_idx] for col, values in future_covariates_dict.items()
+                }
+        inputs.append(task)
+
+    assert len(inputs) == len(series_lengths)
+
+    return inputs, original_order, prediction_timestamps
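
Note on the `validate_inputs=False` path above: the two vectorized steps (inferring the frequency from the first series with at least 3 rows, then building every prediction timestamp in one pass) can be tried in isolation. The following is a minimal sketch and not part of the commit; the helper names `infer_freq_from_first_long_series` and `future_timestamps` and the toy dataframe are hypothetical, and `groupby(..., sort=False).size()` is used here in place of `value_counts(sort=False)` only to make the order-of-appearance assumption explicit.

```python
import warnings

import numpy as np
import pandas as pd


def infer_freq_from_first_long_series(timestamps, series_lengths):
    """Hypothetical helper: infer the frequency from the first series with >= 3 rows."""
    index = pd.DatetimeIndex(timestamps)
    start = 0
    for length in series_lengths:
        if length >= 3:
            return pd.infer_freq(index[start : start + length])
        start += length
    return None


def future_timestamps(last_ts, freq, prediction_length):
    """Hypothetical helper: build all prediction timestamps, grouped series by series."""
    offset = pd.tseries.frequencies.to_offset(freq)
    with warnings.catch_warnings():
        # Some offsets are applied element-wise and emit a PerformanceWarning
        warnings.simplefilter("ignore", category=pd.errors.PerformanceWarning)
        steps = [last_ts + step * offset for step in range(1, prediction_length + 1)]
    # np.dstack yields shape (1, n_series, prediction_length); ravel flattens it series by series
    return pd.DatetimeIndex(np.dstack(steps).ravel())


# Toy long-format frame, already sorted by (item_id, timestamp):
# series "A" has 2 rows (too short for inference), series "B" has 4 daily rows
df = pd.DataFrame(
    {
        "item_id": ["A", "A", "B", "B", "B", "B"],
        "timestamp": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-03-01", "2023-03-02", "2023-03-03", "2023-03-04"]
        ),
    }
)

series_lengths = df.groupby("item_id", sort=False).size().to_list()
# indptr[i]:indptr[i + 1] is the row range of series i
indptr = np.concatenate([[0], np.cumsum(series_lengths)]).astype("int64")
last_ts = pd.DatetimeIndex(df["timestamp"].iloc[indptr[1:] - 1])

freq = infer_freq_from_first_long_series(df["timestamp"], series_lengths)
horizon = future_timestamps(last_ts, freq, prediction_length=2)
print(freq, list(horizon))
```

Because the flattened index is ordered series by series, series `i` owns the slice `[i * prediction_length, (i + 1) * prediction_length)`, which is the slicing the loop in the function above relies on. The sketch also shows why unsorted input is dangerous when validation is off: both `indptr` and the last-timestamp lookup assume contiguous, time-ordered rows per series.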
--- a/test/dummy-chronos2-lora/adapter_config.json
+++ b/test/dummy-chronos2-lora/adapter_config.json
@ -0,0 +1,35 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "Chronos2Model",
+    "parent_library": "chronos.chronos2.model"
+  },
+  "base_model_name_or_path": "test/dummy-chronos2-model",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "self_attention.q",
+    "self_attention.k",
+    "self_attention.o",
+    "output_patch_embedding.output_layer",
+    "self_attention.v"
+  ],
+  "task_type": null,
+  "use_dora": false,
+  "use_rslora": false
+}
--- a/test/dummy-chronos2-lora/adapter_model.safetensors
+++ b/test/dummy-chronos2-lora/adapter_model.safetensors
--- a/test/test_chronos.py
+++ b/test/test_chronos.py
@ -3,6 +3,8 @@

 from pathlib import Path

+import numpy as np
+import pandas as pd
 import pytest
 import torch

@ -12,7 +14,14 @@ from chronos import (
    ChronosPipeline,
    MeanScaleUniformBins,
 )
-from test.util import validate_tensor
+from test.util import create_df, get_forecast_start_times, validate_tensor
+
+DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos-model"
+
+
+@pytest.fixture
+def pipeline() -> ChronosPipeline:
+    return BaseChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu")


 def test_base_chronos_pipeline_loads_from_huggingface():
@ -167,11 +176,7 @@ def test_tokenizer_random_data(use_eos_token: bool):
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
 def test_pipeline_predict(model_dtype: torch.dtype, input_dtype: torch.dtype):
-    pipeline = ChronosPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-model",
-        device_map="cpu",
-        torch_dtype=model_dtype,
-    )
+    pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)

@ -238,11 +243,7 @@ def test_pipeline_predict_quantiles(
    prediction_length: int,
    quantile_levels: list[int],
 ):
-    pipeline = ChronosPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-model",
-        device_map="cpu",
-        torch_dtype=model_dtype,
-    )
+    pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)

@ -284,11 +285,7 @@ def test_pipeline_predict_quantiles(
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
 def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
-    pipeline = ChronosPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-model",
-        device_map="cpu",
-        torch_dtype=model_dtype,
-    )
+    pipeline = ChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
    d_model = pipeline.model.model.config.d_model
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
@ -312,6 +309,88 @@ def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
    validate_tensor(scale, shape=(1,), dtype=torch.float32)


+@pytest.mark.parametrize(
+    "context_setup, expected_rows",
+    [
+        # Targets only
+        ({}, 6),  # 2 series * 3 predictions
+        # Different context lengths
+        (
+            {"series_ids": ["X", "Y", "Z"], "n_points": [10, 17, 56], "target_cols": ["custom_target"]},
+            9,
+        ),  # 3 series * 3 predictions
+    ],
+)
+@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
+def test_predict_df_works_for_valid_inputs(pipeline, context_setup, expected_rows, freq):
+    prediction_length = 3
+    df = create_df(**context_setup, freq=freq)
+    forecast_start_times = get_forecast_start_times(df, freq)
+
+    series_ids = context_setup.get("series_ids", ["A", "B"])
+    target_columns = context_setup.get("target_cols", ["target"])
+    n_series = len(series_ids)
+    n_targets = len(target_columns)
+    result = pipeline.predict_df(df, target=target_columns[0], prediction_length=prediction_length)
+
+    assert len(result) == expected_rows
+    assert "item_id" in result.columns and np.all(
+        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
+    )
+    assert "target_name" in result.columns and np.all(
+        result["target_name"].to_numpy() == np.tile(np.array(target_columns).repeat(prediction_length), n_series)
+    )
+    assert "timestamp" in result.columns and np.all(
+        result.groupby("item_id")["timestamp"].min().to_numpy() == pd.to_datetime(forecast_start_times).to_numpy()
+    )
+    assert "predictions" in result.columns
+    assert all(str(q) in result.columns for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
+
+
+def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
+    df = create_df()
+    # Make timestamps non-uniform for series A
+    df.loc[df["item_id"] == "A", "timestamp"] = [
+        "2023-01-01",
+        "2023-01-02",
+        "2023-01-04",
+        "2023-01-05",
+        "2023-01-06",
+        "2023-01-07",
+        "2023-01-08",
+        "2023-01-09",
+        "2023-01-10",
+        "2023-01-11",
+    ]
+
+    with pytest.raises(ValueError, match="not infer frequency"):
+        pipeline.predict_df(df)
+
+
+def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
+    df = pd.DataFrame(
+        {
+            "item_id": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
+            "timestamp": [
+                "2023-01-01",
+                "2023-01-02",
+                "2023-01-03",
+                "2023-01-04",
+                "2023-01-05",
+                "2023-01-01",
+                "2023-02-01",
+                "2023-03-01",
+                "2023-04-01",
+                "2023-05-01",
+            ],
+            "target": [1.0] * 10,
+        }
+    )
+
+    with pytest.raises(ValueError, match="same frequency"):
+        pipeline.predict_df(df)
+
+
@pytest.mark.parametrize("n_tokens", [10, 1000, 10000])
 def test_tokenizer_number_of_buckets(n_tokens):
    config = ChronosConfig(
--- a/test/test_chronos2.py
+++ b/test/test_chronos2.py
@ -13,8 +13,10 @@ import pytest
 import torch

 from chronos import BaseChronosPipeline, Chronos2Pipeline
-from chronos.chronos2.dataset import convert_df_input_to_list_of_dicts_input
-from test.util import validate_tensor
+from chronos.chronos2.config import Chronos2CoreConfig
+from chronos.chronos2.layers import MHA
+from chronos.df_utils import convert_df_input_to_list_of_dicts_input
+from test.util import create_df, create_future_df, get_forecast_start_times, validate_tensor, timeout_callback

 DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos2-model"

@ -32,6 +34,14 @@ def test_base_chronos2_pipeline_loads_from_s3():
    BaseChronosPipeline.from_pretrained("s3://autogluon/chronos-2", device_map="cpu")


+def test_base_chronos2_pipeline_loads_from_hf():
+    BaseChronosPipeline.from_pretrained("amazon/chronos-2", device_map="cpu")
+
+
+def test_chronos2_lora_pipeline_loads_from_disk():
+    Chronos2Pipeline.from_pretrained(Path(__file__).parent / "dummy-chronos2-lora", device_map="cpu")
+
+
@pytest.mark.parametrize(
    "inputs, prediction_length, expected_output_shapes",
    [
@ -317,13 +327,11 @@ def test_when_input_is_invalid_then_predict_raises_value_error(pipeline, inputs,
        _ = pipeline.predict(inputs, prediction_length=10)


-@pytest.mark.parametrize("torch_dtype", [torch.float32, torch.bfloat16])
+@pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
-def test_pipeline_predict_can_handle_different_model_and_input_dtypes(
-    torch_dtype: torch.dtype, input_dtype: torch.dtype
-):
+def test_pipeline_predict_can_handle_different_model_and_input_dtypes(dtype: torch.dtype, input_dtype: torch.dtype):
    pipeline = BaseChronosPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos2-model", device_map="cpu", torch_dtype=torch_dtype
+        Path(__file__).parent / "dummy-chronos2-model", device_map="cpu", torch_dtype=dtype
    )
    context = 10 * torch.rand(size=(4, 3, 16)) + 10
    context = context.to(dtype=input_dtype)
@ -336,6 +344,35 @@ def test_pipeline_predict_can_handle_different_model_and_input_dtypes(
        validate_tensor(quantiles_item, (3, expected_num_quantiles, 7), dtype=torch.float32)


+@pytest.mark.parametrize(
+    "inputs, expected_output_shapes",
+    [
+        # NOTE: d_model for the dummy model is 6
+        # Homogeneous univariate task
+        (torch.rand(4, 1, 16), [(1, 3, 6)] * 4),
+        # Homogeneous multivariate task
+        (torch.rand(4, 3, 37), [(3, 5, 6)] * 4),
+        # Heterogeneous tasks with different history lengths
+        (
+            [torch.rand(100), torch.rand(2, 150), torch.rand(120)],
+            [(1, 12, 6), (2, 12, 6), (1, 12, 6)],
+        ),
+    ],
+)
+def test_when_input_is_valid_then_pipeline_can_embed(pipeline, inputs, expected_output_shapes):
+    embeds, loc_scales = pipeline.embed(inputs)
+
+    assert (
+        isinstance(embeds, list)
+        and len(embeds) == len(expected_output_shapes)
+        and len(loc_scales) == len(expected_output_shapes)
+    )
+    for embed, loc_scale, expected_shape in zip(embeds, loc_scales, expected_output_shapes):
+        validate_tensor(embed, expected_shape, dtype=torch.float32)
+        validate_tensor(loc_scale[0], (expected_shape[0], 1), dtype=torch.float32)
+        validate_tensor(loc_scale[1], (expected_shape[0], 1), dtype=torch.float32)
+
+
@pytest.mark.parametrize(
    "task_kwargs",
    [
@ -383,81 +420,54 @@ def test_pipeline_can_evaluate_on_dummy_fev_task(pipeline, task_kwargs):
    assert isinstance(eval_summary["test_error"], float)


-def create_df(series_ids=["A", "B"], n_points=[10, 10], target_cols=["target"], covariates=None, freq="h"):
-    """Helper to create test context DataFrames."""
-    series_dfs = []
-    for series_id, length in zip(series_ids, n_points):
-        series_data = {"item_id": series_id, "timestamp": pd.date_range(end="2001-10-01", periods=length, freq=freq)}
-        for target_col in target_cols:
-            series_data[target_col] = np.random.randn(length)
-        if covariates:
-            for cov in covariates:
-                series_data[cov] = np.random.randn(length)
-        series_dfs.append(pd.DataFrame(series_data))
-    return pd.concat(series_dfs, ignore_index=True)
-
-
-def create_future_df(forecast_start_times: list, series_ids=["A", "B"], n_points=[5, 5], covariates=None, freq="h"):
-    """Helper to create test future DataFrames."""
-    series_dfs = []
-    for series_id, length, start in zip(series_ids, n_points, forecast_start_times):
-        series_data = {"item_id": series_id, "timestamp": pd.date_range(start=start, periods=length, freq=freq)}
-        if covariates:
-            for cov in covariates:
-                series_data[cov] = np.random.randn(length)
-        series_dfs.append(pd.DataFrame(series_data))
-    return pd.concat(series_dfs, ignore_index=True)
-
-
-def get_forecast_start_times(df, freq="h"):
-    context_end_times = df.groupby("item_id")["timestamp"].max()
-    forecast_start_times = [pd.date_range(end_time, periods=2, freq=freq)[-1] for end_time in context_end_times]
-
-    return forecast_start_times
-
-
@pytest.mark.parametrize(
-    "context_setup, future_setup, expected_rows",
+    "context_setup, future_setup",
    [
        # Targets only
-        ({}, None, 6),  # 2 series * 3 predictions
+        ({}, None),
        # Multiple targets with different context lengths
-        (
-            {"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]},
-            None,
-            18,
-        ),  # 2 series * 3 targets * 3 predictions
+        ({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
        # With past covariates
-        ({"covariates": ["cov1"]}, None, 6),
+        ({"covariates": ["cov1"]}, None),
        # With future covariates
-        ({"covariates": ["cov1"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
+        ({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates
-        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
+        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates and different series order
        (
            {"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
-            {
-                "series_ids": ["B", "C", "A", "Z"],
-                "covariates": ["cov1"],
-                "n_points": [3, 3, 3, 3],
-            },
-            12,
+            {"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
        ),
    ],
 )
@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
-def test_predict_df_works_for_valid_inputs(pipeline, context_setup, future_setup, expected_rows, freq):
-    prediction_length = 3
+@pytest.mark.parametrize("prediction_length", [1, 4])
+@pytest.mark.parametrize("validate_inputs", [True, False])
+def test_predict_df_works_for_valid_inputs(
+    pipeline, context_setup, future_setup, freq, validate_inputs, prediction_length
+):
    df = create_df(**context_setup, freq=freq)
    forecast_start_times = get_forecast_start_times(df, freq)
-    future_df = create_future_df(forecast_start_times, **future_setup, freq=freq) if future_setup else None
+    if future_setup:
+        series_ids = future_setup.get("series_ids", ["A", "B"])
+        future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
+        future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
+    else:
+        future_df = None

    series_ids = context_setup.get("series_ids", ["A", "B"])
    target_columns = context_setup.get("target_cols", ["target"])
    n_series = len(series_ids)
    n_targets = len(target_columns)
-    result = pipeline.predict_df(df, future_df=future_df, target=target_columns, prediction_length=prediction_length)
+    result = pipeline.predict_df(
+        df,
+        future_df=future_df,
+        target=target_columns,
+        prediction_length=prediction_length,
+        validate_inputs=validate_inputs,
+    )

+    expected_rows = n_series * n_targets * prediction_length
    assert len(result) == expected_rows
    assert "item_id" in result.columns and np.all(
        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
@ -512,9 +522,10 @@ def test_predict_df_future_df_validation_errors(pipeline, future_data, error_mat
        pipeline.predict_df(df, future_df=future_df)


-def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
+@pytest.mark.parametrize("validate_inputs", [True, False])
+def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline, validate_inputs):
    df = create_df()
-    # Make timestamps non-uniform for series A
+    # Make timestamps non-uniform for series A (first series)
    df.loc[df["item_id"] == "A", "timestamp"] = [
        "2023-01-01",
        "2023-01-02",
@ -528,8 +539,8 @@ def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
        "2023-01-11",
    ]

-    with pytest.raises(ValueError, match="not infer frequency"):
-        pipeline.predict_df(df)
+    with pytest.raises((ValueError, AssertionError), match="not infer frequency"):
+        pipeline.predict_df(df, validate_inputs=validate_inputs)


 def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
@ -566,26 +577,80 @@ def test_predict_df_with_future_df_missing_series_raises_error(pipeline):
        pipeline.predict_df(df, future_df=future_df)


-def test_predict_df_with_future_df_with_different_lengths_raises_error(pipeline):
-    df = create_df(series_ids=["A", "B"], covariates=["cov1"])
-    future_df = create_future_df(
-        get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 7], covariates=["cov1"]
-    )
-
-    with pytest.raises(ValueError, match="all time series must have length"):
-        pipeline.predict_df(df, future_df=future_df, prediction_length=3)
-
-
 def test_predict_df_with_future_df_with_different_freq_raises_error(pipeline):
    df = create_df(series_ids=["A", "B"], covariates=["cov1"], freq="h")
    future_df = create_future_df(
        get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 3], covariates=["cov1"], freq="D"
    )

-    with pytest.raises(ValueError, match="must have the same frequency as context"):
+    with pytest.raises(ValueError, match="future_df timestamps do not match"):
        pipeline.predict_df(df, future_df=future_df, prediction_length=3)


+def test_predict_df_with_future_df_with_different_lengths_raises_error(pipeline):
+    df = create_df(series_ids=["A", "B"], covariates=["cov1"])
+    future_df = create_future_df(
+        get_forecast_start_times(df), series_ids=["A", "B"], n_points=[3, 7], covariates=["cov1"]
+    )
+
+    with pytest.raises(ValueError, match="future_df must contain prediction"):
+        pipeline.predict_df(df, future_df=future_df, prediction_length=3)
+
+
+@pytest.mark.parametrize(
+    "context_setup, future_setup",
+    [
+        # Targets only
+        ({}, None),
+        # Multiple targets with different context lengths
+        ({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
+        # With past covariates
+        ({"covariates": ["cov1"]}, None),
+        # With future covariates
+        ({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
+        # With past-only and future covariates
+        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
+        # With past-only and future covariates and different series order
+        (
+            {"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
+            {"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
+        ),
+    ],
+)
+@pytest.mark.parametrize("prediction_length", [1, 4])
+def test_predict_df_outputs_different_results_with_cross_learning_enabled(
+    pipeline, context_setup, future_setup, prediction_length
+):
+    freq = "h"
+    df = create_df(**context_setup, freq=freq)
+    forecast_start_times = get_forecast_start_times(df, freq)
+    if future_setup:
+        series_ids = future_setup.get("series_ids", ["A", "B"])
+        future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
+        future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
+    else:
+        future_df = None
+
+    series_ids = context_setup.get("series_ids", ["A", "B"])
+    target_columns = context_setup.get("target_cols", ["target"])
+    result_with_cross_learning = pipeline.predict_df(
+        df,
+        future_df=future_df,
+        target=target_columns,
+        prediction_length=prediction_length,
+        cross_learning=True,
+    )
+    result_without_cross_learning = pipeline.predict_df(
+        df,
+        future_df=future_df,
+        target=target_columns,
+        prediction_length=prediction_length,
+        cross_learning=False,
+    )
+
+    assert not np.array_equal(result_with_cross_learning["predictions"], result_without_cross_learning["predictions"])
+
+
@pytest.mark.parametrize(
    "inputs, prediction_length, expected_output_shapes",
    [
@ -671,12 +736,20 @@ def test_predict_df_with_future_df_with_different_freq_raises_error(pipeline):
        ),
    ],
 )
+@pytest.mark.parametrize("finetune_mode", ["full", "lora"])
 def test_when_input_is_valid_then_pipeline_can_be_finetuned(
-    pipeline, inputs, prediction_length, expected_output_shapes
+    pipeline, inputs, prediction_length, expected_output_shapes, finetune_mode
 ):
    # Get outputs before fine-tuning
    orig_outputs_before = pipeline.predict(inputs, prediction_length=prediction_length)
-    ft_pipeline = pipeline.fit(inputs, prediction_length=prediction_length, num_steps=5, min_past=1, batch_size=32)
+    ft_pipeline = pipeline.fit(
+        inputs,
+        prediction_length=prediction_length,
+        num_steps=5,
+        min_past=1,
+        batch_size=32,
+        finetune_mode=finetune_mode,
+    )
    # Get outputs from fine-tuned pipeline
    ft_outputs = ft_pipeline.predict(inputs, prediction_length=prediction_length)
    # Get outputs from original pipeline after fine-tuning
@ -852,40 +925,36 @@ def test_when_input_time_series_are_too_short_then_finetuning_raises_error(pipel


@pytest.mark.parametrize(
-    "context_setup, future_setup, expected_rows",
+    "context_setup, future_setup",
    [
        # Targets only
-        ({}, None, 6),  # 2 series * 3 predictions
+        ({}, None),
        # Multiple targets with different context lengths
-        (
-            {"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]},
-            None,
-            18,
-        ),  # 2 series * 3 targets * 3 predictions
+        ({"target_cols": ["sales", "revenue", "profit"], "n_points": [10, 17]}, None),
        # With past covariates
-        ({"covariates": ["cov1"]}, None, 6),
+        ({"covariates": ["cov1"]}, None),
        # With future covariates
-        ({"covariates": ["cov1"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
+        ({"covariates": ["cov1"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates
-        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"], "n_points": [3, 3]}, 6),
+        ({"covariates": ["cov1", "cov2"]}, {"covariates": ["cov1"]}),
        # With past-only and future covariates and different series order
        (
            {"series_ids": ["B", "C", "A", "Z"], "n_points": [10, 20, 100, 256], "covariates": ["cov1", "cov2"]},
-            {
-                "series_ids": ["B", "C", "A", "Z"],
-                "covariates": ["cov1"],
-                "n_points": [3, 3, 3, 3],
-            },
-            12,
+            {"series_ids": ["B", "C", "A", "Z"], "covariates": ["cov1"]},
        ),
    ],
 )
@pytest.mark.parametrize("freq", ["h", "D", "ME"])
-def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future_setup, expected_rows, freq):
+def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future_setup, freq):
    prediction_length = 3
    df = create_df(**context_setup, freq=freq)
    forecast_start_times = get_forecast_start_times(df, freq)
-    future_df = create_future_df(forecast_start_times, **future_setup, freq=freq) if future_setup else None
+    if future_setup:
+        series_ids = future_setup.get("series_ids", ["A", "B"])
+        future_setup_with_n_points = {**future_setup, "n_points": [prediction_length] * len(series_ids)}
+        future_df = create_future_df(forecast_start_times, **future_setup_with_n_points, freq=freq)
+    else:
+        future_df = None

    series_ids = context_setup.get("series_ids", ["A", "B"])
    target_columns = context_setup.get("target_cols", ["target"])
@ -918,6 +987,7 @@ def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future
    )

    # Check predictions from the fine-tuned model are valid
+    expected_rows = n_series * n_targets * prediction_length
    assert len(result) == expected_rows
    assert "item_id" in result.columns and np.all(
        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
@ -936,3 +1006,164 @@ def test_two_step_finetuning_with_df_input_works(pipeline, context_setup, future

    # Check predictions from the fine-tuned model are different from the original predictions
    assert not np.allclose(orig_result_before["predictions"].to_numpy(), result["predictions"].to_numpy())
+
+
+def test_when_predict_df_called_with_timeout_callback_then_timeout_error_is_raised(pipeline):
+    num_series = 1000
+    large_df = create_df(series_ids=[j for j in range(num_series)], n_points=[2048] * num_series)
+    with pytest.raises(TimeoutError, match="time limit exceeded"):
+        pipeline.predict_df(
+            large_df,
+            prediction_length=48,
+            after_batch=timeout_callback(0.1),
+        )
+
+
+@pytest.mark.parametrize("attn_implementation", ["eager", "sdpa"])
+def test_pipeline_works_with_different_attention_implementations(attn_implementation):
+    """Test that the pipeline works with different attention implementations."""
+    # Load the dummy model
+    model_path = Path(__file__).parent / "dummy-chronos2-model"
+
+    # Load with specified attention implementation
+    pipeline = BaseChronosPipeline.from_pretrained(
+        model_path, device_map="cpu", attn_implementation=attn_implementation
+    )
+
+    # Verify the config has the correct attention implementation
+    assert pipeline.model.config._attn_implementation == attn_implementation
+
+    # Test prediction with simple input
+    inputs = torch.rand(2, 1, 16)
+    prediction_length = 7
+
+    outputs = pipeline.predict(inputs, prediction_length=prediction_length)
+
+    # Check outputs are valid
+    assert isinstance(outputs, list) and len(outputs) == 2
+    for out in outputs:
+        validate_tensor(out, (1, DEFAULT_MODEL_NUM_QUANTILES, 7), dtype=torch.float32)
+
+
+@pytest.mark.parametrize("attn_implementation", ["eager", "sdpa"])
+@pytest.mark.parametrize("output_attentions", [False, True])
+def test_attention_implementations_with_output_attentions(attn_implementation, output_attentions):
+    """Test that attention implementations handle output_attentions correctly."""
+    # Create config with specified attention implementation
+    config = Chronos2CoreConfig(
+        d_model=128,
+        d_kv=32,
+        num_heads=4,
+        dropout_rate=0.1,
+        attn_implementation=attn_implementation,
+    )
+
+    # Create MHA layer
+    mha = MHA(config, use_rope=True)
+    mha.eval()
+
+    # Create dummy inputs
+    batch_size = 2
+    seq_len = 10
+    hidden_states = torch.randn(batch_size, seq_len, config.d_model)
+    position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, -1)
+    mask = torch.zeros(batch_size, config.num_heads, seq_len, seq_len)
+
+    # Test forward pass
+    output = mha(
+        hidden_states=hidden_states,
+        mask=mask,
+        position_ids=position_ids,
+        output_attentions=output_attentions,
+    )
+
+    # Check output shape
+    assert output.hidden_states.shape == (batch_size, seq_len, config.d_model)
+
+    # Check attention weights - should only be returned when output_attentions=True
+    if output_attentions:
+        assert output.attn_weights is not None
+        assert output.attn_weights.shape == (batch_size, config.num_heads, seq_len, seq_len)
+    else:
+        # SDPA doesn't return weights
+        if attn_implementation == "sdpa":
+            assert output.attn_weights is None
+
+
+def test_eager_and_sdpa_produce_identical_outputs(pipeline):
+    """Test that eager and SDPA implementations produce identical outputs on full pipeline."""
+    # Reload pipeline with SDPA
+    model_path = Path(__file__).parent / "dummy-chronos2-model"
+    pipeline_sdpa = BaseChronosPipeline.from_pretrained(
+        model_path, device_map="cpu", attn_implementation="sdpa", torch_dtype=torch.float32
+    )
+
+    # Note: the pipeline fixture uses the default attn_implementation, which should be sdpa,
+    # so load a second pipeline with eager attention forced for the comparison
+    pipeline_eager = BaseChronosPipeline.from_pretrained(
+        model_path, device_map="cpu", attn_implementation="eager", torch_dtype=torch.float32
+    )
+
+    # Test 1: Simple univariate input
+    inputs_simple = torch.rand(2, 1, 16)
+    prediction_length = 7
+
+    with torch.no_grad():
+        outputs_eager = pipeline_eager.predict(inputs_simple, prediction_length=prediction_length)
+        outputs_sdpa = pipeline_sdpa.predict(inputs_simple, prediction_length=prediction_length)
+
+    # Verify outputs match within numerical tolerance
+    assert len(outputs_eager) == len(outputs_sdpa)
+    for out_eager, out_sdpa in zip(outputs_eager, outputs_sdpa):
+        # Should match exactly or very close (numerical precision)
+        assert torch.allclose(out_eager, out_sdpa, atol=1e-5, rtol=1e-4)
+
+    # Test 2: Multivariate inputs with covariates to test group attention
+    inputs_grouped = [
+        {
+            "target": np.random.randn(2, 36),
+            "past_covariates": {
+                "temperature": np.random.randn(36),
+                "weather_type": np.random.choice(["sunny", "cloudy", "rainy"], size=36),
+            },
+            "future_covariates": {
+                "temperature": np.random.randn(prediction_length),
+                "weather_type": np.random.choice(["sunny", "cloudy", "rainy"], size=prediction_length),
+            },
+        }
+        for _ in range(5)
+    ]
+
+    with torch.no_grad():
+        outputs_eager_grouped = pipeline_eager.predict(inputs_grouped, prediction_length=prediction_length)
+        outputs_sdpa_grouped = pipeline_sdpa.predict(inputs_grouped, prediction_length=prediction_length)
+
+    # Verify outputs match for grouped inputs
+    assert len(outputs_eager_grouped) == len(outputs_sdpa_grouped)
+    for out_eager, out_sdpa in zip(outputs_eager_grouped, outputs_sdpa_grouped):
+        # Should match exactly or very close (numerical precision)
+        assert torch.allclose(out_eager, out_sdpa, atol=1e-5, rtol=1e-4)
+
+
+def test_pipeline_can_be_finetuned_with_preprocessed_hf_dataset(pipeline):
+    """Test that fine-tuning works with preprocessed inputs from a HuggingFace Dataset."""
+    from chronos.chronos2.dataset import prepare_inputs
+
+    prediction_length = 8
+    raw_inputs = [{"target": torch.rand(20)}, {"target": torch.rand(25)}, {"target": torch.rand(30)}]
+
+    # Preprocess and convert to HF Dataset (simulating Arrow-based lazy loading)
+    prepared_tasks = prepare_inputs(raw_inputs, prediction_length=prediction_length, min_past=1, mode="train")
+    hf_dataset = datasets.Dataset.from_list(prepared_tasks).with_format("torch")
+
+    # Fine-tune with preprocessed inputs
+    ft_pipeline = pipeline.fit(
+        hf_dataset, prediction_length=prediction_length, num_steps=5, min_past=1, batch_size=32, convert_inputs=False
+    )
+
+    # Verify fine-tuned model can predict
+    ft_outputs = ft_pipeline.predict(raw_inputs, prediction_length=prediction_length)
+    assert len(ft_outputs) == len(raw_inputs)
+    for ft_out in ft_outputs:
+        assert ft_out.shape == (1, DEFAULT_MODEL_NUM_QUANTILES, prediction_length)
+        assert not torch.isnan(ft_out).any()
--- a/test/test_chronos_bolt.py
+++ b/test/test_chronos_bolt.py
@ -5,12 +5,21 @@ from pathlib import Path

 import datasets
 import fev
+import numpy as np
+import pandas as pd
 import pytest
 import torch

 from chronos import BaseChronosPipeline, ChronosBoltPipeline
 from chronos.chronos_bolt import InstanceNorm, Patch
-from test.util import validate_tensor
+from test.util import create_df, get_forecast_start_times, validate_tensor
+
+DUMMY_MODEL_PATH = Path(__file__).parent / "dummy-chronos-bolt-model"
+
+
+@pytest.fixture
+def pipeline() -> ChronosBoltPipeline:
+    return BaseChronosPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu")


 def test_base_chronos_pipeline_loads_from_huggingface():
@ -20,11 +29,7 @@ def test_base_chronos_pipeline_loads_from_huggingface():
@pytest.mark.parametrize("torch_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
 def test_pipeline_predict(torch_dtype: torch.dtype, input_dtype: torch.dtype):
-    pipeline = ChronosBoltPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-bolt-model",
-        device_map="cpu",
-        torch_dtype=torch_dtype,
-    )
+    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=torch_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
    expected_num_quantiles = len(pipeline.quantiles)
@ -84,11 +89,7 @@ def test_pipeline_predict_quantiles(
    prediction_length: int,
    quantile_levels: list[float],
 ):
-    pipeline = ChronosBoltPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-bolt-model",
-        device_map="cpu",
-        torch_dtype=torch_dtype,
-    )
+    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=torch_dtype)
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)

@ -127,11 +128,7 @@ def test_pipeline_predict_quantiles(
@pytest.mark.parametrize("model_dtype", [torch.float32, torch.bfloat16])
@pytest.mark.parametrize("input_dtype", [torch.float32, torch.bfloat16, torch.int64])
 def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
-    pipeline = ChronosBoltPipeline.from_pretrained(
-        Path(__file__).parent / "dummy-chronos-bolt-model",
-        device_map="cpu",
-        torch_dtype=model_dtype,
-    )
+    pipeline = ChronosBoltPipeline.from_pretrained(DUMMY_MODEL_PATH, device_map="cpu", torch_dtype=model_dtype)
    d_model = pipeline.model.config.d_model
    context = 10 * torch.rand(size=(4, 16)) + 10
    context = context.to(dtype=input_dtype)
@ -160,6 +157,88 @@ def test_pipeline_embed(model_dtype: torch.dtype, input_dtype: torch.dtype):
    validate_tensor(loc_scale[1], shape=(1,), dtype=torch.float32)


+@pytest.mark.parametrize(
+    "context_setup, expected_rows",
+    [
+        # Targets only
+        ({}, 6),  # 2 series * 3 predictions
+        # Different context lengths
+        (
+            {"series_ids": ["X", "Y", "Z"], "n_points": [10, 17, 56], "target_cols": ["custom_target"]},
+            9,
+        ),  # 3 series * 3 predictions
+    ],
+)
+@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
+def test_predict_df_works_for_valid_inputs(pipeline, context_setup, expected_rows, freq):
+    prediction_length = 3
+    df = create_df(**context_setup, freq=freq)
+    forecast_start_times = get_forecast_start_times(df, freq)
+
+    series_ids = context_setup.get("series_ids", ["A", "B"])
+    target_columns = context_setup.get("target_cols", ["target"])
+    n_series = len(series_ids)
+    n_targets = len(target_columns)
+    result = pipeline.predict_df(df, target=target_columns[0], prediction_length=prediction_length)
+
+    assert len(result) == expected_rows
+    assert "item_id" in result.columns and np.all(
+        result["item_id"].to_numpy() == np.array(series_ids).repeat(n_targets * prediction_length)
+    )
+    assert "target_name" in result.columns and np.all(
+        result["target_name"].to_numpy() == np.tile(np.array(target_columns).repeat(prediction_length), n_series)
+    )
+    assert "timestamp" in result.columns and np.all(
+        result.groupby("item_id")["timestamp"].min().to_numpy() == pd.to_datetime(forecast_start_times).to_numpy()
+    )
+    assert "predictions" in result.columns
+    assert all(str(q) in result.columns for q in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
+
+
+def test_predict_df_with_non_uniform_timestamps_raises_error(pipeline):
+    df = create_df()
+    # Make timestamps non-uniform for series A
+    df.loc[df["item_id"] == "A", "timestamp"] = [
+        "2023-01-01",
+        "2023-01-02",
+        "2023-01-04",
+        "2023-01-05",
+        "2023-01-06",
+        "2023-01-07",
+        "2023-01-08",
+        "2023-01-09",
+        "2023-01-10",
+        "2023-01-11",
+    ]
+
+    with pytest.raises(ValueError, match="not infer frequency"):
+        pipeline.predict_df(df)
+
+
+def test_predict_df_with_inconsistent_frequencies_raises_error(pipeline):
+    df = pd.DataFrame(
+        {
+            "item_id": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
+            "timestamp": [
+                "2023-01-01",
+                "2023-01-02",
+                "2023-01-03",
+                "2023-01-04",
+                "2023-01-05",
+                "2023-01-01",
+                "2023-02-01",
+                "2023-03-01",
+                "2023-04-01",
+                "2023-05-01",
+            ],
+            "target": [1.0] * 10,
+        }
+    )
+
+    with pytest.raises(ValueError, match="same frequency"):
+        pipeline.predict_df(df)
+
+
 # The following tests have been taken from
 # https://github.com/autogluon/autogluon/blob/f57beb26cb769c6e0d484a6af2b89eab8aee73a8/timeseries/tests/unittests/models/chronos/pipeline/test_chronos_bolt.py
 # Author: Caner Turkmen <atturkm@amazon.com>
--- a/test/test_df_utils.py
+++ b/test/test_df_utils.py
@ -0,0 +1,462 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+# SPDX-License-Identifier: Apache-2.0
+
+from unittest.mock import patch
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from chronos.df_utils import (
+    convert_df_input_to_list_of_dicts_input,
+    validate_df_inputs,
+)
+from test.util import create_df, create_future_df, get_forecast_start_times
+
+# Tests for validate_df_inputs function
+
+
+@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
+def test_validate_df_inputs_returns_correct_metadata_for_valid_inputs(freq):
+    """Test that function returns validated dataframes, frequency, series lengths, and original order."""
+    # Create test data with 2 series
+    df = create_df(series_ids=["A", "B"], n_points=[10, 15], target_cols=["target"], freq=freq)
+
+    # Call validate_df_inputs
+    validated_df, validated_future_df, inferred_freq, series_lengths, original_order = validate_df_inputs(
+        df=df,
+        future_df=None,
+        target_columns=["target"],
+        prediction_length=5,
+        id_column="item_id",
+        timestamp_column="timestamp",
+    )
+
+    # Verify key return values
+    assert validated_future_df is None
+    assert inferred_freq is not None
+    assert series_lengths == [10, 15]
+    assert list(original_order) == ["A", "B"]
+    # Verify dataframe is sorted
+    assert validated_df["item_id"].iloc[0] == "A"
+    assert validated_df["item_id"].iloc[10] == "B"
+
+
+def test_validate_df_inputs_casts_mixed_dtypes_correctly():
+    """Test that numeric columns are cast to float32 and categorical/string/object columns are cast to category."""
+    # Create dataframe with mixed column types
+    df = pd.DataFrame(
+        {
+            "item_id": ["A"] * 10,
+            "timestamp": pd.date_range(end="2001-10-01", periods=10, freq="h"),
+            "target": np.random.randn(10),  # numeric
+            "numeric_cov": np.random.randint(0, 10, 10),  # integer numeric
+            "string_cov": ["cat1"] * 5 + ["cat2"] * 5,  # string
+            "bool_cov": [True, False] * 5,  # boolean
+        }
+    )
+
+    # Call validate_df_inputs
+    validated_df, _, _, _, _ = validate_df_inputs(
+        df=df,
+        future_df=None,
+        target_columns=["target"],
+        prediction_length=5,
+    )
+
+    # Verify dtypes after validation
+    assert validated_df["target"].dtype == np.float32
+    assert validated_df["numeric_cov"].dtype == np.float32
+    assert validated_df["string_cov"].dtype.name == "category"
+    assert validated_df["bool_cov"].dtype == np.float32  # booleans are cast to float32
+
+
+def test_validate_df_inputs_raises_error_when_series_has_insufficient_data():
+    """Test that ValueError is raised for series with < 3 data points."""
+    # Create dataframe with one series having only 2 points
+    df = create_df(series_ids=["A", "B"], n_points=[10, 2], target_cols=["target"], freq="h")
+
+    # Verify error is raised with series ID in message
+    with pytest.raises(ValueError, match=r"Every time series must have at least 3 data points.*series B"):
+        validate_df_inputs(
+            df=df,
+            future_df=None,
+            target_columns=["target"],
+            prediction_length=5,
+        )
+
+
+def test_validate_df_inputs_raises_error_when_future_df_has_mismatched_series_ids():
+    """Test that ValueError is raised when future_df has different series IDs than df."""
+    # Create df with series A and B
+    df = create_df(series_ids=["A", "B"], n_points=[10, 15], target_cols=["target"], freq="h")
+
+    # Create future_df with only series A
+    forecast_start_times = get_forecast_start_times(df, freq="h")
+    future_df = create_future_df(
+        forecast_start_times=[forecast_start_times[0]], series_ids=["A"], n_points=[5], covariates=None, freq="h"
+    )
+
+    # Verify appropriate error is raised
+    with pytest.raises(ValueError, match=r"future_df must contain the same time series IDs as df"):
+        validate_df_inputs(
+            df=df,
+            future_df=future_df,
+            target_columns=["target"],
+            prediction_length=5,
+        )
+
+
+def test_validate_df_inputs_raises_error_when_future_df_has_incorrect_lengths():
+    """Test that ValueError is raised when future_df lengths don't match prediction_length."""
+    # Create df with series A and B with a covariate
+    df = create_df(series_ids=["A", "B"], n_points=[10, 13], target_cols=["target"], covariates=["cov1"], freq="h")
+
+    # Create future_df with varying lengths per series (3 and 7 instead of 5)
+    forecast_start_times = get_forecast_start_times(df, freq="h")
+    future_df = create_future_df(
+        forecast_start_times=forecast_start_times,
+        series_ids=["A", "B"],
+        n_points=[3, 7],  # incorrect lengths
+        covariates=["cov1"],
+        freq="h",
+    )
+
+    # Verify error message indicates which series have incorrect lengths
+    with pytest.raises(
+        ValueError, match=r"future_df must contain prediction_length=5 values for each series.*different lengths"
+    ):
+        validate_df_inputs(
+            df=df,
+            future_df=future_df,
+            target_columns=["target"],
+            prediction_length=5,
+        )
+
+
+# Tests for convert_df_input_to_list_of_dicts_input function
+
+
+def test_convert_df_with_single_target_preserves_values():
+    """Test conversion with single target column."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=None,
+        target_columns=["target"],
+        prediction_length=5,
+    )
+
+    # Verify output list has correct length (one per series)
+    assert len(inputs) == 2
+
+    # Verify target arrays have the correct shape
+    assert inputs[0]["target"].shape == (1, 10)  # (n_targets=1, n_timesteps=10)
+    assert inputs[1]["target"].shape == (1, 12)  # (n_targets=1, n_timesteps=12)
+
+    # Verify values are preserved
+    df_sorted = df.sort_values(["item_id", "timestamp"])
+    np.testing.assert_array_almost_equal(
+        inputs[0]["target"][0], df_sorted[df_sorted["item_id"] == "A"]["target"].values
+    )
+    np.testing.assert_array_almost_equal(
+        inputs[1]["target"][0], df_sorted[df_sorted["item_id"] == "B"]["target"].values
+    )
+
+
+def test_convert_df_with_multiple_targets_preserves_values_and_shape():
+    """Test conversion with multiple target columns."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 14], target_cols=["target1", "target2"], freq="h")
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=None,
+        target_columns=["target1", "target2"],
+        prediction_length=5,
+    )
+
+    # Verify target arrays have shape (n_targets, n_timesteps)
+    assert inputs[0]["target"].shape == (2, 10)
+    assert inputs[1]["target"].shape == (2, 14)
+
+    # Verify all target values are preserved for both series
+    df_sorted = df.sort_values(["item_id", "timestamp"])
+    for i, series_id in enumerate(["A", "B"]):
+        series_data = df_sorted[df_sorted["item_id"] == series_id]
+        np.testing.assert_array_almost_equal(inputs[i]["target"][0], series_data["target1"].values)
+        np.testing.assert_array_almost_equal(inputs[i]["target"][1], series_data["target2"].values)
+
+
+def test_convert_df_with_past_covariates_includes_them_in_output():
+    """Test conversion with past covariates only."""
+    df = create_df(
+        series_ids=["A", "B"], n_points=[10, 16], target_cols=["target"], covariates=["cov1", "cov2"], freq="h"
+    )
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=None,
+        target_columns=["target"],
+        prediction_length=5,
+    )
+
+    # Verify output includes past_covariates dictionary
+    assert "past_covariates" in inputs[0]
+    assert "cov1" in inputs[0]["past_covariates"]
+    assert "cov2" in inputs[0]["past_covariates"]
+
+    # Verify covariate arrays have the expected shapes for both series
+    assert inputs[0]["past_covariates"]["cov1"].shape == (10,)
+    assert inputs[0]["past_covariates"]["cov2"].shape == (10,)
+    assert inputs[1]["past_covariates"]["cov1"].shape == (16,)
+    assert inputs[1]["past_covariates"]["cov2"].shape == (16,)
+
+    # Verify no future_covariates key in output
+    assert "future_covariates" not in inputs[0]
+
+
+def test_convert_df_with_past_and_future_covariates_includes_both():
+    """Test conversion with both past and future covariates."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 18], target_cols=["target"], covariates=["cov1"], freq="h")
+
+    forecast_start_times = get_forecast_start_times(df, freq="h")
+    future_df = create_future_df(
+        forecast_start_times=forecast_start_times,
+        series_ids=["A", "B"],
+        n_points=[5, 5],
+        covariates=["cov1"],
+        freq="h",
+    )
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=future_df,
+        target_columns=["target"],
+        prediction_length=5,
+    )
+
+    # Verify output includes both past_covariates and future_covariates dictionaries for both series
+    assert "past_covariates" in inputs[0]
+    assert "future_covariates" in inputs[0]
+    assert "past_covariates" in inputs[1]
+    assert "future_covariates" in inputs[1]
+
+    # Verify past and future covariate arrays have the expected shapes
+    assert inputs[0]["past_covariates"]["cov1"].shape == (10,)
+    assert inputs[0]["future_covariates"]["cov1"].shape == (5,)
+    assert inputs[1]["past_covariates"]["cov1"].shape == (18,)
+    assert inputs[1]["future_covariates"]["cov1"].shape == (5,)
+
+
+@pytest.mark.parametrize("freq", ["s", "min", "30min", "h", "D", "W", "ME", "QE", "YE"])
+def test_convert_df_generates_prediction_timestamps_with_correct_frequency(freq):
+    """Test that prediction timestamps follow the inferred frequency."""
+    # Use multiple series with different lengths
+    df = create_df(series_ids=["A", "B", "C"], n_points=[10, 15, 12], target_cols=["target"], freq=freq)
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=None,
+        target_columns=["target"],
+        prediction_length=5,
+    )
+
+    # Verify timestamps for all series
+    for series_id in ["A", "B", "C"]:
+        # Verify timestamps start after last context timestamp
+        last_context_time = df[df["item_id"] == series_id]["timestamp"].max()
+        first_pred_time = prediction_timestamps[series_id][0]
+        assert first_pred_time > last_context_time
+
+        # Verify the number of timestamps and that a regular frequency can be inferred from them
+        pred_times = prediction_timestamps[series_id]
+        assert len(pred_times) == 5
+        inferred_freq = pd.infer_freq(pred_times)
+        assert inferred_freq is not None
+
+
+def test_convert_df_skips_validation_when_disabled():
+    """Test that validate_inputs=False skips validation."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")
+
+    # Mock validate_df_inputs to verify it's not called when validation is disabled
+    with patch("chronos.df_utils.validate_df_inputs") as mock_validate:
+        inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+            df=df,
+            future_df=None,
+            target_columns=["target"],
+            prediction_length=5,
+            validate_inputs=False,
+        )
+
+        # Verify validate_df_inputs was not called
+        mock_validate.assert_not_called()
+
+        # Verify conversion still works
+        assert len(inputs) == 2
+
+
+def test_convert_df_preserves_all_values_with_random_inputs():
+    """Generate a random dataframe and verify all values are preserved (up to floating-point precision)."""
+    # Generate random parameters
+    n_series = np.random.randint(2, 5)
+    n_targets = np.random.randint(1, 4)
+    n_past_only_covariates = np.random.randint(1, 3)
+    n_future_covariates = np.random.randint(1, 3)
+    prediction_length = 5
+
+    series_ids = [f"series_{i}" for i in range(n_series)]
+    n_points = [np.random.randint(10, 20) for _ in range(n_series)]
+    target_cols = [f"target_{i}" for i in range(n_targets)]
+    past_only_covariates = [f"past_cov_{i}" for i in range(n_past_only_covariates)]
+    future_covariates = [f"future_cov_{i}" for i in range(n_future_covariates)]
+    all_covariates = past_only_covariates + future_covariates
+
+    # Create dataframe with all covariates
+    df = create_df(
+        series_ids=series_ids, n_points=n_points, target_cols=target_cols, covariates=all_covariates, freq="h"
+    )
+
+    # Create future_df with only future covariates (not past-only ones)
+    forecast_start_times = get_forecast_start_times(df, freq="h")
+    future_df = create_future_df(
+        forecast_start_times=forecast_start_times,
+        series_ids=series_ids,
+        n_points=[prediction_length] * n_series,
+        covariates=future_covariates,
+        freq="h",
+    )
+
+    # Convert to list-of-dicts format
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=future_df,
+        target_columns=target_cols,
+        prediction_length=prediction_length,
+    )
+
+    # Verify all target values are preserved (up to floating-point precision)
+    df_sorted = df.sort_values(["item_id", "timestamp"])
+    for i, series_id in enumerate(series_ids):
+        series_data = df_sorted[df_sorted["item_id"] == series_id]
+        assert inputs[i]["target"].shape == (n_targets, n_points[i])
+
+        for j, target_col in enumerate(target_cols):
+            np.testing.assert_array_almost_equal(inputs[i]["target"][j], series_data[target_col].values)
+
+    # Verify all past covariate values are preserved (both past-only and future covariates)
+    for i, series_id in enumerate(series_ids):
+        series_data = df_sorted[df_sorted["item_id"] == series_id]
+        assert "past_covariates" in inputs[i]
+        for cov in all_covariates:
+            np.testing.assert_array_almost_equal(inputs[i]["past_covariates"][cov], series_data[cov].values)
+
+    # Verify only future covariates are in future_covariates (not past-only ones)
+    future_df_sorted = future_df.sort_values(["item_id", "timestamp"])
+    for i, series_id in enumerate(series_ids):
+        series_future_data = future_df_sorted[future_df_sorted["item_id"] == series_id]
+        assert "future_covariates" in inputs[i]
+        # Only future covariates should be present
+        assert set(inputs[i]["future_covariates"].keys()) == set(future_covariates)
+        for cov in future_covariates:
+            np.testing.assert_array_almost_equal(inputs[i]["future_covariates"][cov], series_future_data[cov].values)
+
+    # Verify output structure is correct
+    assert len(inputs) == n_series
+    assert list(original_order) == series_ids
+    assert len(prediction_timestamps) == n_series
+
+
+def test_convert_df_with_freq_and_validate_inputs_raises_error():
+    """Test that providing freq with validate_inputs=True raises ValueError."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], freq="h")
+
+    with pytest.raises(ValueError, match="freq can only be provided when validate_inputs=False"):
+        convert_df_input_to_list_of_dicts_input(
+            df=df,
+            future_df=None,
+            target_columns=["target"],
+            prediction_length=5,
+            freq="h",
+            validate_inputs=True,
+        )
+
+
+@pytest.mark.parametrize("use_future_df", [True, False])
+def test_convert_df_with_freq_and_validate_inputs_false(use_future_df):
+    """Test that freq works with validate_inputs=False."""
+    df = create_df(series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], covariates=["cov1"], freq="h")
+    prediction_length = 5
+
+    future_df = None
+    if use_future_df:
+        forecast_start_times = get_forecast_start_times(df, freq="h")
+        future_df = create_future_df(
+            forecast_start_times=forecast_start_times,
+            series_ids=["A", "B"],
+            n_points=[prediction_length, prediction_length],
+            covariates=["cov1"],
+            freq="h",
+        )
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=future_df,
+        target_columns=["target"],
+        prediction_length=prediction_length,
+        freq="h",
+        validate_inputs=False,
+    )
+
+    assert len(inputs) == 2
+    assert len(prediction_timestamps) == 2
+    for series_id in ["A", "B"]:
+        assert len(prediction_timestamps[series_id]) == prediction_length
+
+
+@pytest.mark.parametrize("use_future_df", [True, False])
+def test_convert_df_with_mismatched_freq_uses_user_provided_freq(use_future_df):
+    """Test that user-provided freq overrides data frequency when validate_inputs=False."""
+    # Create data with hourly frequency
+    data_freq = "h"
+    df = create_df(
+        series_ids=["A", "B"], n_points=[10, 12], target_cols=["target"], covariates=["cov1"], freq=data_freq
+    )
+    prediction_length = 5
+
+    # User provides daily frequency (different from data)
+    user_freq = "D"
+
+    future_df = None
+    if use_future_df:
+        # Create future_df with hourly frequency (matching data, not user freq)
+        forecast_start_times = get_forecast_start_times(df, freq=data_freq)
+        future_df = create_future_df(
+            forecast_start_times=forecast_start_times,
+            series_ids=["A", "B"],
+            n_points=[prediction_length, prediction_length],
+            covariates=["cov1"],
+            freq=data_freq,
+        )
+
+    inputs, original_order, prediction_timestamps = convert_df_input_to_list_of_dicts_input(
+        df=df,
+        future_df=future_df,
+        target_columns=["target"],
+        prediction_length=prediction_length,
+        freq=user_freq,
+        validate_inputs=False,
+    )
+
+    # Conversion should still succeed
+    assert len(inputs) == 2
+    assert len(prediction_timestamps) == 2
+
+    # Forecast timestamps should use user-provided freq (daily), not data freq (hourly)
+    for series_id in ["A", "B"]:
+        pred_ts = prediction_timestamps[series_id]
+        assert len(pred_ts) == prediction_length
+        # Verify the frequency matches user-provided freq
+        inferred_freq = pd.infer_freq(pred_ts)
+        assert inferred_freq == user_freq
--- a/test/util.py
+++ b/test/util.py
@ -1,5 +1,8 @@
-from typing import Optional, Tuple
+import time
+from typing import Callable, Optional, Tuple

+import numpy as np
+import pandas as pd
 import torch


@ -9,3 +12,47 @@ def validate_tensor(a: torch.Tensor, shape: Tuple[int, ...], dtype: Optional[tor

    if dtype is not None:
        assert a.dtype == dtype
+
+
+def create_df(series_ids=["A", "B"], n_points=[10, 10], target_cols=["target"], covariates=None, freq="h"):
+    """Helper to create test context DataFrames."""
+    series_dfs = []
+    for series_id, length in zip(series_ids, n_points):
+        series_data = {"item_id": series_id, "timestamp": pd.date_range(end="2001-10-01", periods=length, freq=freq)}
+        for target_col in target_cols:
+            series_data[target_col] = np.random.randn(length)
+        if covariates:
+            for cov in covariates:
+                series_data[cov] = np.random.randn(length)
+        series_dfs.append(pd.DataFrame(series_data))
+    return pd.concat(series_dfs, ignore_index=True)
+
+
+def create_future_df(forecast_start_times: list, series_ids=["A", "B"], n_points=[5, 5], covariates=None, freq="h"):
+    """Helper to create test future DataFrames."""
+    series_dfs = []
+    for series_id, length, start in zip(series_ids, n_points, forecast_start_times):
+        series_data = {"item_id": series_id, "timestamp": pd.date_range(start=start, periods=length, freq=freq)}
+        if covariates:
+            for cov in covariates:
+                series_data[cov] = np.random.randn(length)
+        series_dfs.append(pd.DataFrame(series_data))
+    return pd.concat(series_dfs, ignore_index=True)
+
+
+def get_forecast_start_times(df, freq="h"):
+    context_end_times = df.groupby("item_id")["timestamp"].max()
+    forecast_start_times = [pd.date_range(end_time, periods=2, freq=freq)[-1] for end_time in context_end_times]
+
+    return forecast_start_times
+
+
+def timeout_callback(seconds: float | None) -> Callable:
+    """Return a callback object that raises an exception if time limit is exceeded."""
+    start_time = time.monotonic()
+
+    def callback() -> None:
+        if seconds is not None and time.monotonic() - start_time > seconds:
+            raise TimeoutError("time limit exceeded")
+
+    return callback