Merge 7ae10d6f27 into 32111085d8

2026-05-23 09:39:35 +00:00 · 2026-05-09 07:45:39 +08:00 · 2026-05-09 07:45:39 +08:00 · 30f1763e00
commit 30f1763e00
parent 32111085d8 7ae10d6f27
10 changed files with 925 additions and 169 deletions
--- a/docs/api/base.md
+++ b/docs/api/base.md
@@ -0,0 +1,16 @@
+# Base
+
+::: chronos.base.BaseChronosPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - predict_df
+        - predict_fev
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
--- a/docs/api/chronos-2.md
+++ b/docs/api/chronos-2.md
@@ -0,0 +1,25 @@
+# Chronos-2
+
+::: chronos.chronos2.pipeline.Chronos2Pipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members: null
+      members:
+        - predict
+        - predict_quantiles
+        - predict_df
+        - predict_fev
+        - embed
+        - fit
+        - from_pretrained
+        - save_pretrained
+        - model_context_length
+        - model_prediction_length
+        - model_output_patch_size
+        - quantiles
+        - max_output_patches
+        - model
+        - forecast_type
--- a/docs/api/chronos-bolt.md
+++ b/docs/api/chronos-bolt.md
@@ -0,0 +1,18 @@
+# Chronos-Bolt
+
+::: chronos.chronos_bolt.ChronosBoltPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members: null
+      members:
+        - predict
+        - predict_quantiles
+        - embed
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
+        - model
+        - forecast_type
--- a/docs/api/chronos.md
+++ b/docs/api/chronos.md
@@ -0,0 +1,18 @@
+# Chronos
+
+::: chronos.chronos.ChronosPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - embed
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
+        - tokenizer
+        - model
+        - forecast_type
--- a/docs/index.md
+++ b/docs/index.md
@@ -0,0 +1,89 @@
+# chronos-forecasting
+
+
+## Introduction
+
+This package provides an interface to the Chronos family of **pretrained time series forecasting models**. The following model types are supported.
+
+- **Chronos-2**: Our latest model with significantly enhanced capabilities. It offers zero-shot support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 delivers state-of-the-art zero-shot performance across multiple benchmarks (including fev-bench and GIFT-Eval), with the largest improvements observed on tasks that include exogenous features. It also achieves a win rate of over 90% against Chronos-Bolt in head-to-head comparisons. To learn more about Chronos-2, check out the [technical report](https://arxiv.org/abs/2510.15821).
+- **Chronos-Bolt**: A patch-based variant of Chronos. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. To learn more about Chronos-Bolt, check out this [blog post](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/).
+- **Chronos**: The original Chronos family, which is based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. To learn more about Chronos, check out the [publication](https://openreview.net/forum?id=gerNCVqqtR).
+
+### Available Models
+
+| Model ID                                                               | Parameters |
+| ---------------------------------------------------------------------- | ---------- |
+| [`amazon/chronos-2`](https://huggingface.co/amazon/chronos-2)   | 120M         |
+| [`amazon/chronos-bolt-tiny`](https://huggingface.co/amazon/chronos-bolt-tiny)   | 9M         |
+| [`amazon/chronos-bolt-mini`](https://huggingface.co/amazon/chronos-bolt-mini)   | 21M        |
+| [`amazon/chronos-bolt-small`](https://huggingface.co/amazon/chronos-bolt-small) | 48M        |
+| [`amazon/chronos-bolt-base`](https://huggingface.co/amazon/chronos-bolt-base)   | 205M       |
+| [`amazon/chronos-t5-tiny`](https://huggingface.co/amazon/chronos-t5-tiny)   | 8M         |
+| [`amazon/chronos-t5-mini`](https://huggingface.co/amazon/chronos-t5-mini)   | 20M        |
+| [`amazon/chronos-t5-small`](https://huggingface.co/amazon/chronos-t5-small) | 46M        |
+| [`amazon/chronos-t5-base`](https://huggingface.co/amazon/chronos-t5-base)   | 200M       |
+| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M       |
+
+
+
+
+## Installation
+
+```bash
+pip install chronos-forecasting
+```
+
+## Quickstart
+
+A minimal example showing how to perform forecasting using Chronos-2:
+
+```py
+import pandas as pd  # requires: pip install 'pandas[pyarrow]'
+from chronos import Chronos2Pipeline
+
+pipeline = Chronos2Pipeline.from_pretrained("amazon/chronos-2", device_map="cuda")
+
+# Load historical target values and past values of covariates
+context_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/train.parquet")
+
+# (Optional) Load future values of covariates
+test_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/test.parquet")
+future_df = test_df.drop(columns="target")
+
+# Generate predictions with covariates
+pred_df = pipeline.predict_df(
+    context_df,
+    future_df=future_df,
+    prediction_length=24,  # Number of steps to forecast
+    quantile_levels=[0.1, 0.5, 0.9],  # Quantile levels for the probabilistic forecast
+    id_column="id",  # Column identifying different time series
+    timestamp_column="timestamp",  # Column with datetime information
+    target="target",  # Column(s) with time series values to predict
+)
+```
+
+We can now visualize the forecast:
+
+```py
+import matplotlib.pyplot as plt  # requires: pip install matplotlib
+
+ts_context = context_df.set_index("timestamp")["target"].tail(256)
+ts_pred = pred_df.set_index("timestamp")
+ts_ground_truth = test_df.set_index("timestamp")["target"]
+
+ts_context.plot(label="historical data", color="xkcd:azure", figsize=(12, 3))
+ts_ground_truth.plot(label="future data (ground truth)", color="xkcd:grass green")
+ts_pred["predictions"].plot(label="forecast", color="xkcd:violet")
+plt.fill_between(
+    ts_pred.index,
+    ts_pred["0.1"],
+    ts_pred["0.9"],
+    alpha=0.7,
+    label="prediction interval",
+    color="xkcd:light lavender",
+)
+plt.legend()
+```
+
+## Tutorials
+
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -0,0 +1,40 @@
+site_name: chronos-forecasting
+site_description: A library for the Chronos time series foundation models
+site_url: https://amazon-science.github.io/chronos-forecasting/
+repo_url: https://github.com/amazon-science/chronos-forecasting
+repo_name: amazon-science/chronos-forecasting
+
+theme:
+  name: material
+  palette:
+    primary: blue
+
+plugins:
+  - search
+  - mkdocstrings:
+      handlers:
+        python:
+          options:
+            docstring_style: numpy
+            show_root_heading: true
+            show_root_full_path: false
+            show_signature_annotations: true
+            show_category_heading: true
+            group_by_category: true
+            allow_inspection: true
+
+nav:
+  - Home: index.md
+  # - Tutorials:
+  #   - Quickstart: 
+  - API Reference:
+    - Base: api/base.md
+    - Chronos: api/chronos.md
+    - Chronos-Bolt: api/chronos-bolt.md
+    - Chronos-2: api/chronos-2.md
+
+markdown_extensions:
+  - pymdownx.highlight
+  - pymdownx.superfences
+  - attr_list
+  - md_in_html
--- a/src/chronos/base.py
+++ b/src/chronos/base.py
@@ -42,25 +42,63 @@ class PipelineRegistry(type):


 class BaseChronosPipeline(metaclass=PipelineRegistry):
+    """
+    Abstract base class for Chronos pretrained time series forecasting pipelines.
+
+    This class defines the common interface for all Chronos models. The package provides
+    multiple pipeline implementations with different forecasting approaches and architectures:
+
+    - [ChronosPipeline][chronos.chronos.ChronosPipeline]: Sample-based forecasting with scaling- and quantization-based tokenization
+    - [ChronosBoltPipeline][chronos.chronos_bolt.ChronosBoltPipeline]: Quantile-based forecasting with patching
+    - [Chronos2Pipeline][chronos.chronos2.pipeline.Chronos2Pipeline] (recommended): Quantile-based forecasting with support for multivariate and covariate-informed forecasting
+
+    Each subclass implements the abstract methods and properties defined here,
+    potentially with different parameter signatures and return types depending
+    on the model architecture and forecasting approach.
+    """
+
    forecast_type: ForecastType
    dtypes = {"bfloat16": torch.bfloat16, "float32": torch.float32}

    def __init__(self, inner_model: "PreTrainedModel"):
        """
+        Initialize the base pipeline with a pretrained model.
+
        Parameters
        ----------
-        inner_model : PreTrainedModel
-            A hugging-face transformers PreTrainedModel, e.g., T5ForConditionalGeneration
+        inner_model
+            A HuggingFace transformers PreTrainedModel that serves as the
+            underlying forecasting model (e.g., T5ForConditionalGeneration)
        """
        # for easy access to the inner HF-style model
        self.inner_model = inner_model

    @property
    def model_context_length(self) -> int:
+        """
+        Maximum number of time steps the model can use as context.
+
+        This is an abstract property that must be implemented by subclasses.
+
+        Returns
+        -------
+        int
+            Maximum context length supported by the model
+        """
        raise NotImplementedError()

    @property
    def model_prediction_length(self) -> int:
+        """
+        Default prediction horizon for the model.
+
+        This is an abstract property that must be implemented by subclasses.
+
+        Returns
+        -------
+        int
+            Default prediction horizon
+        """
        raise NotImplementedError()

    def _prepare_and_validate_context(self, context: Union[torch.Tensor, List[torch.Tensor]]):
@@ -75,25 +113,35 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):

    def predict(self, inputs: Union[torch.Tensor, List[torch.Tensor]], prediction_length: Optional[int] = None):
        """
-        Get forecasts for the given time series. Predictions will be
-        returned in fp32 on the cpu.
+        Generate forecasts for the given time series.
+
+        This is an abstract method that must be implemented by subclasses.
+        Each subclass may have different parameters and return types depending
+        on the model architecture and forecasting approach. Predictions are
+        typically returned in fp32 on the CPU.

        Parameters
        ----------
        inputs
-            Input series. This is either a 1D tensor, or a list
-            of 1D tensors, or a 2D tensor whose first dimension
-            is batch. In the latter case, use left-padding with
-            ``torch.nan`` to align series of different lengths.
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
        prediction_length
-            Time steps to predict. Defaults to a model-dependent
-            value if not given.
+            Number of time steps to forecast. If not provided, defaults to
+            the model's default prediction length.

        Returns
        -------
-        forecasts
-            Tensor containing forecasts. The layout and meaning
-            of the forecasts values depends on ``self.forecast_type``.
+        torch.Tensor
+            Forecasts tensor. The shape and interpretation depend on the
+            subclass's forecast_type (samples or quantiles).
+
+        Notes
+        -----
+        Subclasses may extend this interface with additional parameters
+        specific to their forecasting approach. Refer to specific subclass
+        documentation for complete parameter lists and return value details.
        """
        raise NotImplementedError()
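
Review note: the `inputs` docstring above asks for left-padding with NaN when series of different lengths are batched into a 2D tensor. A minimal sketch of that padding step follows, written with plain Python lists so it runs without PyTorch. The helper name `left_pad_with_nan` is hypothetical and not part of the package; converting the result with `torch.tensor(batch)` is assumed to give the 2D input the docstring describes.

```python
import math


def left_pad_with_nan(series_list):
    """Left-pad variable-length series with NaN so that all rows share one length.

    Hypothetical helper for illustration only; it is not part of chronos.
    """
    max_len = max(len(series) for series in series_list)
    return [
        [math.nan] * (max_len - len(series)) + list(series)
        for series in series_list
    ]


# Two series of different lengths become a rectangular (batch, time) layout.
batch = left_pad_with_nan([[1.0, 2.0, 3.0], [4.0, 5.0]])

# The shorter series now starts with one NaN; the longer series is unchanged.
# Assumption: torch.tensor(batch) would then yield the 2D tensor form of `inputs`.
```

Padding on the left keeps the most recent observations aligned at the end of every row, which is the part of the context closest to the forecast horizon.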

@@ -105,30 +153,42 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        **kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
-        Get quantile and mean forecasts for given time series.
-        Predictions will be returned in fp32 on the cpu.
+        Generate quantile and mean forecasts for given time series.
+
+        This is an abstract method that must be implemented by subclasses.
+        Each subclass may have different parameters depending on the model
+        architecture. Predictions are typically returned in fp32 on the CPU.

        Parameters
        ----------
-        inputs : Union[torch.Tensor, List[torch.Tensor]]
-            Input series. This is either a 1D tensor, or a list
-            of 1D tensors, or a 2D tensor whose first dimension
-            is batch. In the latter case, use left-padding with
-            ``torch.nan`` to align series of different lengths.
-        prediction_length : Optional[int], optional
-            Time steps to predict. Defaults to a model-dependent
-            value if not given.
-        quantile_levels : List[float], optional
-            Quantile levels to compute, by default [0.1, 0.2, ..., 0.9]
+        inputs
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
+        prediction_length
+            Number of time steps to forecast. If not provided, defaults to
+            the model's default prediction length.
+        quantile_levels
+            List of quantile levels to compute, each between 0 and 1.
+            Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
+        **kwargs
+            Additional keyword arguments that may be used by subclass implementations.

        Returns
        -------
-        quantiles
-            Tensor containing quantile forecasts. Shape
+        torch.Tensor
+            Tensor of quantile forecasts with shape
            (batch_size, prediction_length, num_quantiles)
-        mean
-            Tensor containing mean (point) forecasts. Shape
+        torch.Tensor
+            Tensor of mean (point) forecasts with shape
            (batch_size, prediction_length)
+
+        Notes
+        -----
+        Subclasses may extend this interface with additional parameters
+        specific to their forecasting approach. Refer to specific subclass
+        documentation for complete parameter lists and implementation details.
        """
        raise NotImplementedError()

@@ -146,7 +206,11 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        **predict_kwargs,
    ) -> "pd.DataFrame":
        """
-        Perform forecasting on time series data in a long-format pandas DataFrame.
+        Generate forecasts for time series data in a pandas DataFrame.
+
+        This method provides a convenient interface for forecasting on long-format
+        pandas DataFrames containing multiple time series. It handles data conversion,
+        batching, and result formatting automatically.

        Parameters
        ----------
@@ -178,12 +242,31 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):

        Returns
        -------
-        The forecasts dataframe generated by the model with the following columns
-        - `id_column`: The time series ID
-        - `timestamp_column`: Future timestamps
-        - "target_name": The name of the target column
-        - "predictions": The point predictions generated by the model
-        - One column for predictions at each quantile level in `quantile_levels`
+        pd.DataFrame
+            Forecast results in long format with the following columns:
+
+            - Column named by id_column: Time series identifiers
+            - Column named by timestamp_column: Future timestamps
+            - "target_name": Name of the forecasted target variable
+            - "predictions": Point forecasts (mean predictions)
+            - One column per quantile level (e.g., "0.1", "0.5", "0.9")
+
+        Raises
+        ------
+        ImportError
+            If pandas is not installed.
+        ValueError
+            If target is not a string (multivariate forecasting not supported).
+
+        Notes
+        -----
+        This method requires pandas to be installed. Install with `pip install pandas`.
+
+        The method internally converts the DataFrame to tensor format, generates
+        forecasts using predict_quantiles, and converts results back to DataFrame format.
+
+        Subclasses may have additional parameters or behavior. Refer to specific
+        subclass documentation for implementation details.
        """
        try:
            import pandas as pd
@@ -253,23 +336,43 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        self, task: "fev.Task", batch_size: int = 32, **kwargs
    ) -> tuple[list["datasets.DatasetDict"], float]:
        """
-        Make predictions for evaluation on a fev.Task.
+        Generate predictions for evaluation on a fev benchmark task.
+
+        This method provides integration with the fev (Forecasting Evaluation)
+        library for standardized benchmark evaluation. It handles batching,
+        timing, and formatting predictions according to the task requirements.

        Parameters
        ----------
        task
-            Benchmark task on which the evaluation should be done.
+            A fev.Task object defining the benchmark evaluation task, including
+            the dataset, horizon, quantile levels, and evaluation metric.
        batch_size
-            Batch size used during evaluation.
+            Number of time series to process in each batch during inference.
+            Larger batch sizes may improve throughput but require more memory.
+            Default is 32.
        **kwargs
-            Additional keyword arguments that will be forwarded to `self.predict_quantiles`.
+            Additional keyword arguments forwarded to the predict_quantiles method.
+            These may include model-specific parameters.

        Returns
        -------
-        predictions_per_window
-            Predictions for each window, each stored as a DatasetDict
-        inference_time_s
-            Total time that it took to make predictions for all windows (in seconds).
+        list[DatasetDict]
+            List of DatasetDict objects, one for each evaluation window in the task.
+            Each DatasetDict contains predictions formatted according to fev requirements.
+        float
+            Total inference time in seconds across all windows, excluding data
+            loading and preprocessing time.
+
+        Raises
+        ------
+        ImportError
+            If the fev library is not installed.
+
+        Notes
+        -----
+        This method requires the fev library to be installed. Install with
+        `pip install fev`.
        """
        import datasets

@@ -342,8 +445,57 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
        **kwargs,
    ):
        """
-        Load the model, either from a local path, S3 prefix, or from the HuggingFace Hub.
-        Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
+        Load a pretrained Chronos pipeline from various sources.
+
+        This class method loads a pretrained model from a local path, S3 bucket,
+        or the HuggingFace Hub. It automatically detects the appropriate pipeline
+        class based on the model configuration and instantiates it.
+
+        Parameters
+        ----------
+        pretrained_model_name_or_path
+            Path or identifier for the pretrained model. Can be:
+            - A local directory path containing model files
+            - An S3 URI (s3://bucket/prefix)
+            - A HuggingFace Hub model identifier (e.g., "amazon/chronos-t5-small")
+        *model_args
+            Additional positional arguments passed to the model constructor.
+        force_s3_download
+            When True, forces re-downloading from S3 even if cached locally.
+            Only applicable for S3 URIs. Default is False.
+        **kwargs
+            Additional keyword arguments passed to AutoConfig and the model
+            constructor. Common options include:
+            - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
+            - device_map: Device placement strategy for model layers
+            - Other transformers AutoConfig and AutoModel arguments
+
+        Returns
+        -------
+        BaseChronosPipeline
+            An instance of the appropriate pipeline subclass (ChronosPipeline,
+            ChronosBoltPipeline, or Chronos2Pipeline) based on the model configuration.
+
+        Raises
+        ------
+        ValueError
+            If the configuration is not a valid Chronos config or if the
+            specified pipeline class is not recognized.
+        ImportError
+            If required dependencies are not installed.
+
+        Notes
+        -----
+        The method reads the model configuration to determine which pipeline
+        class to instantiate. The configuration must contain either a
+        `chronos_pipeline_class` or `chronos_config` attribute.
+
+        For S3 URIs, the model is first downloaded to a local cache directory
+        before loading.
+
+        The torch_dtype parameter can be specified as a string ("float32", "bfloat16")
+        or as a torch dtype object. When set to "auto", the dtype is determined
+        from the model configuration.
        """
        if str(pretrained_model_name_or_path).startswith("s3://"):
            from .boto_utils import cache_model_from_s3
--- a/src/chronos/chronos.py
+++ b/src/chronos/chronos.py
@@ -152,6 +152,19 @@ class ChronosTokenizer:


 class MeanScaleUniformBins(ChronosTokenizer):
+    """
+    A tokenizer which first applies mean scaling and then quantizes the scaled values in uniformly-spaced bins.
+
+    Parameters
+    ----------
+    low_limit
+        The lower limit of quantization. (Scaled) Values smaller than this will be clipped.
+    high_limit
+        The upper limit of quantization. (Scaled) Values larger than this will be clipped.
+    config
+        The ``ChronosConfig``
+    """
+
    def __init__(self, low_limit: float, high_limit: float, config: ChronosConfig) -> None:
        self.config = config
        self.centers = torch.linspace(
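
Review note: the new `MeanScaleUniformBins` docstring describes mean scaling followed by quantization into uniformly spaced bins, with clipping at `low_limit` and `high_limit`. The sketch below illustrates that idea in pure Python. It is a simplified illustration under stated assumptions, not the class's actual implementation: special tokens, padding, and missing values are ignored, the scale is assumed to be the mean absolute value, and the function names `make_centers`, `tokenize`, and `detokenize` are hypothetical.

```python
from bisect import bisect_left


def make_centers(low_limit, high_limit, n_bins):
    """Return n_bins uniformly spaced bin centers between the two limits."""
    step = (high_limit - low_limit) / (n_bins - 1)
    return [low_limit + i * step for i in range(n_bins)]


def tokenize(values, centers):
    """Mean-scale the values, then map each scaled value to a bin index."""
    # Assumed scaling rule: divide by the mean absolute value of the context.
    scale = sum(abs(v) for v in values) / len(values)
    if not scale > 0:
        scale = 1.0  # avoid division by zero for an all-zero context
    # Bin boundaries are the midpoints between adjacent centers.
    boundaries = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # Values beyond the outermost boundaries land in the first or last bin,
    # which is how clipping at the limits shows up in this sketch.
    token_ids = [bisect_left(boundaries, v / scale) for v in values]
    return token_ids, scale


def detokenize(token_ids, scale, centers):
    """Map bin indices back to real values using the bin centers and the scale."""
    return [centers[t] * scale for t in token_ids]


centers = make_centers(-2.0, 2.0, n_bins=5)  # centers at -2, -1, 0, 1, 2
token_ids, scale = tokenize([1.0, 3.0, 3.0, 9.0], centers)
reconstructed = detokenize(token_ids, scale, centers)
```

In this toy run the scale is 4.0, so the last value scales to 2.25, exceeds the upper limit of 2, and is assigned to the last bin. Decoding it gives 8.0 rather than 9.0, which shows the information lost to clipping and quantization.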
@@ -354,18 +367,22 @@ class ChronosModel(nn.Module):

 class ChronosPipeline(BaseChronosPipeline):
    """
-    A ``ChronosPipeline`` uses the given tokenizer and model to forecast
-    input time series.
+    Pipeline for the Chronos model.
+    
+    To learn more about this model, refer to:
+
+    Ansari, Abdul Fatir, Stella, Lorenzo et al.
+    "[Chronos: Learning the Language of Time Series](https://arxiv.org/abs/2403.07815)."
+    Transactions on Machine Learning Research (2024).

-    Use the ``from_pretrained`` class method to load serialized models.
-    Use the ``predict`` method to get forecasts.

    Parameters
    ----------
    tokenizer
-        The tokenizer object to use.
+        ChronosTokenizer instance that handles conversion between time series
+        values and discrete tokens.
    model
-        The model to use.
+        ChronosModel instance wrapping the underlying transformer model.
    """

    tokenizer: ChronosTokenizer
@@ -373,6 +390,17 @@ class ChronosPipeline(BaseChronosPipeline):
    forecast_type: ForecastType = ForecastType.SAMPLES

    def __init__(self, tokenizer, model):
+        """
+        Initialize the ChronosPipeline with a tokenizer and model.
+
+        Parameters
+        ----------
+        tokenizer
+            ChronosTokenizer instance for converting between time series
+            and token representations.
+        model
+            ChronosModel instance containing the pretrained transformer model.
+        """
        super().__init__(inner_model=model.model)
        self.tokenizer = tokenizer
        self.model = model
@@ -398,26 +426,39 @@ class ChronosPipeline(BaseChronosPipeline):
    @torch.no_grad()
    def embed(self, context: Union[torch.Tensor, List[torch.Tensor]]) -> Tuple[torch.Tensor, Any]:
        """
-        Get encoder embeddings for the given time series.
+        Extract encoder embeddings for the given time series.
+
+        This method tokenizes the input time series and extracts the encoder
+        embeddings, which can be used for downstream tasks like clustering,
+        classification, or similarity search. Only available for encoder-decoder
+        (seq2seq) models.

        Parameters
        ----------
        context
-            Input series. This is either a 1D tensor, or a list
-            of 1D tensors, or a 2D tensor whose first dimension
-            is batch. In the latter case, use left-padding with
-            ``torch.nan`` to align series of different lengths.
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.

        Returns
        -------
-        embeddings, tokenizer_state
-            A tuple of two tensors: the encoder embeddings and the tokenizer_state,
-            e.g., the scale of the time series in the case of mean scaling.
-            The encoder embeddings are shaped (batch_size, context_length, d_model)
-            or (batch_size, context_length + 1, d_model), where context_length
-            is the size of the context along the time axis if a 2D tensor was provided
-            or the length of the longest time series, if a list of 1D tensors was
-            provided, and the extra 1 is for EOS.
+        torch.Tensor
+            Encoder embeddings with shape (batch_size, context_length, d_model)
+            or (batch_size, context_length + 1, d_model) if an EOS token is used.
+            The context_length is either the time dimension of the input 2D tensor
+            or the length of the longest series in the input list.
+        Any
+            Tokenizer state containing scaling information (e.g., mean scale)
+            used during tokenization. Can be used for consistent processing
+            of related time series.
+
+        Notes
+        -----
+        This method is only supported for encoder-decoder (seq2seq) models.
+        Decoder-only (causal) models do not have a separate encoder.
+
+        The embeddings are returned on CPU in fp32 format.
        """
        context_tensor = self._prepare_and_validate_context(context=context)
        token_ids, attention_mask, tokenizer_state = self.tokenizer.context_input_transform(context_tensor)
@@ -438,36 +479,64 @@ class ChronosPipeline(BaseChronosPipeline):
        limit_prediction_length: bool = False,
    ) -> torch.Tensor:
        """
-        Get forecasts for the given time series.
+        Generate sample-based forecasts for the given time series.

-        Refer to the base method (``BaseChronosPipeline.predict``)
-        for details on shared parameters.
+        This method tokenizes the input time series, generates multiple sample
+        trajectories using the transformer model, and decodes them back to real
+        values. For predictions longer than the model's built-in horizon, it uses
+        autoregressive generation by feeding back the median of generated samples.

-        Additional parameters
-        ---------------------
+        Parameters
+        ----------
+        inputs
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
+        prediction_length
+            Number of time steps to forecast. If not provided, uses the model's
+            default prediction length from the configuration.
        num_samples
-            Number of sample paths to predict. Defaults to what
-            specified in ``self.model.config``.
+            Number of sample trajectories to generate for each input series.
+            If not provided, uses the value from model configuration.
        temperature
-            Temperature to use for generating sample tokens.
-            Defaults to what specified in ``self.model.config``.
+            Sampling temperature for token generation. Higher values increase
+            randomness. If not provided, uses the value from model configuration.
        top_k
-            Top-k parameter to use for generating sample tokens.
-            Defaults to what specified in ``self.model.config``.
+            Number of highest probability tokens to consider during sampling.
+            If not provided, uses the value from model configuration.
        top_p
-            Top-p parameter to use for generating sample tokens.
-            Defaults to what specified in ``self.model.config``.
+            Cumulative probability threshold for nucleus sampling. Only tokens
+            with cumulative probability up to top_p are considered.
+            If not provided, uses the value from model configuration.
        limit_prediction_length
-            Force prediction length smaller or equal than the
-            built-in prediction length from the model. False by
-            default. When true, fail loudly if longer predictions
-            are requested, otherwise longer predictions are allowed.
+            When True, raises an error if prediction_length exceeds the model's
+            built-in prediction length. When False (default), allows longer
+            predictions with a warning about potential quality degradation.

        Returns
        -------
-        samples
-            Tensor of sample forecasts, of shape
-            (batch_size, num_samples, prediction_length).
+        torch.Tensor
+            Sample forecasts with shape (batch_size, num_samples, prediction_length).
+            Returned in fp32 on CPU.
+
+        Raises
+        ------
+        ValueError
+            If limit_prediction_length is True and prediction_length exceeds
+            the model's built-in prediction length.
+
+        Notes
+        -----
+        For predictions longer than the model's built-in horizon, the method
+        uses autoregressive generation by iteratively:
+
+        1. Generating samples for the next chunk
+        2. Taking the median across samples
+        3. Appending it to the context for the next iteration
+
+        This autoregressive approach may lead to quality degradation for very
+        long horizons, as the model was not explicitly trained for this.
        """
        context_tensor = self._prepare_and_validate_context(context=inputs)
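
Review note on the Notes section above: the three-step long-horizon loop (generate a chunk, take the median across samples, append it to the context) can be sketched in plain Python. This is only an illustration under stated assumptions, not the pipeline's actual code: `rollout`, `toy_sampler`, and the list-based data layout are hypothetical stand-ins for the model call and tensors.

```python
from statistics import median


def toy_sampler(context, num_samples, horizon):
    # Hypothetical stand-in for the model: sample i repeats the last context
    # value shifted by (i - num_samples // 2) for `horizon` steps.
    last = context[-1]
    return [[last + (i - num_samples // 2)] * horizon for i in range(num_samples)]


def rollout(context, prediction_length, horizon, num_samples, sampler):
    """Autoregressive rollout: returns num_samples trajectories of length prediction_length."""
    context = list(context)
    chunks = []
    remaining = prediction_length
    while remaining > 0:
        # 1. Generate samples for the next chunk
        samples = sampler(context, num_samples, horizon)
        chunks.append(samples)
        remaining -= horizon
        if remaining <= 0:
            break
        # 2. Take the median across samples at each time step
        chunk_median = [median(s[t] for s in samples) for t in range(horizon)]
        # 3. Append the median to the context for the next iteration
        context.extend(chunk_median)
    # Concatenate the chunks along time and truncate to the requested length
    return [
        sum((chunk[i] for chunk in chunks), [])[:prediction_length]
        for i in range(num_samples)
    ]


forecasts = rollout([1.0, 2.0], prediction_length=5, horizon=2, num_samples=3, sampler=toy_sampler)
```

Because only the median is fed back, every sample in a later chunk is conditioned on the same point path, which is one reason quality may degrade over long horizons.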

@ -518,7 +587,49 @@ class ChronosPipeline(BaseChronosPipeline):
        **predict_kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
-        Refer to the base method (``BaseChronosPipeline.predict_quantiles``).
+        Generate quantile and mean forecasts from sample trajectories.
+
+        This method first generates multiple sample trajectories using the predict
+        method, then computes empirical quantiles and mean from these samples.
+        This provides a convenient interface for obtaining quantile forecasts from
+        the model.
+
+        Parameters
+        ----------
+        inputs
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
+        prediction_length
+            Number of time steps to forecast. If not provided, uses the model's
+            default prediction length from the configuration.
+        quantile_levels
+            List of quantile levels to compute, each between 0 and 1.
+            Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
+        **predict_kwargs
+            Additional keyword arguments passed to the predict method, such as
+            num_samples, temperature, top_k, top_p, and limit_prediction_length.
+
+        Returns
+        -------
+        torch.Tensor
+            Tensor of quantile forecasts with shape
+            (batch_size, prediction_length, num_quantiles).
+            Returned in fp32 on CPU.
+        torch.Tensor
+            Tensor of mean forecasts with shape
+            (batch_size, prediction_length).
+            Returned in fp32 on CPU.
+
+        Notes
+        -----
+        The quantiles are computed empirically from the generated samples.
+        The accuracy of quantile estimates depends on the number of samples
+        generated (controlled by num_samples parameter in predict_kwargs).
+
+        For better quantile estimates, consider increasing num_samples, though
+        this will increase memory usage and computation time.
        """
        prediction_samples = (
            self.predict(inputs, prediction_length=prediction_length, **predict_kwargs).detach().swapaxes(1, 2)
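
Review note on the `predict_quantiles` docstring above: computing empirical quantiles and a mean from sample trajectories can be illustrated without any model. The sketch below is a pure-Python approximation for a single series; `empirical_quantile` and `summarize_samples` are hypothetical helpers, and the linear-interpolation rule is an assumption rather than a statement about what the pipeline uses internally.

```python
from statistics import fmean


def empirical_quantile(sorted_vals, q):
    # Linear-interpolation quantile of an ascending list, with q in [0, 1]
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_samples(samples, quantile_levels):
    """samples: num_samples trajectories, each of length prediction_length.

    Returns (quantiles, mean) where quantiles[t][j] is level j at step t,
    i.e. the (prediction_length, num_quantiles) layout for one series.
    """
    horizon = len(samples[0])
    quantiles, mean = [], []
    for t in range(horizon):
        step_values = sorted(traj[t] for traj in samples)
        quantiles.append([empirical_quantile(step_values, q) for q in quantile_levels])
        mean.append(fmean(step_values))
    return quantiles, mean


# Five toy sample trajectories over a two-step horizon
samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
quantiles, mean = summarize_samples(samples, [0.1, 0.5, 0.9])
```

With only five samples the 0.1 and 0.9 levels are interpolated between the two most extreme values, which illustrates why the docstring ties quantile accuracy to `num_samples`.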
@ -535,8 +646,49 @@ class ChronosPipeline(BaseChronosPipeline):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        """
-        Load the model, either from a local path S3 prefix or from the HuggingFace Hub.
-        Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
+        Load a pretrained ChronosPipeline from various sources.
+
+        This method loads a pretrained ChronosPipeline model from a local path,
+        S3 bucket, or the HuggingFace Hub. It automatically instantiates the
+        appropriate tokenizer and model based on the configuration.
+
+        Parameters
+        ----------
+        pretrained_model_name_or_path
+            Path or identifier for the pretrained model. Can be:
+            - A local directory path containing model files
+            - An S3 URI (s3://bucket/prefix)
+            - A HuggingFace Hub model identifier (e.g., "amazon/chronos-t5-small")
+        *args
+            Additional positional arguments passed to AutoConfig and AutoModel.
+        **kwargs
+            Additional keyword arguments passed to AutoConfig and AutoModel.
+            Common options include:
+            - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
+            - device_map: Device placement strategy for model layers
+            - Other transformers AutoConfig and AutoModel arguments
+
+        Returns
+        -------
+        ChronosPipeline
+            An instance of ChronosPipeline with loaded tokenizer and model.
+
+        Raises
+        ------
+        AssertionError
+            If the configuration is not a valid Chronos config.
+
+        Notes
+        -----
+        For S3 URIs, the method delegates to BaseChronosPipeline.from_pretrained
+        which handles S3 download and caching.
+
+        The method automatically detects whether to load a seq2seq or causal
+        model based on the configuration and instantiates the appropriate
+        model class.
+
+        This method supports all arguments accepted by HuggingFace's AutoConfig
+        and AutoModel classes.
        """

        if str(pretrained_model_name_or_path).startswith("s3://"):
--- a/src/chronos/chronos2/pipeline.py
+++ b/src/chronos/chronos2/pipeline.py
@ -37,10 +37,27 @@ logger = logging.getLogger(__name__)


 class Chronos2Pipeline(BaseChronosPipeline):
+    """
+    Pipeline for the Chronos-2 model.
+
+    To learn more about this model, refer to:
+
+    Ansari, Abdul Fatir, Shchur, Oleksandr, Küken, Jaris et al.
+    "[Chronos-2: From Univariate to Universal Forecasting](https://arxiv.org/abs/2510.15821)."
+
+    """
+
    forecast_type: ForecastType = ForecastType.QUANTILES
-    default_context_length: int = 2048

    def __init__(self, model: Chronos2Model):
+        """
+        Initialize the Chronos-2 pipeline with a pretrained model.
+
+        Parameters
+        ----------
+        model
+            A pretrained Chronos2Model instance
+        """
        super().__init__(inner_model=model)
        self.model = model

@ -55,13 +72,12 @@ class Chronos2Pipeline(BaseChronosPipeline):

        Parameters
        ----------
-        quantile_levels : torch.Tensor
+        quantile_levels
            The quantile levels, must be strictly in (0, 1)

        Returns
        -------
-        torch.Tensor
-            The normalized probability mass per quantile
+        The normalized probability mass per quantile
        """
        assert quantile_levels.ndim == 1
        assert quantile_levels.min() > 0.0 and quantile_levels.max() < 1.0
@ -75,22 +91,57 @@ class Chronos2Pipeline(BaseChronosPipeline):

    @property
    def model_context_length(self) -> int:
+        """
+        Maximum number of time steps the model can use as context.
+
+        Returns
+        -------
+        Maximum context length supported by the model
+        """
        return self.model.chronos_config.context_length

    @property
    def model_output_patch_size(self) -> int:
+        """
+        Size of each output patch produced by the model.
+
+        Returns
+        -------
+        Output patch size
+        """
        return self.model.chronos_config.output_patch_size

    @property
    def model_prediction_length(self) -> int:
+        """
+        Default prediction horizon for the model.
+
+        Returns
+        -------
+        Default prediction horizon (max_output_patches * output_patch_size)
+        """
        return self.model.chronos_config.max_output_patches * self.model.chronos_config.output_patch_size

    @property
    def quantiles(self) -> list[float]:
+        """
+        Quantile levels the model was trained to predict.
+
+        Returns
+        -------
+        List of quantile levels
+        """
        return self.model.chronos_config.quantiles

    @property
    def max_output_patches(self) -> int:
+        """
+        Maximum number of output patches the model can generate in a single forward pass.
+
+        Returns
+        -------
+        Maximum number of output patches
+        """
        return self.model.chronos_config.max_output_patches

    def fit(
@ -171,7 +222,9 @@ class Chronos2Pipeline(BaseChronosPipeline):

        Returns
        -------
-        A new `Chronos2Pipeline` with the fine-tuned model
+        Chronos2Pipeline
+            A new `Chronos2Pipeline` with the fine-tuned model
        """

        import torch.cuda
@ -559,8 +612,10 @@ class Chronos2Pipeline(BaseChronosPipeline):

        Returns
        -------
-        The model's predictions, a list of `torch.Tensor` where each element has shape (n_variates, n_quantiles, prediction_length) and the number of
-        elements are equal to the number of target time series (univariate or multivariate) in the `inputs`.
+        list[torch.Tensor]
+            The model's predictions, a list of `torch.Tensor` where each element has shape (n_variates, n_quantiles, prediction_length) and the number of
+            elements is equal to the number of target time series (univariate or multivariate) in the `inputs`.

        """
        model_prediction_length = self.model_prediction_length
@ -770,21 +825,23 @@ class Chronos2Pipeline(BaseChronosPipeline):
        **predict_kwargs,
    ) -> tuple[list[torch.Tensor], list[torch.Tensor]]:
        """
-        Refer to ``Chronos2Pipeline.predict`` for shared parameters.
+        Generate quantile and mean forecasts for given time series.

-        Additional parameters
+        Refer to `Chronos2Pipeline.predict` for shared parameters.
+
+        Parameters
-        ---------------------
+        ----------
        quantile_levels
            Quantile levels to compute, by default [0.1, 0.2, ..., 0.9]

        Returns
        -------
-        quantiles
-            A list of torch tensors containing quantile forecasts. Each element of the list has shape (n_variates, prediction_length, len(quantile_levels))
-            and the number of elements are equal to the number of target time series (univariate or multivariate) in the `inputs`.
-        mean
-            A list of torch tensors containing containing mean (point) forecasts. Each element of the list has shape (n_variates, prediction_length)
-            and the number of elements are equal to the number of target time series (univariate or multivariate) in the `inputs`.
+        list[torch.Tensor]
+            A list of torch tensors containing quantile forecasts. Each element has shape (n_variates, prediction_length, len(quantile_levels))
+            and the number of elements equals the number of target time series (univariate or multivariate) in the inputs.
+        list[torch.Tensor]
+            A list of torch tensors containing mean (point) forecasts. Each element has shape (n_variates, prediction_length)
+            and the number of elements equals the number of target time series (univariate or multivariate) in the inputs.
        """
        training_quantile_levels = self.quantiles

@ -847,29 +904,31 @@ class Chronos2Pipeline(BaseChronosPipeline):
            Future covariates data with an id column, a timestamp, and any number of covariate columns,
            all of these columns will be treated as known future covariates
        id_column
-            The name of the column which contains the unique time series identifiers, by default "item_id"
+            The name of the column which contains the unique time series identifiers
        timestamp_column
-            The name of the column which contains timestamps, by default "timestamp"
-            All time series in the dataframe must have regular timestamps with the same frequency (no gaps)
+            The name of the column which contains timestamps. All time series in the dataframe must have
+            regular timestamps with the same frequency (no gaps)
        target
-            The name of the column(s) which contain the target variables to be forecasted, by default "target"
+            The name of the column(s) which contain the target variables to be forecasted
        prediction_length
            Number of steps to predict for each time series
        quantile_levels
            Quantile levels to compute
        batch_size
-            The batch size used for prediction. Note that the batch size here means the number of time series, including target(s) and covariates,
-            which are input into the model. If your data has multiple target and/or covariates, the effective number of time series tasks in a batch
-            will be lower than this value, by default 256
+            The batch size used for prediction. Note that the batch size here means the number of time series,
+            including target(s) and covariates, which are input into the model. If your data has multiple targets
+            and/or covariates, the effective number of time series tasks in a batch will be lower than this value
        context_length
-            The maximum context length used during for inference, by default set to the model's default context length
+            The maximum context length used during inference, by default set to the model's default context length
        cross_learning
-            If True, cross-learning is enabled, i.e., all the tasks in `inputs` will be predicted jointly and the model will share information across all inputs, by default False
-            The following must be noted when using cross-learning:
+            If True, cross-learning is enabled, i.e., all the tasks in inputs will be predicted jointly and the
+            model will share information across all inputs. The following must be noted when using cross-learning:
            - Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
-            - Results become dependent on batch size. Very large batch sizes may not provide benefits as they deviate from the maximum group size used during pretraining.
-            For optimal results, consider using a batch size around 100 (as used in the Chronos-2 technical report).
-            - Cross-learning is most helpful when individual time series have limited historical context, as the model can leverage patterns from related series in the batch.
+            - Results become dependent on batch size. Very large batch sizes may not provide benefits as they
+              deviate from the maximum group size used during pretraining. For optimal results, consider using a
+              batch size around 100 (as used in the Chronos-2 technical report).
+            - Cross-learning is most helpful when individual time series have limited historical context, as the
+              model can leverage patterns from related series in the batch.
        validate_inputs
            [ADVANCED] When True (default), validates dataframes before prediction. Setting to False removes the
            validation overhead, but may silently lead to wrong predictions if data is misformatted. When False, you
@ -884,12 +943,15 @@ class Chronos2Pipeline(BaseChronosPipeline):

        Returns
        -------
-        The forecasts dataframe generated by the model with the following columns
-        - `id_column`: The time series ID
-        - `timestamp_column`: Future timestamps
-        - "target_name": The name of the target column
-        - "predictions": The point predictions generated by the model
-        - One column for predictions at each quantile level in `quantile_levels`
+        pd.DataFrame
+            The forecasts dataframe generated by the model with the following columns:
+
+            - id_column: The time series ID
+            - timestamp_column: Future timestamps
+            - "target_name": The name of the target column
+            - "predictions": The point predictions generated by the model
+            - One column for predictions at each quantile level in quantile_levels
        """
        try:
            import pandas as pd
@ -1051,9 +1113,9 @@ class Chronos2Pipeline(BaseChronosPipeline):

        Returns
        -------
-        predictions
+        list[DatasetDict]
            Predictions for each window, each stored as a DatasetDict
-        inference_time_s
+        float
            Total time that it took to make predictions for all windows (in seconds)
        """
        from chronos.chronos2.dataset import convert_fev_window_to_list_of_dicts_input
@ -1113,24 +1175,26 @@ class Chronos2Pipeline(BaseChronosPipeline):
        ----------
        inputs
            The time series to get embeddings for, can be one of:
-            - A 3-dimensional `torch.Tensor` or `np.ndarray` of shape (batch, n_variates, history_length). When `n_variates > 1`, information
-            will be shared among the different variates of each time series in the batch.
-            - A list of `torch.Tensor` or `np.ndarray` where each element can either be 1-dimensional of shape (history_length,)
-            or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements; left-padding
-            will be applied, if needed.
+            - A 3-dimensional torch.Tensor or np.ndarray of shape (batch, n_variates, history_length). When n_variates > 1,
+              information will be shared among the different variates of each time series in the batch.
+            - A list of torch.Tensor or np.ndarray where each element can either be 1-dimensional of shape (history_length,)
+              or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements;
+              left-padding will be applied, if needed.
        batch_size
-            The batch size used for generating embeddings. Note that the batch size here means the total number of time series which are input into the model.
-            If your data has multiple variates, the effective number of time series tasks in a batch will be lower than this value, by default 256
+            The batch size used for generating embeddings. Note that the batch size here means the total number of time series
+            which are input into the model. If your data has multiple variates, the effective number of time series tasks in a
+            batch will be lower than this value
        context_length
-            The maximum context length used during for inference, by default set to the model's default context length
+            The maximum context length used during inference, by default set to the model's default context length

        Returns
        -------
-        embeddings
-            a list of `torch.Tensor` where each element has shape (n_variates, num_patches + 2, d_model) and the number of elements are equal to the number
-            of target time series (univariate or multivariate) in the `inputs`. The extra +2 is due to embeddings of the [REG] token and a masked output patch token.
-        loc_scale
-            a list of tuples with the mean and standard deviation of each time series.
+        list[torch.Tensor]
+            A list of torch.Tensor where each element has shape (n_variates, num_patches + 2, d_model) and the number of
+            elements equals the number of target time series (univariate or multivariate) in the inputs. The extra +2 is due
+            to embeddings of the [REG] token and a masked output patch token.
+        list[tuple[torch.Tensor, torch.Tensor]]
+            A list of tuples with the mean and standard deviation of each time series.
        """
        if context_length is None:
            context_length = self.model_context_length
@ -1185,8 +1249,31 @@ class Chronos2Pipeline(BaseChronosPipeline):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        """
-        Load the model, either from a local path, S3 prefix or from the HuggingFace Hub.
-        Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
+        Load the model from a local path, S3 prefix, or HuggingFace Hub.
+
+        Supports loading base models and LoRA adapters. When loading a LoRA adapter,
+        it will be automatically merged with the base model.
+
+        Parameters
+        ----------
+        pretrained_model_name_or_path
+            Path to the pretrained model. Can be:
+            - A local directory path
+            - An S3 URI (s3://...)
+            - A HuggingFace Hub model ID
+        *args
+            Additional positional arguments passed to AutoConfig and AutoModel
+        **kwargs
+            Additional keyword arguments passed to AutoConfig and AutoModel
+
+        Returns
+        -------
+        A Chronos2Pipeline instance with the loaded model
+
+        Notes
+        -----
+        Supports the same arguments as AutoConfig and AutoModel from transformers.
+        When loading LoRA adapters, the peft library must be installed.
        """

        # Check if the model is on S3 and cache it locally first
@ -1223,6 +1310,15 @@ class Chronos2Pipeline(BaseChronosPipeline):

    def save_pretrained(self, save_directory: str | Path, *args, **kwargs):
        """
-        Save the underlying model to a local directory or to HuggingFace Hub.
+        Save the underlying model to a local directory or HuggingFace Hub.
+
+        Parameters
+        ----------
+        save_directory
+            Directory where the model will be saved
+        *args
+            Additional positional arguments passed to the model's save_pretrained method
+        **kwargs
+            Additional keyword arguments passed to the model's save_pretrained method
        """
        self.model.save_pretrained(save_directory, *args, **kwargs)
--- a/src/chronos/chronos_bolt.py
+++ b/src/chronos/chronos_bolt.py
@ -401,10 +401,34 @@ class ChronosBoltModelForForecasting(T5PreTrainedModel):


 class ChronosBoltPipeline(BaseChronosPipeline):
+    """
+    Pipeline for the Chronos-Bolt model.
+
+    To learn more about this model, refer to:
+
+    Abdul Fatir Ansari, Caner Turkmen, Oleksandr Shchur, and Lorenzo Stella.
+    "[Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/)."
+    AWS Blogs (2024).
+
+    Parameters
+    ----------
+    model
+        `ChronosBoltModelForForecasting` instance containing the pretrained model.
+    """
+
    forecast_type: ForecastType = ForecastType.QUANTILES
    default_context_length: int = 2048

    def __init__(self, model: ChronosBoltModelForForecasting):
+        """
+        Initialize the ChronosBoltPipeline with a pretrained model.
+
+        Parameters
+        ----------
+        model
+            ChronosBoltModelForForecasting instance containing the pretrained
+            transformer model configured for quantile forecasting.
+        """
        super().__init__(inner_model=model)  # type: ignore
        self.model = model

@ -425,24 +449,31 @@ class ChronosBoltPipeline(BaseChronosPipeline):
        self, context: Union[torch.Tensor, List[torch.Tensor]]
    ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        """
-        Get encoder embeddings for the given time series.
+        Extract encoder embeddings for the given time series.
+
+        This method processes the input time series through patching and instance
+        normalization, then extracts encoder embeddings that can be used for
+        downstream tasks like clustering, classification, or similarity search.

        Parameters
        ----------
        context
-            Input series. This is either a 1D tensor, or a list
-            of 1D tensors, or a 2D tensor whose first dimension
-            is batch. In the latter case, use left-padding with
-            ``torch.nan`` to align series of different lengths.
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.

        Returns
        -------
-        embeddings, loc_scale
-            A tuple of two items: the encoder embeddings and the loc_scale,
-            i.e., the mean and std of the original time series.
-            The encoder embeddings are shaped (batch_size, num_patches + 1, d_model),
-            where num_patches is the number of patches in the time series
-            and the extra 1 is for the [REG] token (if used by the model).
+        torch.Tensor
+            Encoder embeddings with shape (batch_size, num_patches + 1, d_model),
+            where num_patches is the number of patches created from the input
+            time series, and the extra 1 is for the [REG] token if used by the model.
+            Returned on CPU in the model's dtype.
+        Tuple[torch.Tensor, torch.Tensor]
+            Tuple of (location, scale) tensors used for instance normalization,
+            representing the mean and standard deviation of the original time series.
+            Both tensors have shape (batch_size,) and are returned on CPU.
        """
        context_tensor = self._prepare_and_validate_context(context=context)
        model_context_length = self.model.config.chronos_config["context_length"]
@ -467,31 +498,59 @@ class ChronosBoltPipeline(BaseChronosPipeline):
        limit_prediction_length: bool = False,
    ) -> torch.Tensor:
        """
-        Get forecasts for the given time series.
+        Generate quantile forecasts for the given time series.

-        Refer to the base method (``BaseChronosPipeline.predict``)
-        for details on shared parameters.
-        Additional parameters
-        ---------------------
+        This method directly predicts quantiles without generating sample trajectories.
+        For predictions longer than the model's built-in horizon, it uses an
+        autoregressive approach that expands the batch size by the number of quantiles
+        to generate more robust long-horizon forecasts.
+
+        Parameters
+        ----------
+        inputs
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
+        prediction_length
+            Number of time steps to forecast. If not provided, uses the model's
+            default prediction length from the configuration.
        limit_prediction_length
-            Force prediction length smaller or equal than the
-            built-in prediction length from the model. False by
-            default. When true, fail loudly if longer predictions
-            are requested, otherwise longer predictions are allowed.
+            When True, raises an error if prediction_length exceeds the model's
+            built-in prediction length. When False (default), allows longer
+            predictions with a warning about potential quality degradation.

        Returns
        -------
        torch.Tensor
-            Forecasts of shape (batch_size, num_quantiles, prediction_length)
-            where num_quantiles is the number of quantiles the model has been
-            trained to output. For official Chronos-Bolt models, the value of
-            num_quantiles is 9 for [0.1, 0.2, ..., 0.9]-quantiles.
+            Quantile forecasts with shape (batch_size, num_quantiles, prediction_length),
+            where num_quantiles is the number of quantiles the model was trained on.
+            For official Chronos-Bolt models, num_quantiles is 9 for quantiles
+            [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
+            Returned in fp32 on CPU.

        Raises
        ------
        ValueError
-            When limit_prediction_length is True and the prediction_length is
-            greater than model's training prediction_length.
+            If limit_prediction_length is True and prediction_length exceeds
+            the model's built-in prediction length.
+
+        Notes
+        -----
+        For predictions longer than the model's built-in horizon, the method uses
+        an autoregressive approach:
+
+        1. Generate initial quantiles for the first chunk
+        2. Expand context by num_quantiles (treating each quantile as a scenario)
+        3. Generate next chunk for each scenario
+        4. Compute empirical quantiles across all scenarios
+        5. Repeat until desired prediction_length is reached
+
+        This approach scales the batch size by num_quantiles for long horizons,
+        which may require more GPU memory but produces more robust predictions.
+
+        If the input context is longer than the model's context length, it will
+        be automatically truncated to the most recent time steps.
        """
        context_tensor = self._prepare_and_validate_context(context=inputs)

@ -564,7 +623,57 @@ class ChronosBoltPipeline(BaseChronosPipeline):
        **predict_kwargs,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
-        Refer to the base method (``BaseChronosPipeline.predict_quantiles``).
+        Generate quantile and mean forecasts for given time series.
+
+        This method generates forecasts at the specified quantile levels. If the
+        requested quantiles match those the model was trained on, they are returned
+        directly. Otherwise, the method performs interpolation or extrapolation
+        to obtain the requested quantiles.
+
+        Parameters
+        ----------
+        inputs
+            Input time series. Can be a 1D tensor (single series), a list
+            of 1D tensors (multiple series of varying lengths), or a 2D tensor
+            where the first dimension is batch size. For 2D tensors, use
+            left-padding with torch.nan to align series of different lengths.
+        prediction_length
+            Number of time steps to forecast. If not provided, uses the model's
+            default prediction length from the configuration.
+        quantile_levels
+            List of quantile levels to compute, each between 0 and 1.
+            Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
+        **predict_kwargs
+            Additional keyword arguments passed to the predict method, such as
+            limit_prediction_length.
+
+        Returns
+        -------
+        torch.Tensor
+            Tensor of quantile forecasts with shape
+            (batch_size, prediction_length, num_quantiles).
+            Returned in fp32 on CPU.
+        torch.Tensor
+            Tensor of mean forecasts with shape (batch_size, prediction_length).
+            This is actually the median (0.5 quantile) from the model's predictions.
+            Returned in fp32 on CPU.
+
+        Notes
+        -----
+        If the requested quantile_levels are a subset of the model's training
+        quantiles, they are extracted directly without interpolation.
+
+        If quantile_levels include values outside the range of training quantiles,
+        the method will extrapolate using the minimum/maximum training quantiles,
+        which may significantly affect prediction quality. A warning will be issued
+        in this case.
+
+        The interpolation/extrapolation assumes the model's training quantiles
+        formed an equidistant grid (e.g., 0.1, 0.2, ..., 0.9), which holds for
+        official Chronos-Bolt models but may not be true for custom models.
+
+        The mean returned is actually the median (0.5 quantile) from the model's
+        predictions, not a true mean.
        """
        # shape (batch_size, prediction_length, len(training_quantile_levels))
        predictions = (
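
Review note on the Notes section of the docstring above: the behavior it describes (direct lookup for trained levels, interpolation between them, and clamping to the minimum or maximum trained level outside the range) can be sketched for a single time step as follows. `interpolate_quantile` is a hypothetical helper and piecewise-linear interpolation is an assumption made for illustration; the actual pipeline may compute this differently.

```python
from bisect import bisect_left


def interpolate_quantile(train_levels, train_values, q):
    """Estimate the forecast at level q from forecasts at ascending train_levels."""
    # Outside the trained range: fall back to the min/max trained quantile
    if q <= train_levels[0]:
        return train_values[0]
    if q >= train_levels[-1]:
        return train_values[-1]
    i = bisect_left(train_levels, q)
    # Exact match with a trained level: return it without interpolation
    if train_levels[i] == q:
        return train_values[i]
    # Otherwise interpolate linearly between the two neighboring levels
    l0, l1 = train_levels[i - 1], train_levels[i]
    v0, v1 = train_values[i - 1], train_values[i]
    weight = (q - l0) / (l1 - l0)
    return v0 + (v1 - v0) * weight


# Toy forecast at one time step for the nine levels 0.1, ..., 0.9
train_levels = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
train_values = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0]

median_forecast = interpolate_quantile(train_levels, train_values, 0.5)
between_levels = interpolate_quantile(train_levels, train_values, 0.25)
below_range = interpolate_quantile(train_levels, train_values, 0.05)
above_range = interpolate_quantile(train_levels, train_values, 0.95)
```

The clamped values for 0.05 and 0.95 equal the 0.1 and 0.9 forecasts, so intervals built from out-of-range levels are narrower than the levels suggest. That is the quality risk the warning in the docstring refers to.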
@ -609,8 +718,49 @@ class ChronosBoltPipeline(BaseChronosPipeline):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        """
-        Load the model, either from a local path S3 prefix or from the HuggingFace Hub.
-        Supports the same arguments as ``AutoConfig`` and ``AutoModel`` from ``transformers``.
+        Load a pretrained ChronosBoltPipeline from a local path, S3, or the HuggingFace Hub.
+
+        The model architecture is read from the configuration, and the matching
+        model class is instantiated with the pretrained weights. Unrecognized
+        architectures fall back to ChronosBoltModelForForecasting (see Notes).
+
+        Parameters
+        ----------
+        pretrained_model_name_or_path
+            Path or identifier for the pretrained model. Can be:
+            - A local directory path containing model files
+            - An S3 URI (s3://bucket/prefix)
+            - A HuggingFace Hub model identifier (e.g., "amazon/chronos-bolt-small")
+        *args
+            Additional positional arguments forwarded to AutoConfig and to the model's from_pretrained.
+        **kwargs
+            Additional keyword arguments forwarded to AutoConfig and to the model's from_pretrained.
+            Common options include:
+            - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
+            - device_map: Device placement strategy for model layers
+            - Other transformers AutoConfig and model arguments
+
+        Returns
+        -------
+        ChronosBoltPipeline
+            An instance of ChronosBoltPipeline with the loaded model.
+
+        Raises
+        ------
+        AssertionError
+            If the loaded configuration is not a valid Chronos configuration.
+
+        Notes
+        -----
+        For S3 URIs, the method delegates to BaseChronosPipeline.from_pretrained,
+        which handles the S3 download and caching.
+
+        The method automatically detects the model architecture from the configuration
+        and instantiates the appropriate class. If the architecture is not recognized,
+        it defaults to ChronosBoltModelForForecasting.
+
+        This method supports all arguments accepted by HuggingFace's AutoConfig
+        and model classes.
        """

        if str(pretrained_model_name_or_path).startswith("s3://"):