diff --git a/docs/api/base.md b/docs/api/base.md
new file mode 100644
index 0000000..e59ca03
--- /dev/null
+++ b/docs/api/base.md
@@ -0,0 +1,16 @@
+# Base
+
+::: chronos.base.BaseChronosPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - predict_df
+        - predict_fev
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
\ No newline at end of file
diff --git a/docs/api/chronos-2.md b/docs/api/chronos-2.md
new file mode 100644
index 0000000..7aa445f
--- /dev/null
+++ b/docs/api/chronos-2.md
@@ -0,0 +1,24 @@
+# Chronos-2
+
+::: chronos.chronos2.pipeline.Chronos2Pipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - predict_df
+        - predict_fev
+        - embed
+        - fit
+        - from_pretrained
+        - save_pretrained
+        - model_context_length
+        - model_prediction_length
+        - model_output_patch_size
+        - quantiles
+        - max_output_patches
+        - model
+        - forecast_type
\ No newline at end of file
diff --git a/docs/api/chronos-bolt.md b/docs/api/chronos-bolt.md
new file mode 100644
index 0000000..fd2cdc0
--- /dev/null
+++ b/docs/api/chronos-bolt.md
@@ -0,0 +1,17 @@
+# Chronos-Bolt
+
+::: chronos.chronos_bolt.ChronosBoltPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - embed
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
+        - model
+        - forecast_type
\ No newline at end of file
diff --git a/docs/api/chronos.md b/docs/api/chronos.md
new file mode 100644
index 0000000..504de11
--- /dev/null
+++ b/docs/api/chronos.md
@@ -0,0 +1,18 @@
+# Chronos
+
+::: chronos.chronos.ChronosPipeline
+    options:
+      show_source: true
+      heading_level: 2
+      show_root_heading: true
+      show_root_toc_entry: true
+      members:
+        - predict
+        - predict_quantiles
+        - embed
+        - from_pretrained
+        - model_context_length
+        - model_prediction_length
+        - tokenizer
+        - model
+        - forecast_type
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..860a2bd
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,89 @@
+# chronos-forecasting
+
+
+## Introduction
+
+This package provides an interface to the Chronos family of **pretrained time series forecasting models**. The following model types are supported.
+
+- **Chronos-2**: Our latest model with significantly enhanced capabilities. It offers zero-shot support for univariate, multivariate, and covariate-informed forecasting tasks. Chronos-2 delivers state-of-the-art zero-shot performance across multiple benchmarks (including fev-bench and GIFT-Eval), with the largest improvements observed on tasks that include exogenous features. It also achieves a win rate of over 90% against Chronos-Bolt in head-to-head comparisons. To learn more about Chronos-2, check out the [technical report](https://arxiv.org/abs/2510.15821).
+- **Chronos-Bolt**: A patch-based variant of Chronos. It chunks the historical time series context into patches of multiple observations, which are then input into the encoder. The decoder then uses these representations to directly generate quantile forecasts across multiple future steps—a method known as direct multi-step forecasting. Chronos-Bolt models are up to 250 times faster and 20 times more memory-efficient than the original Chronos models of the same size. To learn more about Chronos-Bolt, check out this [blog post](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/).
+- **Chronos**: The original Chronos family, which is based on language model architectures. A time series is transformed into a sequence of tokens via scaling and quantization, and a language model is trained on these tokens using the cross-entropy loss. Once trained, probabilistic forecasts are obtained by sampling multiple future trajectories given the historical context. To learn more about Chronos, check out the [publication](https://openreview.net/forum?id=gerNCVqqtR).
+
+### Available Models
+
+| Model ID                                                               | Parameters |
+| ---------------------------------------------------------------------- | ---------- |
+| [`amazon/chronos-2`](https://huggingface.co/amazon/chronos-2)   | 120M         |
+| [`amazon/chronos-bolt-tiny`](https://huggingface.co/amazon/chronos-bolt-tiny)   | 9M         |
+| [`amazon/chronos-bolt-mini`](https://huggingface.co/amazon/chronos-bolt-mini)   | 21M        |
+| [`amazon/chronos-bolt-small`](https://huggingface.co/amazon/chronos-bolt-small) | 48M        |
+| [`amazon/chronos-bolt-base`](https://huggingface.co/amazon/chronos-bolt-base)   | 205M       |
+| [`amazon/chronos-t5-tiny`](https://huggingface.co/amazon/chronos-t5-tiny)   | 8M         |
+| [`amazon/chronos-t5-mini`](https://huggingface.co/amazon/chronos-t5-mini)   | 20M        |
+| [`amazon/chronos-t5-small`](https://huggingface.co/amazon/chronos-t5-small) | 46M        |
+| [`amazon/chronos-t5-base`](https://huggingface.co/amazon/chronos-t5-base)   | 200M       |
+| [`amazon/chronos-t5-large`](https://huggingface.co/amazon/chronos-t5-large) | 710M       |
+
+
+
+
+## Installation
+
+```bash
+pip install chronos-forecasting
+```
+
+## Quickstart
+
+A minimal example showing how to perform forecasting using Chronos-2:
+
+```py
+import pandas as pd  # requires: pip install 'pandas[pyarrow]'
+from chronos import Chronos2Pipeline
+
+pipeline = Chronos2Pipeline.from_pretrained("amazon/chronos-2", device_map="cuda")
+
+# Load historical target values and past values of covariates
+context_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/train.parquet")
+
+# (Optional) Load future values of covariates
+test_df = pd.read_parquet("https://autogluon.s3.amazonaws.com/datasets/timeseries/electricity_price/test.parquet")
+future_df = test_df.drop(columns="target")
+
+# Generate predictions with covariates
+pred_df = pipeline.predict_df(
+    context_df,
+    future_df=future_df,
+    prediction_length=24,  # Number of steps to forecast
+    quantile_levels=[0.1, 0.5, 0.9],  # Quantile levels for the probabilistic forecast
+    id_column="id",  # Column identifying different time series
+    timestamp_column="timestamp",  # Column with datetime information
+    target="target",  # Column(s) with time series values to predict
+)
+```
+
+We can now visualize the forecast:
+
+```py
+import matplotlib.pyplot as plt  # requires: pip install matplotlib
+
+ts_context = context_df.set_index("timestamp")["target"].tail(256)
+ts_pred = pred_df.set_index("timestamp")
+ts_ground_truth = test_df.set_index("timestamp")["target"]
+
+ts_context.plot(label="historical data", color="xkcd:azure", figsize=(12, 3))
+ts_ground_truth.plot(label="future data (ground truth)", color="xkcd:grass green")
+ts_pred["predictions"].plot(label="forecast", color="xkcd:violet")
+plt.fill_between(
+    ts_pred.index,
+    ts_pred["0.1"],
+    ts_pred["0.9"],
+    alpha=0.7,
+    label="prediction interval",
+    color="xkcd:light lavender",
+)
+plt.legend()
+```
+
+## Tutorials
+
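Review note on docs/index.md: the Chronos bullet says a series becomes tokens "via scaling and quantization". A small example may help readers picture that step. The sketch below is a pure-Python illustration under stated assumptions, not the library's tokenizer: `tokenize` and `detokenize` are hypothetical names, scaling by the mean absolute value is assumed, and the limits of -15 and 15 with 31 bins are illustrative values only.

```python
def tokenize(values, low_limit=-15.0, high_limit=15.0, num_bins=31):
    """Map real values to integer bin indices after mean scaling (illustrative)."""
    # Assumption: the scale is the mean absolute value of the series.
    scale = sum(abs(v) for v in values) / len(values)
    if scale == 0:
        scale = 1.0  # avoid dividing by zero for an all-zero series
    # Bin centers are spaced uniformly between the two limits.
    step = (high_limit - low_limit) / (num_bins - 1)
    tokens = []
    for v in values:
        # Scaled values outside the limits are clipped to the nearest limit.
        scaled = min(max(v / scale, low_limit), high_limit)
        # The token is the index of the nearest bin center.
        tokens.append(round((scaled - low_limit) / step))
    return tokens, scale


def detokenize(tokens, scale, low_limit=-15.0, high_limit=15.0, num_bins=31):
    """Map bin indices back to real values: bin center times the stored scale."""
    step = (high_limit - low_limit) / (num_bins - 1)
    return [(low_limit + t * step) * scale for t in tokens]
```

Round-tripping a series through `tokenize` and `detokenize` returns each value snapped to its nearest bin center, which shows the quantization error the bullet implies.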
diff --git a/mkdocs.yml b/mkdocs.yml
new file mode 100644
index 0000000..40d5ce0
--- /dev/null
+++ b/mkdocs.yml
@@ -0,0 +1,40 @@
+site_name: chronos-forecasting
+site_description: A library for the Chronos time series foundation models
+site_url: https://amazon-science.github.io/chronos-forecasting/
+repo_url: https://github.com/amazon-science/chronos-forecasting
+repo_name: amazon-science/chronos-forecasting
+
+theme:
+  name: material
+  palette:
+    primary: blue
+
+plugins:
+  - search
+  - mkdocstrings:
+      handlers:
+        python:
+          options:
+            docstring_style: numpy
+            show_root_heading: true
+            show_root_full_path: false
+            show_signature_annotations: true
+            show_category_heading: true
+            group_by_category: true
+            allow_inspection: true
+
+nav:
+  - Home: index.md
+  # - Tutorials:
+  #   - Quickstart: 
+  - API Reference:
+    - Base: api/base.md
+    - Chronos: api/chronos.md
+    - Chronos-Bolt: api/chronos-bolt.md
+    - Chronos-2: api/chronos-2.md
+
+markdown_extensions:
+  - pymdownx.highlight
+  - pymdownx.superfences
+  - attr_list
+  - md_in_html
\ No newline at end of file
diff --git a/src/chronos/base.py b/src/chronos/base.py
index 533cce5..52c704a 100644
--- a/src/chronos/base.py
+++ b/src/chronos/base.py
@@ -44,38 +44,26 @@ class PipelineRegistry(type):
 class BaseChronosPipeline(metaclass=PipelineRegistry):
     """
     Abstract base class for Chronos pretrained time series forecasting pipelines.
-    
+
     This class defines the common interface for all Chronos models. The package provides
     multiple pipeline implementations with different forecasting approaches and architectures:
-    
-    - ChronosPipeline: Sample-based forecasting with scaling and quantization based tokenization
-    - ChronosBoltPipeline: Quantile-based forecasting with patching
-    - Chronos2Pipeline (recommended): Quantile-based forecasting with support for multivariate and covariate-informed forecasting
-    
+
+    - [ChronosPipeline][chronos.chronos.ChronosPipeline]: Sample-based forecasting with tokenization based on scaling and quantization
+    - [ChronosBoltPipeline][chronos.chronos_bolt.ChronosBoltPipeline]: Quantile-based forecasting with patching
+    - [Chronos2Pipeline][chronos.chronos2.pipeline.Chronos2Pipeline] (recommended): Quantile-based forecasting with support for multivariate and covariate-informed forecasting
+
     Each subclass implements the abstract methods and properties defined here,
     potentially with different parameter signatures and return types depending
     on the model architecture and forecasting approach.
-    
-    Attributes
-    ----------
-    forecast_type
-        Enum indicating whether the pipeline produces samples or quantiles
-    inner_model
-        The underlying HuggingFace transformers model
-    
-    See Also
-    --------
-    ChronosPipeline: Sample-based forecasting with scaling and quantization based tokenization
-    ChronosBoltPipeline: Quantile-based forecasting with patching
-    Chronos2Pipeline (recommended): Quantile-based forecasting with support for multivariate and covariate-informed forecasting
     """
+
     forecast_type: ForecastType
     dtypes = {"bfloat16": torch.bfloat16, "float32": torch.float32}
 
     def __init__(self, inner_model: "PreTrainedModel"):
         """
         Initialize the base pipeline with a pretrained model.
-        
+
         Parameters
         ----------
         inner_model
@@ -89,18 +77,13 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     def model_context_length(self) -> int:
         """
         Maximum number of time steps the model can use as context.
-        
+
         This is an abstract property that must be implemented by subclasses.
-        
+
         Returns
         -------
         int
             Maximum context length supported by the model
-        
-        Notes
-        -----
-        Subclasses must implement this property based on their specific
-        model architecture and configuration.
         """
         raise NotImplementedError()
 
@@ -108,17 +91,13 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     def model_prediction_length(self) -> int:
         """
         Default prediction horizon for the model.
-        
+
         This is an abstract property that must be implemented by subclasses.
-        
+
         Returns
         -------
         int
             Default prediction horizon
-        
-        Notes
-        -----
-        Subclasses must implement this property based on their specific model architecture and configuration.
         """
         raise NotImplementedError()
 
@@ -135,12 +114,12 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     def predict(self, inputs: Union[torch.Tensor, List[torch.Tensor]], prediction_length: Optional[int] = None):
         """
         Generate forecasts for the given time series.
-        
+
         This is an abstract method that must be implemented by subclasses.
         Each subclass may have different parameters and return types depending
         on the model architecture and forecasting approach. Predictions are
         typically returned in fp32 on the CPU.
-        
+
         Parameters
         ----------
         inputs
@@ -151,13 +130,13 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
         prediction_length
             Number of time steps to forecast. If not provided, defaults to
             the model's default prediction length.
-        
+
         Returns
         -------
         torch.Tensor
             Forecasts tensor. The shape and interpretation depend on the
             subclass's forecast_type (samples or quantiles).
-        
+
         Notes
         -----
         Subclasses may extend this interface with additional parameters
@@ -175,11 +154,11 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     ) -> Tuple[torch.Tensor, torch.Tensor]:
         """
         Generate quantile and mean forecasts for given time series.
-        
+
         This is an abstract method that must be implemented by subclasses.
         Each subclass may have different parameters depending on the model
         architecture. Predictions are typically returned in fp32 on the CPU.
-        
+
         Parameters
         ----------
         inputs
@@ -195,16 +174,16 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
             Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
         **kwargs
             Additional keyword arguments that may be used by subclass implementations.
-        
+
         Returns
         -------
-        quantiles
+        torch.Tensor
             Tensor of quantile forecasts with shape
             (batch_size, prediction_length, num_quantiles)
-        mean
+        torch.Tensor
             Tensor of mean (point) forecasts with shape
             (batch_size, prediction_length)
-        
+
         Notes
         -----
         Subclasses may extend this interface with additional parameters
@@ -227,11 +206,11 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     ) -> "pd.DataFrame":
         """
         Generate forecasts for time series data in a pandas DataFrame.
-        
+
         This method provides a convenient interface for forecasting on long-format
         pandas DataFrames containing multiple time series. It handles data conversion,
         batching, and result formatting automatically.
-        
+
         Parameters
         ----------
         df
@@ -258,26 +237,27 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
         -------
         pd.DataFrame
             Forecast results in long format with the following columns:
+
             - Column named by id_column: Time series identifiers
             - Column named by timestamp_column: Future timestamps
             - "target_name": Name of the forecasted target variable
             - "predictions": Point forecasts (mean predictions)
             - One column per quantile level (e.g., "0.1", "0.5", "0.9")
-        
+
         Raises
         ------
         ImportError
             If pandas is not installed.
         ValueError
             If target is not a string (multivariate forecasting not supported).
-        
+
         Notes
         -----
         This method requires pandas to be installed. Install with `pip install pandas`.
-        
+
         The method internally converts the DataFrame to tensor format, generates
         forecasts using predict_quantiles, and converts results back to DataFrame format.
-        
+
         Subclasses may have additional parameters or behavior. Refer to specific
         subclass documentation for implementation details.
         """
@@ -349,11 +329,11 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     ) -> tuple[list["datasets.DatasetDict"], float]:
         """
         Generate predictions for evaluation on a fev benchmark task.
-        
+
         This method provides integration with the fev (Forecasting Evaluation)
         library for standardized benchmark evaluation. It handles batching,
         timing, and formatting predictions according to the task requirements.
-        
+
         Parameters
         ----------
         task
@@ -366,21 +346,21 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
         **kwargs
             Additional keyword arguments forwarded to the predict_quantiles method.
             These may include model-specific parameters.
-        
+
         Returns
         -------
-        predictions_per_window
+        list[DatasetDict]
             List of DatasetDict objects, one for each evaluation window in the task.
             Each DatasetDict contains predictions formatted according to fev requirements.
-        inference_time_s
+        float
             Total inference time in seconds across all windows, excluding data
             loading and preprocessing time.
-        
+
         Raises
         ------
         ImportError
             If the fev library is not installed.
-        
+
         Notes
         -----
         This method requires the fev library to be installed. Install with
@@ -458,11 +438,11 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
     ):
         """
         Load a pretrained Chronos pipeline from various sources.
-        
+
         This class method loads a pretrained model from a local path, S3 bucket,
         or the HuggingFace Hub. It automatically detects the appropriate pipeline
         class based on the model configuration and instantiates it.
-        
+
         Parameters
         ----------
         pretrained_model_name_or_path
@@ -481,13 +461,13 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
             - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
             - device_map: Device placement strategy for model layers
             - Other transformers AutoConfig and AutoModel arguments
-        
+
         Returns
         -------
         BaseChronosPipeline
             An instance of the appropriate pipeline subclass (ChronosPipeline,
             ChronosBoltPipeline, or Chronos2Pipeline) based on the model configuration.
-        
+
         Raises
         ------
         ValueError
@@ -495,16 +475,16 @@ class BaseChronosPipeline(metaclass=PipelineRegistry):
             specified pipeline class is not recognized.
         ImportError
             If required dependencies are not installed.
-        
+
         Notes
         -----
         The method reads the model configuration to determine which pipeline
         class to instantiate. The configuration must contain either a
         `chronos_pipeline_class` or `chronos_config` attribute.
-        
+
         For S3 URIs, the model is first downloaded to a local cache directory
         before loading.
-        
+
         The torch_dtype parameter can be specified as a string ("float32", "bfloat16")
         or as a torch dtype object. When set to "auto", the dtype is determined
         from the model configuration.
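Review note on the `predict_quantiles` docstring in src/chronos/base.py: the Returns section fixes the output shapes as (batch_size, prediction_length, num_quantiles) for quantiles and (batch_size, prediction_length) for the mean. For a sample-based subclass, one way to meet that contract is to reduce over the sample axis. The sketch below uses nested lists in place of tensors. `empirical_quantile` and `summarize_samples` are hypothetical helpers, and linear interpolation between order statistics is an assumption rather than a statement about the library's implementation.

```python
from statistics import fmean


def empirical_quantile(values, level):
    """Quantile of `values` at `level` in [0, 1], linearly interpolated."""
    ordered = sorted(values)
    position = level * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def summarize_samples(samples, quantile_levels):
    """Reduce samples shaped (batch_size, num_samples, prediction_length).

    Returns quantiles shaped (batch_size, prediction_length, num_quantiles)
    and means shaped (batch_size, prediction_length).
    """
    quantiles, means = [], []
    for series_samples in samples:
        horizon = len(series_samples[0])
        series_quantiles, series_means = [], []
        for step in range(horizon):
            # Collect every sampled value for this forecast step.
            column = [path[step] for path in series_samples]
            series_quantiles.append(
                [empirical_quantile(column, level) for level in quantile_levels]
            )
            series_means.append(fmean(column))
        quantiles.append(series_quantiles)
        means.append(series_means)
    return quantiles, means
```

With few samples the tail quantiles are coarse, which matches the docstring note in `ChronosPipeline.predict_quantiles` that accuracy depends on `num_samples`.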
diff --git a/src/chronos/chronos.py b/src/chronos/chronos.py
index b3c08cf..396935b 100644
--- a/src/chronos/chronos.py
+++ b/src/chronos/chronos.py
@@ -152,6 +152,19 @@ class ChronosTokenizer:
 
 
 class MeanScaleUniformBins(ChronosTokenizer):
+    """
+    A tokenizer that first applies mean scaling and then quantizes the scaled values into uniformly spaced bins.
+
+    Parameters
+    ----------
+    low_limit
+        The lower limit of quantization. Scaled values smaller than this are clipped.
+    high_limit
+        The upper limit of quantization. Scaled values larger than this are clipped.
+    config
+        The `ChronosConfig` instance used to configure the tokenizer.
+    """
+
     def __init__(self, low_limit: float, high_limit: float, config: ChronosConfig) -> None:
         self.config = config
         self.centers = torch.linspace(
@@ -356,6 +369,13 @@ class ChronosPipeline(BaseChronosPipeline):
     """
     Pipeline for the Chronos model.
     
+    To learn more about this model, refer to:
+
+    Ansari, Abdul Fatir, Stella, Lorenzo et al.
+    "[Chronos: Learning the Language of Time Series](https://arxiv.org/abs/2403.07815)."
+    Transactions on Machine Learning Research (2024).
+
+
     Parameters
     ----------
     tokenizer
@@ -363,20 +383,6 @@ class ChronosPipeline(BaseChronosPipeline):
         values and discrete tokens.
     model
         ChronosModel instance wrapping the underlying transformer model.
-    
-    Attributes
-    ----------
-    tokenizer
-        The tokenizer used for encoding/decoding time series
-    model
-        The model used for generating forecasts
-    forecast_type
-        Set to ForecastType.SAMPLES indicating this pipeline produces samples
-    
-    See Also
-    --------
-    ChronosBoltPipeline: Quantile-based forecasting with patching
-    Chronos2Pipeline: Quantile-based forecasting with support for multivariate and covariate-informed forecasting
     """
 
     tokenizer: ChronosTokenizer
@@ -386,7 +392,7 @@ class ChronosPipeline(BaseChronosPipeline):
     def __init__(self, tokenizer, model):
         """
         Initialize the ChronosPipeline with a tokenizer and model.
-        
+
         Parameters
         ----------
         tokenizer
@@ -421,12 +427,12 @@ class ChronosPipeline(BaseChronosPipeline):
     def embed(self, context: Union[torch.Tensor, List[torch.Tensor]]) -> Tuple[torch.Tensor, Any]:
         """
         Extract encoder embeddings for the given time series.
-        
+
         This method tokenizes the input time series and extracts the encoder
         embeddings, which can be used for downstream tasks like clustering,
         classification, or similarity search. Only available for encoder-decoder
         (seq2seq) models.
-        
+
         Parameters
         ----------
         context
@@ -434,24 +440,24 @@ class ChronosPipeline(BaseChronosPipeline):
             of 1D tensors (multiple series of varying lengths), or a 2D tensor
             where the first dimension is batch size. For 2D tensors, use
             left-padding with torch.nan to align series of different lengths.
-        
+
         Returns
         -------
-        embeddings
+        torch.Tensor
             Encoder embeddings with shape (batch_size, context_length, d_model)
             or (batch_size, context_length + 1, d_model) if EOS token is used.
             The context_length is either the time dimension of the input 2D tensor
             or the length of the longest series in the input list.
-        tokenizer_state
+        Any
             Tokenizer state containing scaling information (e.g., mean scale)
             used during tokenization. Can be used for consistent processing
             of related time series.
-        
+
         Notes
         -----
         This method is only supported for encoder-decoder (seq2seq) models.
         Decoder-only (causal) models do not have a separate encoder.
-        
+
         The embeddings are returned on CPU in fp32 format.
         """
         context_tensor = self._prepare_and_validate_context(context=context)
@@ -474,12 +480,12 @@ class ChronosPipeline(BaseChronosPipeline):
     ) -> torch.Tensor:
         """
         Generate sample-based forecasts for the given time series.
-        
+
         This method tokenizes the input time series, generates multiple sample
         trajectories using the transformer model, and decodes them back to real
         values. For predictions longer than the model's built-in horizon, it uses
         autoregressive generation by feeding back the median of generated samples.
-        
+
         Parameters
         ----------
         inputs
@@ -507,27 +513,28 @@ class ChronosPipeline(BaseChronosPipeline):
             When True, raises an error if prediction_length exceeds the model's
             built-in prediction length. When False (default), allows longer
             predictions with a warning about potential quality degradation.
-        
+
         Returns
         -------
         torch.Tensor
             Sample forecasts with shape (batch_size, num_samples, prediction_length).
             Returned in fp32 on CPU.
-        
+
         Raises
         ------
         ValueError
             If limit_prediction_length is True and prediction_length exceeds
             the model's built-in prediction length.
-        
+
         Notes
         -----
         For predictions longer than the model's built-in horizon, the method
         uses autoregressive generation by iteratively:
+
         1. Generating samples for the next chunk
         2. Taking the median across samples
         3. Appending it to the context for the next iteration
-        
+
         This autoregressive approach may lead to quality degradation for very
         long horizons, as the model was not explicitly trained for this.
         """
@@ -581,12 +588,12 @@ class ChronosPipeline(BaseChronosPipeline):
     ) -> Tuple[torch.Tensor, torch.Tensor]:
         """
         Generate quantile and mean forecasts from sample trajectories.
-        
+
         This method first generates multiple sample trajectories using the predict
         method, then computes empirical quantiles and mean from these samples.
         This provides a convenient interface for obtaining quantile forecasts from
         the model.
-        
+
         Parameters
         ----------
         inputs
@@ -603,26 +610,26 @@ class ChronosPipeline(BaseChronosPipeline):
         **predict_kwargs
             Additional keyword arguments passed to the predict method, such as
             num_samples, temperature, top_k, top_p, and limit_prediction_length.
-        
+
         Returns
         -------
-        quantiles
+        torch.Tensor
             Tensor of quantile forecasts with shape
             (batch_size, prediction_length, num_quantiles).
             Returned in fp32 on CPU.
-        mean
+        torch.Tensor
             Tensor of mean forecasts with shape
             (batch_size, prediction_length).
             Returned in fp32 on CPU.
-        
+
         Notes
         -----
         The quantiles are computed empirically from the generated samples.
         The accuracy of quantile estimates depends on the number of samples
         generated (controlled by num_samples parameter in predict_kwargs).
-        
+
         For better quantile estimates, consider increasing num_samples, though
-        this will increase computation time proportionally.
+        this will increase memory usage and computation time.
         """
         prediction_samples = (
             self.predict(inputs, prediction_length=prediction_length, **predict_kwargs).detach().swapaxes(1, 2)
@@ -640,11 +647,11 @@ class ChronosPipeline(BaseChronosPipeline):
     def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
         """
         Load a pretrained ChronosPipeline from various sources.
-        
+
         This method loads a pretrained ChronosPipeline model from a local path,
         S3 bucket, or the HuggingFace Hub. It automatically instantiates the
         appropriate tokenizer and model based on the configuration.
-        
+
         Parameters
         ----------
         pretrained_model_name_or_path
@@ -660,26 +667,26 @@ class ChronosPipeline(BaseChronosPipeline):
             - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
             - device_map: Device placement strategy for model layers
             - Other transformers AutoConfig and AutoModel arguments
-        
+
         Returns
         -------
         ChronosPipeline
             An instance of ChronosPipeline with loaded tokenizer and model.
-        
+
         Raises
         ------
         AssertionError
             If the configuration is not a valid Chronos config.
-        
+
         Notes
         -----
         For S3 URIs, the method delegates to BaseChronosPipeline.from_pretrained
         which handles S3 download and caching.
-        
+
         The method automatically detects whether to load a seq2seq or causal
         model based on the configuration and instantiates the appropriate
         model class.
-        
+
         This method supports all arguments accepted by HuggingFace's AutoConfig
         and AutoModel classes.
         """
diff --git a/src/chronos/chronos2/pipeline.py b/src/chronos/chronos2/pipeline.py
index 46f4fda..65a2427 100644
--- a/src/chronos/chronos2/pipeline.py
+++ b/src/chronos/chronos2/pipeline.py
@@ -39,19 +39,20 @@ logger = logging.getLogger(__name__)
 class Chronos2Pipeline(BaseChronosPipeline):
     """
     Pipeline for the Chronos-2 model.
-    
-    See Also
-    --------
-    ChronosPipeline: Sample-based forecasting with scaling and quantization based tokenization
-    ChronosBoltPipeline: Quantile-based forecasting with patching
+
+    To learn more about this model, refer to:
+
+    Ansari, Abdul Fatir, Shchur, Oleksandr, Küken, Jaris et al.
+    "[Chronos-2: From Univariate to Universal Forecasting](https://arxiv.org/abs/2510.15821)."
+
     """
+
     forecast_type: ForecastType = ForecastType.QUANTILES
-    default_context_length: int = 2048
 
     def __init__(self, model: Chronos2Model):
         """
         Initialize the Chronos-2 pipeline with a pretrained model.
-        
+
         Parameters
         ----------
         model
@@ -92,7 +93,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def model_context_length(self) -> int:
         """
         Maximum number of time steps the model can use as context.
-        
+
         Returns
         -------
         Maximum context length supported by the model
@@ -103,7 +104,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def model_output_patch_size(self) -> int:
         """
         Size of each output patch produced by the model.
-        
+
         Returns
         -------
         Output patch size
@@ -114,7 +115,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def model_prediction_length(self) -> int:
         """
         Default prediction horizon for the model.
-        
+
         Returns
         -------
         Default prediction horizon (max_output_patches * output_patch_size)
@@ -125,7 +126,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def quantiles(self) -> list[float]:
         """
         Quantile levels the model was trained to predict.
-        
+
         Returns
         -------
         List of quantile levels
@@ -136,7 +137,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def max_output_patches(self) -> int:
         """
         Maximum number of output patches the model can generate in a single forward pass.
-        
+
         Returns
         -------
         Maximum number of output patches
@@ -216,7 +217,9 @@ class Chronos2Pipeline(BaseChronosPipeline):
 
         Returns
         -------
-        A new `Chronos2Pipeline` with the fine-tuned model
+
+        Chronos2Pipeline
+            A new `Chronos2Pipeline` with the fine-tuned model
         """
 
         import torch.cuda
@@ -603,8 +606,10 @@ class Chronos2Pipeline(BaseChronosPipeline):
 
         Returns
         -------
-        The model's predictions, a list of `torch.Tensor` where each element has shape (n_variates, n_quantiles, prediction_length) and the number of
-        elements are equal to the number of target time series (univariate or multivariate) in the `inputs`.
+
+        list[torch.Tensor]
+            The model's predictions, a list of `torch.Tensor` where each element has shape (n_variates, n_quantiles, prediction_length) and the number of
+            elements equals the number of target time series (univariate or multivariate) in the `inputs`.
 
         """
         model_prediction_length = self.model_prediction_length
@@ -815,20 +820,20 @@ class Chronos2Pipeline(BaseChronosPipeline):
     ) -> tuple[list[torch.Tensor], list[torch.Tensor]]:
         """
         Generate quantile and mean forecasts for given time series.
-        
-        Refer to ``Chronos2Pipeline.predict`` for shared parameters.
 
-        Additional Parameters
+        Refer to `Chronos2Pipeline.predict` for shared parameters.
+
+        Parameters
-        ---------------------
+        ----------
         quantile_levels
             Quantile levels to compute, by default [0.1, 0.2, ..., 0.9]
 
         Returns
         -------
-        quantiles
+        list[torch.Tensor]
             A list of torch tensors containing quantile forecasts. Each element has shape (n_variates, prediction_length, len(quantile_levels))
             and the number of elements equals the number of target time series (univariate or multivariate) in the inputs.
-        mean
+        list[torch.Tensor]
             A list of torch tensors containing mean (point) forecasts. Each element has shape (n_variates, prediction_length)
             and the number of elements equals the number of target time series (univariate or multivariate) in the inputs.
         """
@@ -894,7 +899,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
         id_column
             The name of the column which contains the unique time series identifiers
         timestamp_column
-            The name of the column which contains timestamps. All time series in the dataframe must have 
+            The name of the column which contains timestamps. All time series in the dataframe must have
             regular timestamps with the same frequency (no gaps)
         target
             The name of the column(s) which contain the target variables to be forecasted
@@ -903,19 +908,19 @@ class Chronos2Pipeline(BaseChronosPipeline):
         quantile_levels
             Quantile levels to compute
         batch_size
-            The batch size used for prediction. Note that the batch size here means the number of time series, 
-            including target(s) and covariates, which are input into the model. If your data has multiple target 
+            The batch size used for prediction. Note that the batch size here means the number of time series,
+            including target(s) and covariates, which are input into the model. If your data has multiple target
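-            and/or covariates, the effective number of time series tasks in a batch will be lower than this value
+            variables and/or covariates, the effective number of time series tasks in a batch will be lower than this value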
         context_length
             The maximum context length used during inference, by default set to the model's default context length
         cross_learning
-            If True, cross-learning is enabled, i.e., all the tasks in inputs will be predicted jointly and the 
+            If True, cross-learning is enabled, i.e., all the tasks in inputs will be predicted jointly and the
             model will share information across all inputs. The following must be noted when using cross-learning:
             - Cross-learning doesn't always improve forecast accuracy and must be tested for individual use cases.
-            - Results become dependent on batch size. Very large batch sizes may not provide benefits as they 
-            deviate from the maximum group size used during pretraining. For optimal results, consider using a 
+            - Results become dependent on batch size. Very large batch sizes may not provide benefits as they
+            deviate from the maximum group size used during pretraining. For optimal results, consider using a
             batch size around 100 (as used in the Chronos-2 technical report).
-            - Cross-learning is most helpful when individual time series have limited historical context, as the 
+            - Cross-learning is most helpful when individual time series have limited historical context, as the
             model can leverage patterns from related series in the batch.
         validate_inputs
             When True, the dataframe(s) will be validated before prediction, ensuring that timestamps have a
@@ -925,12 +930,15 @@ class Chronos2Pipeline(BaseChronosPipeline):
 
         Returns
         -------
-        The forecasts dataframe generated by the model with the following columns:
-        - id_column: The time series ID
-        - timestamp_column: Future timestamps
-        - "target_name": The name of the target column
-        - "predictions": The point predictions generated by the model
-        - One column for predictions at each quantile level in quantile_levels
+
+        pd.DataFrame
+            The forecasts dataframe generated by the model with the following columns:
+
+            - id_column: The time series ID
+            - timestamp_column: Future timestamps
+            - "target_name": The name of the target column
+            - "predictions": The point predictions generated by the model
+            - One column for predictions at each quantile level in quantile_levels
         """
         try:
             import pandas as pd
@@ -1091,9 +1099,9 @@ class Chronos2Pipeline(BaseChronosPipeline):
 
         Returns
         -------
-        predictions
+        list[DatasetDict]
             Predictions for each window, each stored as a DatasetDict
-        inference_time_s
+        float
             Total time that it took to make predictions for all windows (in seconds)
         """
         from chronos.chronos2.dataset import convert_fev_window_to_list_of_dicts_input
@@ -1153,25 +1161,25 @@ class Chronos2Pipeline(BaseChronosPipeline):
         ----------
         inputs
             The time series to get embeddings for, can be one of:
-            - A 3-dimensional torch.Tensor or np.ndarray of shape (batch, n_variates, history_length). When n_variates > 1, 
+            - A 3-dimensional torch.Tensor or np.ndarray of shape (batch, n_variates, history_length). When n_variates > 1,
             information will be shared among the different variates of each time series in the batch.
             - A list of torch.Tensor or np.ndarray where each element can either be 1-dimensional of shape (history_length,)
-            or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements; 
+            or 2-dimensional of shape (n_variates, history_length). The history_lengths may be different across elements;
             left-padding will be applied, if needed.
         batch_size
-            The batch size used for generating embeddings. Note that the batch size here means the total number of time series 
-            which are input into the model. If your data has multiple variates, the effective number of time series tasks in a 
+            The batch size used for generating embeddings. Note that the batch size here means the total number of time series
+            which are input into the model. If your data has multiple variates, the effective number of time series tasks in a
             batch will be lower than this value
         context_length
             The maximum context length used during inference, by default set to the model's default context length
 
         Returns
         -------
-        embeddings
-            A list of torch.Tensor where each element has shape (n_variates, num_patches + 2, d_model) and the number of 
-            elements equals the number of target time series (univariate or multivariate) in the inputs. The extra +2 is due 
+        list[torch.Tensor]
+            A list of torch.Tensor where each element has shape (n_variates, num_patches + 2, d_model) and the number of
+            elements equals the number of target time series (univariate or multivariate) in the inputs. The extra +2 is due
             to embeddings of the [REG] token and a masked output patch token.
-        loc_scale
+        list[tuple[torch.Tensor, torch.Tensor]]
             A list of tuples with the mean and standard deviation of each time series.
         """
         if context_length is None:
@@ -1228,10 +1236,10 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
         """
         Load the model from a local path, S3 prefix, or HuggingFace Hub.
-        
+
         Supports loading base models and LoRA adapters. When loading a LoRA adapter,
         it will be automatically merged with the base model.
-        
+
         Parameters
         ----------
         pretrained_model_name_or_path
@@ -1243,11 +1251,11 @@ class Chronos2Pipeline(BaseChronosPipeline):
             Additional positional arguments passed to AutoConfig and AutoModel
         **kwargs
             Additional keyword arguments passed to AutoConfig and AutoModel
-        
+
         Returns
         -------
         A Chronos2Pipeline instance with the loaded model
-        
+
         Notes
         -----
         Supports the same arguments as AutoConfig and AutoModel from transformers.
@@ -1289,7 +1297,7 @@ class Chronos2Pipeline(BaseChronosPipeline):
     def save_pretrained(self, save_directory: str | Path, *args, **kwargs):
         """
         Save the underlying model to a local directory or HuggingFace Hub.
-        
+
         Parameters
         ----------
         save_directory
diff --git a/src/chronos/chronos_bolt.py b/src/chronos/chronos_bolt.py
index 68fcaa8..fe5fa17 100644
--- a/src/chronos/chronos_bolt.py
+++ b/src/chronos/chronos_bolt.py
@@ -403,33 +403,26 @@ class ChronosBoltModelForForecasting(T5PreTrainedModel):
 class ChronosBoltPipeline(BaseChronosPipeline):
     """
     Pipeline for the Chronos-Bolt model.
-    
+
+    To learn more about this model, refer to:
+
+    Abdul Fatir Ansari, Caner Turkmen, Oleksandr Shchur, and Lorenzo Stella
+    "[Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon](https://aws.amazon.com/blogs/machine-learning/fast-and-accurate-zero-shot-forecasting-with-chronos-bolt-and-autogluon/)."
+    AWS Blogs (2024).
+
     Parameters
     ----------
     model
-        ChronosBoltModelForForecasting instance containing the pretrained model.
-    
-    Attributes
-    ----------
-    model
-        The underlying forecasting model
-    forecast_type
-        Set to ForecastType.QUANTILES indicating this pipeline produces quantiles
-    default_context_length
-        Default context length of 2048 time steps
-    
-    See Also
-    --------
-    ChronosPipeline : Sample-based forecasting with tokenization
-    Chronos2Pipeline : Advanced forecasting with covariates support
+        `ChronosBoltModelForForecasting` instance containing the pretrained model.
     """
+
     forecast_type: ForecastType = ForecastType.QUANTILES
     default_context_length: int = 2048
 
     def __init__(self, model: ChronosBoltModelForForecasting):
         """
         Initialize the ChronosBoltPipeline with a pretrained model.
-        
+
         Parameters
         ----------
         model
@@ -457,11 +450,11 @@ class ChronosBoltPipeline(BaseChronosPipeline):
     ) -> Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
         """
         Extract encoder embeddings for the given time series.
-        
+
         This method processes the input time series through patching and instance
         normalization, then extracts encoder embeddings that can be used for
         downstream tasks like clustering, classification, or similarity search.
-        
+
         Parameters
         ----------
         context
@@ -469,27 +462,18 @@ class ChronosBoltPipeline(BaseChronosPipeline):
             of 1D tensors (multiple series of varying lengths), or a 2D tensor
             where the first dimension is batch size. For 2D tensors, use
             left-padding with torch.nan to align series of different lengths.
-        
+
         Returns
         -------
-        embeddings
+        torch.Tensor
             Encoder embeddings with shape (batch_size, num_patches + 1, d_model),
             where num_patches is the number of patches created from the input
             time series, and the extra 1 is for the [REG] token if used by the model.
             Returned on CPU in the model's dtype.
-        loc_scale
+        Tuple[torch.Tensor, torch.Tensor]
             Tuple of (location, scale) tensors used for instance normalization,
             representing the mean and standard deviation of the original time series.
             Both tensors have shape (batch_size,) and are returned on CPU.
-        
-        Notes
-        -----
-        The embeddings are extracted after patching and instance normalization
-        but before the decoder. They capture the encoded representation of the
-        input time series in the model's latent space.
-        
-        If the input context is longer than the model's context length, it will
-        be automatically truncated to the most recent time steps.
         """
         context_tensor = self._prepare_and_validate_context(context=context)
         model_context_length = self.model.config.chronos_config["context_length"]
@@ -515,12 +499,12 @@ class ChronosBoltPipeline(BaseChronosPipeline):
     ) -> torch.Tensor:
         """
         Generate quantile forecasts for the given time series.
-        
+
         This method directly predicts quantiles without generating sample trajectories.
         For predictions longer than the model's built-in horizon, it uses an
         autoregressive approach that expands the batch size by the number of quantiles
         to generate more robust long-horizon forecasts.
-        
+
         Parameters
         ----------
         inputs
@@ -535,7 +519,7 @@ class ChronosBoltPipeline(BaseChronosPipeline):
             When True, raises an error if prediction_length exceeds the model's
             built-in prediction length. When False (default), allows longer
             predictions with a warning about potential quality degradation.
-        
+
         Returns
         -------
         torch.Tensor
@@ -544,26 +528,27 @@ class ChronosBoltPipeline(BaseChronosPipeline):
             For official Chronos-Bolt models, num_quantiles is 9 for quantiles
             [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9].
             Returned in fp32 on CPU.
-        
+
         Raises
         ------
         ValueError
             If limit_prediction_length is True and prediction_length exceeds
             the model's built-in prediction length.
-        
+
         Notes
         -----
         For predictions longer than the model's built-in horizon, the method uses
         an autoregressive approach:
+
         1. Generate initial quantiles for the first chunk
         2. Expand context by num_quantiles (treating each quantile as a scenario)
         3. Generate next chunk for each scenario
         4. Compute empirical quantiles across all scenarios
         5. Repeat until desired prediction_length is reached
-        
+
         This approach scales the batch size by num_quantiles for long horizons,
         which may require more GPU memory but produces more robust predictions.
-        
+
         If the input context is longer than the model's context length, it will
         be automatically truncated to the most recent time steps.
         """
@@ -639,12 +624,12 @@ class ChronosBoltPipeline(BaseChronosPipeline):
     ) -> Tuple[torch.Tensor, torch.Tensor]:
         """
         Generate quantile and mean forecasts for given time series.
-        
+
         This method generates forecasts at the specified quantile levels. If the
         requested quantiles match those the model was trained on, they are returned
         directly. Otherwise, the method performs interpolation or extrapolation
         to obtain the requested quantiles.
-        
+
         Parameters
         ----------
         inputs
@@ -661,32 +646,32 @@ class ChronosBoltPipeline(BaseChronosPipeline):
         **predict_kwargs
             Additional keyword arguments passed to the predict method, such as
             limit_prediction_length.
-        
+
         Returns
         -------
-        quantiles
+        torch.Tensor
             Tensor of quantile forecasts with shape
             (batch_size, prediction_length, num_quantiles).
             Returned in fp32 on CPU.
-        mean
+        torch.Tensor
             Tensor of mean forecasts with shape (batch_size, prediction_length).
             This is actually the median (0.5 quantile) from the model's predictions.
             Returned in fp32 on CPU.
-        
+
         Notes
         -----
         If the requested quantile_levels are a subset of the model's training
         quantiles, they are extracted directly without interpolation.
-        
+
         If quantile_levels include values outside the range of training quantiles,
         the method will extrapolate using the minimum/maximum training quantiles,
         which may significantly affect prediction quality. A warning will be issued
         in this case.
-        
+
         The interpolation/extrapolation assumes the model's training quantiles
         formed an equidistant grid (e.g., 0.1, 0.2, ..., 0.9), which holds for
         official Chronos-Bolt models but may not be true for custom models.
-        
+
         The mean returned is actually the median (0.5 quantile) from the model's
         predictions, not a true mean.
         """
@@ -734,11 +719,11 @@ class ChronosBoltPipeline(BaseChronosPipeline):
     def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
         """
         Load a pretrained ChronosBoltPipeline from various sources.
-        
+
         This method loads a pretrained ChronosBoltPipeline model from a local path,
         S3 bucket, or the HuggingFace Hub. It automatically instantiates the
         appropriate model architecture based on the configuration.
-        
+
         Parameters
         ----------
         pretrained_model_name_or_path
@@ -754,26 +739,26 @@ class ChronosBoltPipeline(BaseChronosPipeline):
             - torch_dtype: Data type for model weights ("auto", "float32", "bfloat16")
             - device_map: Device placement strategy for model layers
             - Other transformers AutoConfig and model arguments
-        
+
         Returns
         -------
         ChronosBoltPipeline
             An instance of ChronosBoltPipeline with the loaded model.
-        
+
         Raises
         ------
         AssertionError
             If the configuration is not a valid Chronos config.
-        
+
         Notes
         -----
         For S3 URIs, the method delegates to BaseChronosPipeline.from_pretrained
         which handles S3 download and caching.
-        
+
         The method automatically detects the model architecture from the configuration
         and instantiates the appropriate class. If the architecture is not recognized,
         it defaults to ChronosBoltModelForForecasting.
-        
+
         This method supports all arguments accepted by HuggingFace's AutoConfig
         and model classes.
         """