Commit graph

13 commits

Author SHA1 Message Date
Abdul Fatir
5242d986f4
Remove float32 casting for cumsum (#53)
*Description of changes:* This PR removes casting to `fp32` for the
`cumsum` operation and upgrades `mlx` to `~=0.10.0` which adds `bf16`
support for `cumsum`.

Related: https://github.com/ml-explore/mlx/issues/959

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

Co-authored-by: Abdul Fatir Ansari <ansarnd@amazon.com>
2024-04-12 20:41:12 +02:00
Abdul Fatir
159ea36f7f
Add MLX inference support (#41)
*Issue #, if available:* #28

*Description of changes:* This PR adds MLX inference support.

## Summary of changes
- Update `pyproject.toml` with `mlx` dependencies.
- Create the `chronos_mlx` package, which will host all MLX inference code.
- All classes from `main:src/chronos/chronos.py` are copy-pasted into
`mlx:src/chronos_mlx/chronos.py` and modified to use numpy and mlx
arrays instead. Note that the reason for using numpy arrays as input and
output is that mlx doesn't support some operations that are required for
input and output transform.
- MLX implementation of T5 is in `src/chronos_mlx/t5.py`. It has been
adapted from
[ml-explore/mlx-examples](b8a348c1b8/t5/t5.py)
with the following main modifications:
      - Added support for attention mask.
      - Added support for top_k and top_p sampling.
- `src/chronos_mlx/translate.py` translates weights from a torch HF
model to mlx.
- Add `THIRD-PARTY-LICENSES.txt` for third party code from
`mlx-examples`.
- Add tests and CI for `mlx` version. 
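
The `top_k` and `top_p` sampling mentioned above can be illustrated with a minimal, framework-free sketch. This is not the code in `src/chronos_mlx/t5.py`; the function names `filter_top_k_top_p` and `sample_token` are hypothetical, and plain Python lists stand in for mlx arrays.

```py
import math
import random


def filter_top_k_top_p(logits, top_k=0, top_p=1.0):
    """Return probabilities with tokens outside the top-k / nucleus set zeroed out.

    Hypothetical helper for illustration; top_k=0 disables the top-k filter.
    """
    # Numerically stable softmax over the raw logits.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids from most to least likely, then apply the top-k cut.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Nucleus (top-p) cut: keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to one.
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, top_k=0, top_p=1.0, rng=random):
    """Draw one token id from the filtered distribution."""
    probs = filter_top_k_top_p(logits, top_k=top_k, top_p=top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `top_k=1` the sampler reduces to greedy decoding, and with `top_k=0, top_p=1.0` it samples from the full softmax distribution.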

## Sample inference code

```py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from chronos_mlx import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    dtype="bfloat16",
)

df = pd.read_csv(
    "https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv"
)

# context must be either a 1D numpy array, a list of 1D numpy arrays,
# or a left-padded 2D numpy array with batch as the first dimension
context = df["#Passengers"].values
prediction_length = 12
forecast = pipeline.predict(
    context, prediction_length
)  # shape [num_series, num_samples, prediction_length]

# visualize the forecast
forecast_index = range(len(df), len(df) + prediction_length)
low, median, high = np.quantile(forecast[0], [0.1, 0.5, 0.9], axis=0)

plt.figure(figsize=(8, 4))
plt.plot(df["#Passengers"], color="royalblue", label="historical data")
plt.plot(forecast_index, median, color="tomato", label="median forecast")
plt.fill_between(
    forecast_index,
    low,
    high,
    color="tomato",
    alpha=0.3,
    label="80% prediction interval",
)
plt.legend()
plt.grid()
plt.show()

```

## Benchmark


![benchmark](https://github.com/amazon-science/chronos-forecasting/assets/4028948/ee5d1b17-d33e-473c-aa7a-55dbe1059b9c)


```py
import timeit

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
from gluonts.dataset.repository import get_dataset
from gluonts.dataset.split import split
from gluonts.ev.metrics import MASE, MeanWeightedSumQuantileLoss
from gluonts.model.evaluation import evaluate_forecasts
from gluonts.model.forecast import SampleForecast
from tqdm.auto import tqdm

from chronos import ChronosPipeline as ChronosPipelineTorch
from chronos_mlx import ChronosPipeline as ChronosPipelineMLX


def benchmark_torch_model(
    pipeline: ChronosPipelineTorch,
    gluonts_dataset: str = "m4_hourly",
    batch_size: int = 32,
):
    dataset = get_dataset(gluonts_dataset)
    prediction_length = dataset.metadata.prediction_length
    _, test_template = split(dataset.test, offset=-prediction_length)
    test_data = test_template.generate_instances(prediction_length)
    test_data_input = list(test_data.input)

    start_time = timeit.default_timer()
    forecasts = []
    for idx in tqdm(range(0, len(test_data_input), batch_size)):
        batch = [
            torch.tensor(item["target"])
            for item in test_data_input[idx : idx + batch_size]
        ]
        batch_forecasts = pipeline.predict(batch, prediction_length)
        forecasts.append(batch_forecasts)
    forecasts = torch.cat(forecasts)
    end_time = timeit.default_timer()

    print(f"Inference time: {end_time-start_time:.2f}s")

    results_df = evaluate_forecasts(
        forecasts=[
            SampleForecast(fcst.numpy(), start_date=label["start"])
            for fcst, label in zip(forecasts, test_data.label)
        ],
        test_data=test_data,
        metrics=[MASE(), MeanWeightedSumQuantileLoss(np.arange(0.1, 1, 0.1))],
    )
    results_df["inference_time"] = end_time - start_time
    return results_df


def benchmark_mlx_model(
    pipeline: ChronosPipelineMLX,
    gluonts_dataset: str = "m4_hourly",
    batch_size: int = 32,
):
    dataset = get_dataset(gluonts_dataset)
    prediction_length = dataset.metadata.prediction_length
    _, test_template = split(dataset.test, offset=-prediction_length)
    test_data = test_template.generate_instances(prediction_length)
    test_data_input = list(test_data.input)

    start_time = timeit.default_timer()
    forecasts = []
    for idx in tqdm(range(0, len(test_data_input), batch_size)):
        batch = [item["target"] for item in test_data_input[idx : idx + batch_size]]
        batch_forecasts = pipeline.predict(batch, prediction_length)
        forecasts.append(batch_forecasts)
    forecasts = np.concatenate(forecasts)
    end_time = timeit.default_timer()

    print(f"Inference time: {end_time-start_time:.2f}s")

    results_df = evaluate_forecasts(
        forecasts=[
            SampleForecast(fcst, start_date=label["start"])
            for fcst, label in zip(forecasts, test_data.label)
        ],
        test_data=test_data,
        metrics=[MASE(), MeanWeightedSumQuantileLoss(np.arange(0.1, 1, 0.1))],
    )
    results_df["inference_time"] = end_time - start_time
    return results_df


def main(
    version: str = "cpu",  # cpu, mps, mlx
    dtype: str = "bfloat16",
    gluonts_dataset: str = "australian_electricity_demand",
    model_name: str = "amazon/chronos-t5-small",
    batch_size: int = 4,
):
    if version == "cpu" or version == "mps":
        pipeline = ChronosPipelineTorch.from_pretrained(
            model_name,
            device_map=version,
            torch_dtype=getattr(torch, dtype),
        )
        benchmark_fn = benchmark_torch_model
    else:
        pipeline = ChronosPipelineMLX.from_pretrained(model_name, dtype=dtype)
        benchmark_fn = benchmark_mlx_model

    result_df = benchmark_fn(
        pipeline, gluonts_dataset=gluonts_dataset, batch_size=batch_size
    )
    result_df["model"] = model_name
    return result_df


if __name__ == "__main__":
    gluonts_dataset: str = "m4_hourly"
    model_name: str = "amazon/chronos-t5-mini"
    batch_size: int = 8
    dfs = []
    for version in ["cpu", "mps", "mlx"]:
        for dtype in ["float32"]:
            try:
                df = main(
                    version=version,
                    dtype=dtype,
                    model_name=model_name,
                    gluonts_dataset=gluonts_dataset,
                    batch_size=batch_size,
                )
                df["version"] = version
                df["dtype"] = dtype
                dfs.append(df)
            except TypeError:
                pass

    result_df = pd.concat(dfs).reset_index(drop=True)
    result_df.to_csv("benchmark.csv", index=False)

    result_df["version"] = result_df["version"].map(
        {"cpu": "Torch (CPU)", "mps": "Torch (MPS)", "mlx": "MLX"}
    )
    fig = plt.figure(figsize=(8, 5))
    g = sns.barplot(
        data=result_df,
        x="dtype",
        y="inference_time",
        hue="version",
        alpha=0.6,
    )
    plt.ylabel("Inference Time (on M1 Pro)")
    plt.title(f"{model_name} inference times on {gluonts_dataset} dataset")
    plt.savefig("benchmark.png", dpi=200)

```

## TODOs:
- [x] Implement `top_p` sampling.
- [x] Add tests.
- [x] Add CI.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Abdul Fatir Ansari <ansarnd@amazon.com>
2024-04-08 15:03:44 +02:00
Lorenzo Stella
2042779efa
Simplify tokenizer creation (#44)
*Description of changes:* Minor simplification to how the tokenizer is
constructed from the config


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-04-05 17:15:33 +02:00
Lorenzo Stella
b4423b8c4d
Speed up workflow (#43)
*Description of changes:* Speed up GH workflow by installing CPU-only
version of torch


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-04-05 16:55:25 +02:00
Lorenzo Stella
4b1d1c818b
Fix types, add mypy to workflow (#42)
*Description of changes:* Fix some type checking issues, add mypy to
github workflow, apply black


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-04-05 15:36:39 +02:00
Pixee OSS Assistant
96cedec3fa
Remove Unnecessary F-strings (#34)
*Issue #, if available:* N/A

*Description of changes:*
This codemod converts any f-strings without interpolated variables into
regular strings.
In these cases the use of f-string is not necessary; a simple string
literal is sufficient.

While in some (extreme) cases we might expect a very modest performance
improvement, in general this is a fix that improves the overall
cleanliness and quality of your code.

```diff
- var = f"hello"
+ var = "hello"
  ...
```
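
As a rough illustration of how such f-strings can be detected, here is a minimal sketch based on the standard-library `ast` module. It is not the codemod's actual implementation, and `find_plain_fstrings` is a hypothetical name.

```py
import ast


def find_plain_fstrings(source: str) -> list[int]:
    """Return the line numbers of f-strings that contain no interpolation."""
    tree = ast.parse(source)

    # A format spec such as ">10" in f"{x:>10}" is itself a JoinedStr node
    # without placeholders, so collect those nodes and skip them below.
    spec_ids = {
        id(node.format_spec)
        for node in ast.walk(tree)
        if isinstance(node, ast.FormattedValue) and node.format_spec is not None
    }

    lines = []
    for node in ast.walk(tree):
        if isinstance(node, ast.JoinedStr) and id(node) not in spec_ids:
            # An f-string with no FormattedValue parts has nothing to interpolate.
            if not any(isinstance(v, ast.FormattedValue) for v in node.values):
                lines.append(node.lineno)
    return sorted(lines)
```

Calling `find_plain_fstrings` on a module's source text yields the lines where the `f` prefix could be dropped.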

<details>
  <summary>More reading</summary>

*
[https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/f-string-without-interpolation.html](https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/f-string-without-interpolation.html)
*
[https://github.com/Instagram/LibCST/blob/main/libcst/codemod/commands/unnecessary_format_string.py](https://github.com/Instagram/LibCST/blob/main/libcst/codemod/commands/unnecessary_format_string.py)
</details>

Powered by: [pixeebot](https://docs.pixee.ai/) (codemod ID:
[pixee:python/remove-unnecessary-f-str](https://docs.pixee.ai/codemods/python/pixee_python_remove-unnecessary-f-str))

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

Co-authored-by: pixeebot[bot] <104101892+pixeebot[bot]@users.noreply.github.com>
2024-03-31 19:04:19 +02:00
Lorenzo Stella
b4a6c0c2eb
Bump package version to 1.1 (#27)
By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-03-25 13:23:08 +01:00
Abdul Fatir
0595bd872b
Add pipeline.embed (#24)
*Description of changes:* This PR adds `pipeline.embed` which extracts
encoder embeddings from the model. These embeddings may be useful for
some downstream tasks such as classification, so this is useful to have.


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Abdul Fatir Ansari <ansarnd@amazon.de>
2024-03-25 13:18:50 +01:00
Lorenzo Stella
28752931fd
Speed up inference by avoiding unnecessary padding (#25)
*Issue #, if available:* Unnecessary context padding slows down
inference. We evaluated the models from HF with this change, and found
no concerning issue with accuracy.

Test code for a context of length 200:

```python
import torch
from chronos import ChronosPipeline
import time

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-large",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

context = torch.ones((8, 200))
prediction_length = 24
num_runs = 10

t0 = time.time()
for _ in range(num_runs):
    forecast = pipeline.predict(
        context,
        prediction_length,
        num_samples=20,
    )
t1 = time.time()

print(f"total time: {t1 - t0}")
```

Before the change:

```
total time: 20.005481481552124
```

After the change:

```
total time: 9.82350754737854
```

*Description of changes:* Remove padding in case the provided batch is
shorter than `context_length`.
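
The idea can be sketched without torch as follows. This is an illustration under stated assumptions, not the code changed in this PR: `left_pad_batch` is a hypothetical helper, and using NaN as the pad value is an assumption.

```py
import math


def left_pad_batch(series_list, context_length, pad_value=math.nan):
    """Left-pad series to the longest one in the batch, capped at context_length."""
    # Keep at most the last `context_length` observations of each series.
    trimmed = [list(s[-context_length:]) for s in series_list]
    # Pad only up to the longest series in the batch, not up to context_length.
    width = max(len(s) for s in trimmed)
    return [[pad_value] * (width - len(s)) + s for s in trimmed]
```

For a batch whose longest series has 200 observations and a `context_length` of 512, the padded batch is 200 columns wide rather than 512, so the model processes fewer padded positions.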


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-03-25 12:39:30 +01:00
Abdul Fatir
73be25042f
Add optional inference params to example (#15)
*Description of changes:* This PR adds optional inference params such as
`num_samples`, `top_k`, etc. to the example in the README for clarity.


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-03-18 13:13:06 +01:00
Michael Feil
ef786e9864
Update chronos.py - model.device (#11)
*Issue #, if available:* N/A

*Description of changes:*

Thanks for the very clean impl of the Model, Tokenizer, and Pipeline. 

I was curious about it and found a minor improvement in the API - what do
you think about it? Feel free to close. Change is untested.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
2024-03-15 10:30:33 +01:00
Lorenzo Stella
7ba945c995
Upload code
2024-03-13 09:58:39 +01:00
Amazon GitHub Automation
2420c10232
Initial commit
2024-02-23 02:35:45 -08:00