*Issue #, if available:*
*Description of changes:*
By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
*Issue #, if available:*
*Description of changes:*
- Replace the for-loop with NumPy operations and a single `pd.DataFrame`
construction
*Issue #, if available:*
*Description of changes:* Adds support for custom callbacks after each
batch is processed during prediction. This allows for keeping track of
the time limit in AutoGluon.
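A minimal sketch of how a per-batch callback could enforce a time limit. The names `predict_in_batches`, `after_batch`, `make_time_limit_callback`, and `TimeLimitExceeded` are hypothetical and only illustrate the idea; they are not the actual Chronos or AutoGluon API.

```python
import time
from typing import Callable, List, Optional, Sequence


class TimeLimitExceeded(RuntimeError):
    """Raised by the callback once the time budget is used up."""


def make_time_limit_callback(
    time_limit: float, clock: Callable[[], float] = time.monotonic
) -> Callable[[], None]:
    """Return a callback that raises once `time_limit` seconds have elapsed."""
    start = clock()

    def callback() -> None:
        if clock() - start > time_limit:
            raise TimeLimitExceeded(f"Exceeded time limit of {time_limit}s")

    return callback


def predict_in_batches(
    batches: Sequence[List[float]],
    after_batch: Optional[Callable[[], None]] = None,
) -> List[List[float]]:
    """Hypothetical prediction loop that calls `after_batch` after each batch."""
    outputs = []
    for batch in batches:
        outputs.append([2 * x for x in batch])  # placeholder for the model call
        if after_batch is not None:
            after_batch()
    return outputs


callback = make_time_limit_callback(time_limit=60.0)
predictions = predict_in_batches([[1.0, 2.0], [3.0, 4.0]], after_batch=callback)
```

Raising from the callback is one possible way to stop early; the caller decides how to handle the exception.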
*Issue #, if available:*
*Description of changes:* This PR improves test coverage by adding unit
tests for `df_utils`. Previously these methods were only being tested as
part of Chronos-2 integration tests.
*Issue #, if available:*
*Description of changes:* Previously, only the returned pipeline had the
correct configuration; the configuration was not being saved to disk.
*Issue #, if available:*
*Description of changes:*
- Rename `predict_batches_jointly` to `cross_learning`
- Add deprecation warning
- Add `cross_learning` to the `predict_df` docstring
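A rough sketch of how a renamed keyword argument can stay usable behind a deprecation warning. The `predict` function below is a hypothetical stand-in, and the warning category and message are assumptions, not necessarily what this PR uses.

```python
import warnings
from typing import Any, Dict, List


def predict(inputs: List[Any], cross_learning: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical function that accepts the old argument name with a warning."""
    if "predict_batches_jointly" in kwargs:
        warnings.warn(
            "`predict_batches_jointly` is deprecated; use `cross_learning` instead.",
            FutureWarning,
            stacklevel=2,
        )
        # Map the old name onto the new one.
        cross_learning = kwargs.pop("predict_batches_jointly")
    if kwargs:
        raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
    return {"n_inputs": len(inputs), "cross_learning": cross_learning}


result = predict([1, 2, 3], cross_learning=True)
```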
*Issue #, if available:*
*Description of changes:* 0 is a better default than 1.
*Issue #, if available:* Fixes #403
*Description of changes:*
- Update the `future_df` validation logic to only check that
`prediction_length` values are provided for each item.
- Update unit tests for DF-based methods in `test_chronos2.py`
- Ignore fine-tuned checkpoint folders with `.gitignore`
*Issue #, if available:*
*Description of changes:* This PR adds a `validate_inputs` argument to
`predict_df` (defaults to `True`), which allows the user to disable
dataframe validation when they know that their dataframe is in the right
format. This reduces runtime by skipping the input validation step,
e.g., when calling this method from
[AutoGluon](https://github.com/autogluon/autogluon/pull/5427), and also
handles series shorter than 3 timesteps.
*Issue #, if available:*
*Description of changes:* Adds support for LoRA fine-tuning.
- [x] Move peft/pandas dependency to an extra
- [x] Add tests for LoRA
- [x] Update notebook with LoRA info
- [x] Enable automatic recognition and loading of LoRA adapters
*Issue #, if available:* Addresses #391
*Description of changes:*
- Speed up `convert_df_input_to_list_of_dicts_input` and
`validate_df_inputs` via a few tricks:
  - Replace `df.iloc[start_idx:end_idx][col]` with
    `df[col].iloc[start_idx:end_idx]` to avoid copying data on each slice
  - Vectorize computation of future timestamps using numpy
  - Work with `dict[str, np.ndarray]` instead of `pd.DataFrame` when
    working with covariates to avoid repeated `.to_numpy()` calls.
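The tricks above can be sketched on a toy dataframe. This is an illustration only, not the code in `df_utils`; the column names and the fixed daily step are assumptions, and the vectorized timestamp computation shown here only covers fixed-width frequencies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "item_id": np.repeat([0, 1], 3),
        "timestamp": np.tile(pd.date_range("2020-01-01", periods=3, freq="D"), 2),
        "target": np.arange(6, dtype=float),
    }
)
start_idx, end_idx = 3, 6

# Row-first slicing builds an intermediate DataFrame with every column...
row_first = df.iloc[start_idx:end_idx]["target"].to_numpy()
# ...while column-first slicing only touches the column that is needed.
col_first = df["target"].iloc[start_idx:end_idx].to_numpy()

# Convert each column to a NumPy array once, then slice the arrays directly.
arrays = {col: df[col].to_numpy() for col in ["target"]}
dict_slice = arrays["target"][start_idx:end_idx]

# Vectorized future timestamps: one broadcasted addition instead of a loop.
prediction_length = 2
last_timestamps = df.groupby("item_id")["timestamp"].max().to_numpy()
step = np.timedelta64(1, "D")  # assumed fixed daily step
future_timestamps = last_timestamps[:, None] + step * np.arange(1, prediction_length + 1)
```

All three slicing variants return the same values; only the amount of intermediate work differs.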
**Before**
```
Benchmarking 20000 series, 200 steps, 0 covariates...
Average runtime: 27.33s
Benchmarking 20000 series, 200 steps, 5 covariates...
Average runtime: 44.69s
```
**After**
```
Benchmarking 20000 series, 200 steps, 0 covariates...
Average runtime: 4.60s
Benchmarking 20000 series, 200 steps, 5 covariates...
Average runtime: 8.92s
```
<details>
```python
import time

import numpy as np
import pandas as pd

from chronos.df_utils import convert_df_input_to_list_of_dicts_input


def benchmark_convert_df_input(
    num_items: int, num_steps: int, num_covariates: int = 0, num_trials: int = 10, freq: str = "D"
) -> None:
    """
    Benchmark convert_df_input_to_list_of_dicts_input function.

    Args:
        num_items: Number of time series
        num_steps: Number of observations per series
        num_covariates: Number of covariates to include
        num_trials: Number of benchmark trials
        freq: Frequency string for timestamps
    """
    prediction_length = 24

    # Generate context DataFrame
    item_ids = np.repeat(np.arange(num_items), num_steps)
    timestamps = np.tile(pd.date_range("2020-01-01", periods=num_steps, freq=freq), num_items)
    df_data = {"item_id": item_ids, "timestamp": timestamps, "target": np.random.randn(num_items * num_steps)}
    df_data.update({f"cov_{i}": np.random.randn(num_items * num_steps) for i in range(num_covariates)})
    df = pd.DataFrame(df_data)

    # Generate future_df with covariates
    future_df = None
    if num_covariates > 0:
        future_item_ids = np.repeat(np.arange(num_items), prediction_length)
        offset = pd.tseries.frequencies.to_offset(freq)
        future_start = pd.Timestamp("2020-01-01") + num_steps * offset
        future_timestamps = np.tile(pd.date_range(start=future_start, periods=prediction_length, freq=freq), num_items)
        future_data = {"item_id": future_item_ids, "timestamp": future_timestamps}
        future_data.update({f"cov_{i}": np.random.randn(num_items * prediction_length) for i in range(num_covariates)})
        future_df = pd.DataFrame(future_data)

    times = []
    print(f"Benchmarking {num_items} series, {num_steps} steps, {num_covariates} covariates...")
    for _ in range(num_trials):
        start = time.perf_counter()
        convert_df_input_to_list_of_dicts_input(
            df=df,
            future_df=future_df,
            id_column="item_id",
            timestamp_column="timestamp",
            target_columns=["target"],
            prediction_length=prediction_length,
        )
        end = time.perf_counter()
        times.append(end - start)
    print(f"Average runtime: {sum(times) / len(times):.2f}s")


if __name__ == "__main__":
    # Test without covariates
    benchmark_convert_df_input(20_000, 200, num_covariates=0, num_trials=1)
    # Test with covariates
    benchmark_convert_df_input(20_000, 200, num_covariates=5, num_trials=1)
```
</details>
*Issue #, if available:*
*Description of changes:* Lower learning rates generally appear to work
better. This is probably because we are doing full fine-tuning of a
model with 120M params.
*Issue #, if available:*
*Description of changes:* This PR masks rows corresponding to all
covariates in the future target. Specifically, this is to avoid the
contribution of past-only covariates in loss computation. The previous
setup was correct from the perspective of pretraining but I think this
makes more sense for fine-tuning.
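A toy sketch of the masking idea, assuming the future target is a 2-D array with one row per variate and that masked entries are marked with NaN. The helper names and the MSE loss are illustrative stand-ins, not the loss or data layout used in Chronos-2.

```python
import numpy as np


def mask_covariate_rows(future_target: np.ndarray, is_target_row: np.ndarray) -> np.ndarray:
    """Set all covariate rows of the future target to NaN so they are ignored."""
    masked = future_target.astype(float).copy()
    masked[~is_target_row] = np.nan
    return masked


def masked_mse(prediction: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over the entries of `target` that are not NaN."""
    valid = ~np.isnan(target)
    return float(np.mean((prediction[valid] - target[valid]) ** 2))


# Row 0 is a target, row 1 is a covariate.
future_target = np.array([[1.0, 2.0], [10.0, 20.0]])
is_target_row = np.array([True, False])
masked_target = mask_covariate_rows(future_target, is_target_row)
loss = masked_mse(np.zeros_like(future_target), masked_target)
```

With the covariate row masked, only the target row contributes to the loss.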
*Issue #, if available:* #354
*Description of changes:* This PR adds `Chronos2Pipeline.embed` to
enable users to easily extract embeddings from the last encoder layer.
The API and behavior are similar to what Chronos and Chronos-Bolt
provide.
*Issue #, if available:*
*Description of changes:* This PR adds `predict_df` to the base pipeline
which enables pandas support for the univariate Chronos and Chronos-Bolt
models.
By default, the `transformers` library sets the `num_workers` argument
of the PyTorch DataLoader to `0`, ensuring out-of-the-box compatibility
across different platforms.
*Issue #, if available:*
*Description of changes:*
Set the DataLoader `num_workers` argument to `0` to improve
cross-platform compatibility, particularly on Windows systems where
multiprocessing requires guarded execution.
---------
Co-authored-by: Abdul Fatir <Abdulfatirs@gmail.com>
---------
Co-authored-by: Oleksandr Shchur <oleks.shchur@gmail.com>
Removed logo image from the README.
*Issue #, if available:*
*Description of changes:*
- Add notebook showcasing how Chronos-2 can be deployed using Amazon
SageMaker
Closes #345
Hi @abdulfatir
This PR fixes two issues in `validate_and_prepare_single_dict_task`:
1. One of the function's return values, `task_n_future_covariates`,
counted both the "past only" and the "future known" covariates. It was
computed as `task_n_future_covariates = len(task_future_covariates_list)`,
but `task_future_covariates_list` is filled by iterating
`for key in task_covariates_keys`.
2. The code did not seem to guarantee that the last rows are actually the
"future known" covariates we expect, even though the keys are sorted.
This PR fixes both by separating the "past only" and "future known"
covariates from the `past_covariates` input and explicitly placing the
"past only" covariate rows above the "future known" covariate rows,
using a temporary list `ordered_covariate_keys`.
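The ordering described above can be sketched roughly as follows. The helper `order_covariate_keys` is hypothetical, and it assumes that a covariate is "future known" when its key also appears in the future covariates; the actual logic in `validate_and_prepare_single_dict_task` may differ in details.

```python
from typing import Any, Dict, List, Tuple


def order_covariate_keys(
    past_covariates: Dict[str, Any], future_covariates: Dict[str, Any]
) -> Tuple[List[str], int]:
    """Return keys with "past only" first, then "future known", plus the future-known count."""
    # Assumption: "future known" covariates are those that also have future values.
    future_known = sorted(k for k in past_covariates if k in future_covariates)
    past_only = sorted(k for k in past_covariates if k not in future_covariates)
    ordered_covariate_keys = past_only + future_known
    return ordered_covariate_keys, len(future_known)


past = {"promo": [0, 1, 0], "price": [9.5, 9.0, 9.9], "weather": [20, 21, 19]}
future = {"promo": [1, 1]}
ordered_keys, n_future_covariates = order_covariate_keys(past, future)
```

With this ordering, the last `n_future_covariates` rows always correspond to the "future known" covariates, and the count no longer includes the "past only" ones.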
*Issue #, if available:*
*Description of changes:*
- Move the SageMaker deployment notebook to a new path to check that the
GitHub redirect feature works.