From 3f43317206d53392cd18ef1909ceee682c6612bf Mon Sep 17 00:00:00 2001
From: Oleksandr Shchur
Date: Tue, 30 Dec 2025 15:10:39 +0000
Subject: [PATCH] Address PR comment

---
 .../deploy-chronos-to-amazon-sagemaker.ipynb | 36 +++++++++++--------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb b/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
index aa847f0..27e678e 100644
--- a/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
+++ b/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
@@ -21,22 +21,22 @@
  "### Deployment Options\n",
  "This notebook covers three deployment modes on SageMaker:\n",
  "\n",
- "**[Real-time Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html)** (Section 1)\n",
- "- ✅ Highest throughput, consistently low latency, supports both GPU and CPU instances\n",
- "- ✅ Simple setup via JumpStart\n",
- "- ❌ By default, you pay for the time the endpoint is running (can be configured to [scale to zero](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-zero-instances.html))\n",
+ "1. **[Real-time Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html)**\n",
+ " - ✅ Highest throughput, consistently low latency, supports both GPU and CPU instances\n",
+ " - ✅ Simple setup via JumpStart\n",
+ " - ❌ By default, you pay for the time the endpoint is running (can be configured to [scale to zero](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-zero-instances.html))\n",
  "\n",
- "**[Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html)** (Section 2)\n",
- "- ✅ Pay only for active inference time, no infrastructure management\n",
- "- ✅ Cost-efficient for intermittent or unpredictable traffic\n",
- "- ❌ Cold start latency on first request after idle, CPU only, lowest throughput of all options\n",
- "- ❌ More complex setup (requires repackaging model artifacts)\n",
+ "2. **[Serverless Inference (CPU only)](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html)**\n",
+ " - ✅ Pay only for active inference time, no infrastructure management\n",
+ " - ✅ Cost-efficient for intermittent or unpredictable traffic\n",
+ " - ❌ Cold start latency on first request after idle, lowest throughput of all options\n",
+ " - ❌ More complex setup (requires repackaging model artifacts)\n",
  "\n",
- "**[Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html)** (Section 3)\n",
- "- ✅ Pay only for active compute time, no persistent infrastructure\n",
- "- ✅ Cost-efficient for large-scale batch prediction jobs\n",
- "- ❌ Initialization takes severa minutes for each job (not for real-time use), CPU only, requires data in S3\n",
- "- ❌ More complex setup (requires repackaging model artifacts)\n",
+ "3. **[Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html)**\n",
+ " - ✅ Pay only for active compute time, no persistent infrastructure\n",
+ " - ✅ Cost-efficient for large-scale batch prediction jobs\n",
+ " - ❌ Initialization takes severa minutes for each job (not for real-time use), requires data in S3\n",
+ " - ❌ More complex setup (requires repackaging model artifacts)\n",
  "\n",
  "**Reference benchmark** on a dataset with 1M rows (2000 time series with 500 observations each) and prediction length of 28:\n",
  "| Mode | Instance | Inference time (s) |\n",
@@ -1410,8 +1410,14 @@
  "main_language": "python",
  "notebook_metadata_filter": "-all"
  },
+ "kernelspec": {
+ "display_name": "ag",
+ "language": "python",
+ "name": "python3"
+ },
  "language_info": {
- "name": "python"
+ "name": "python",
+ "version": "3.12.9"
  }
  },
  "nbformat": 4,