diff --git a/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb b/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
index 89582fc..f629b00 100644
--- a/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
+++ b/notebooks/deploy-chronos-to-amazon-sagemaker.ipynb
@@ -29,28 +29,41 @@
     "**[Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html)** [(Section 2)](#Section-2:-Serverless-Inference)\n",
     "- ✅ Pay only for active inference time, no infrastructure management\n",
     "- ✅ Cost-efficient for intermittent or unpredictable traffic\n",
-    "- ❌ Cold start latency on first request after idle, CPU only, 6GB memory limit\n",
+    "- ❌ Cold start latency on first request after idle, CPU only, lowest throughput of all options\n",
     "- ❌ [More complex setup](#Setup-for-Serverless-and-Batch-Transform) (requires repackaging model artifacts)\n",
     "\n",
     "**[Batch Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html)** [(Section 3)](#Section-3:-Batch-Transform)\n",
     "- ✅ Pay only for active compute time, no persistent infrastructure\n",
     "- ✅ Cost-efficient for large-scale batch prediction jobs\n",
-    "- ❌ Highest latency (not for real-time use), CPU only, requires data in S3\n",
+    "- ❌ Initialization takes several minutes for each job (not for real-time use), CPU only, requires data in S3\n",
     "- ❌ [More complex setup](#Setup-for-Serverless-and-Batch-Transform) (requires repackaging model artifacts)\n",
     "\n",
-    "**Reference benchmark** on M5 dataset (30K daily retail time series, prediction_length=28):\n",
-    "| Mode | Instance | Time |\n",
+    "**Reference benchmark** on a dataset with 1M rows (2000 time series with 500 observations each) and prediction length of 28:\n",
+    "| Mode | Instance | Inference time (s) |\n",
     "|------|----------|------|\n",
-    "| Real-time (GPU) | ml.g5.2xlarge | X min |\n",
-    "| Real-time (CPU) | ml.c5.4xlarge | X min |\n",
-    "| Serverless | 6GB memory | X min |\n",
-    "| Batch Transform | ml.c5.4xlarge | X min |\n",
+    "| Real-time (GPU) | ml.g5.2xlarge | 18 |\n",
+    "| Real-time (CPU) | ml.c5.4xlarge | 50 |\n",
+    "| Serverless | 6GB memory | 120 |\n",
+    "| Batch Transform | ml.c5.4xlarge | 60 (+200s setup) |\n",
     "\n",
     "We recommend starting with **Real-time Inference** as it offers the simplest setup and highest throughput. Consider Serverless or Batch Transform when you need to optimize costs and don't require GPU acceleration.\n",
     "\n",
     "For a complete specification of all supported request parameters, see the [Endpoint API Reference](#Endpoint-API-Reference) at the end of this notebook."
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "15b5fd55",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# GPU: 18s\n",
+    "# CPU: 50s\n",
+    "# Serverless: 120s\n",
+    "# Batch transform: 60s (+200s setup)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "78b40323",
@@ -879,7 +892,7 @@
    "metadata": {},
    "source": [
     "---\n",
-    "## Setup for Serverless and Batch Transform\n",
+    "## Setup for Serverless Inference and Batch Transform\n",
     "\n",
     "Serverless Inference and Batch Transform only support CPU instances. Unlike real-time inference with JumpStart, these modes require you to create a custom SageMaker Model with repackaged artifacts.\n",
     "\n",
@@ -999,6 +1012,30 @@
     "chronos_model.create()"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "129bc389",
+   "metadata": {},
+   "source": [
+    "Alternatively, you can load an existing model as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7cb0f14",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# model_info = boto3.client(\"sagemaker\").describe_model(ModelName=\"chronos-2-cpu\")\n",
+    "# model = Model(\n",
+    "#     model_data=model_info[\"PrimaryContainer\"][\"ModelDataUrl\"],\n",
+    "#     image_uri=model_info[\"PrimaryContainer\"][\"Image\"],\n",
+    "#     role=model_info[\"ExecutionRoleArn\"],\n",
+    "#     name=model_info[\"ModelName\"],\n",
+    "# )"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "ba12b52d",