mirror of
https://github.com/ashim-hq/ashim
synced 2026-04-21 13:37:52 +00:00
docs: add resource footprint section — idle memory, lazy model loading, cold start behavior
This commit is contained in:
parent 9d84dc262f
commit e529fbb839
1 changed file with 32 additions and 8 deletions
@@ -27,16 +27,13 @@ This package has no network dependencies and runs entirely in-process.
### `@ashim/ai`
A bridge layer that calls Python scripts for ML operations. On first use, the bridge starts a persistent Python dispatcher process that pre-imports heavy libraries (PIL, NumPy, MediaPipe, rembg) so subsequent AI calls skip the import overhead. If the dispatcher is not yet ready, the bridge falls back to spawning a fresh Python subprocess per request.
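The lazy dispatcher start-up and per-request fallback can be sketched as below. This is a minimal illustration, not the actual `@ashim/ai` API: the class and the injected function names (`AiBridge`, `startDispatcher`, `sendToDispatcher`, `spawnOneOff`) are hypothetical.

```typescript
// Sketch of lazy dispatcher start-up with subprocess fallback.
// All names here are illustrative, not the package's real API.
class AiBridge {
  private dispatcher: Promise<void> | null = null;
  private ready = false;

  constructor(
    private startDispatcher: () => Promise<void>,          // boots Python, pre-imports libs
    private sendToDispatcher: (op: string) => Promise<string>, // warm path
    private spawnOneOff: (op: string) => Promise<string>,      // fresh subprocess per request
  ) {}

  async run(op: string): Promise<string> {
    if (this.dispatcher === null) {
      // First AI call: begin warming the dispatcher in the background.
      this.dispatcher = this.startDispatcher().then(() => {
        this.ready = true;
      });
    }
    // Use the warm dispatcher once ready; otherwise fall back to a
    // one-off subprocess so the request is never blocked on warm-up.
    return this.ready ? this.sendToDispatcher(op) : this.spawnOneOff(op);
  }
}
```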
**Models are not pre-loaded.** Each tool script loads its model weights from disk at request time and discards them when the request finishes. See [Resource footprint](#resource-footprint) for the full memory profile.
Supported operations:

- **Background removal** -- rembg (BiRefNet)
- **Upscaling** -- RealESRGAN
- **Face blur** -- MediaPipe
- **Face enhancement** -- GFPGAN/CodeFormer
- **Object erasing (inpainting)** -- LaMa ONNX
- **OCR** -- PaddleOCR/Tesseract
- **Colorization** -- DDColor
- **Noise removal**, **red eye removal**, **photo restoration**, **passport photo generation**
- **Content-aware resize** -- Go `caire` binary
Python scripts live in `packages/ai/python/`. The Docker image pre-downloads all model weights during the build so the container works fully offline.
### `@ashim/shared`
@@ -88,3 +85,30 @@ This VitePress site. Deployed to GitHub Pages automatically on push to `main`.
For pipelines, the API feeds the output of each step as input to the next, running them sequentially.
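The sequential hand-off can be sketched as follows; the `Step` type and `runPipeline` name are illustrative, not the actual ashim API.

```typescript
// Minimal sketch of sequential pipeline execution: the output of each
// step becomes the input of the next. `Step` is a hypothetical type.
type Step = (input: Buffer) => Promise<Buffer>;

async function runPipeline(input: Buffer, steps: Step[]): Promise<Buffer> {
  let current = input;
  for (const step of steps) {
    current = await step(current); // step N's output feeds step N+1
  }
  return current;
}
```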
For batch processing, the API uses p-queue with a configurable concurrency limit (`CONCURRENT_JOBS`) and returns a ZIP file with all processed images.
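p-queue's concurrency cap behaves roughly like the limiter below — a simplified, dependency-free stand-in, not p-queue itself, and it omits the ZIP assembly step.

```typescript
// Sketch of a concurrency-limited batch map, approximating what p-queue's
// `concurrency` option does. `limit` stands in for CONCURRENT_JOBS.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next unprocessed index
      results[i] = await worker(items[i]);
    }
  }
  // At most `limit` lanes run in parallel; each picks up new work
  // as soon as its current item finishes.
  const lanes = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: lanes }, lane));
  return results;
}
```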
## Resource footprint
ashim is designed for low idle memory use. Nothing is preloaded or kept warm at startup.
### At idle
Only the Node.js/Fastify process is running. Typical idle RAM is **~100-150 MB** (Node.js process + SQLite connection). No Python process, no worker threads, no model weights in memory.
### What starts, and when
| Component | Starts when | Memory while active |
|-----------|-------------|---------------------|
| Fastify server | Container start | ~100-150 MB |
| Piscina worker threads | First standard tool request | Spawned on demand, terminated after **30 s idle** |
| Python dispatcher | First AI tool request | Python interpreter + pre-imported libraries (PIL, NumPy, MediaPipe, rembg) -- no model weights |
| AI model weights | During the specific tool's request | Loaded from disk, freed when the request finishes |
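The 30 s worker teardown in the table amounts to an idle timer that every request resets. The sketch below is a simplified illustration; `makeIdleReaper` and the `destroy` callback are hypothetical names, not the server's actual pool-management code.

```typescript
// Sketch of idle-based teardown: each request calls `touch()`, which
// restarts a countdown; `destroy` (hypothetical) runs only after
// `idleMs` elapse with no further requests.
function makeIdleReaper(destroy: () => void, idleMs = 30_000): () => void {
  let timer: ReturnType<typeof setTimeout> | null = null;
  return function touch(): void {
    if (timer !== null) clearTimeout(timer); // activity cancels the pending teardown
    timer = setTimeout(destroy, idleMs);     // and restarts the idle countdown
  };
}
```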
### Model loading
All model weight files (totalling several GB) sit on disk in `/opt/models/` at all times. Each AI tool script loads only its own model(s) into memory for the duration of a request, then releases them. Some scripts explicitly call `del model` and `torch.cuda.empty_cache()` after inference to ensure memory is returned immediately.
There is no model cache between requests: running the same AI tool back-to-back reloads the model each time. This keeps model memory at idle near zero, at the cost of a model-load delay on every AI request.
### First AI request cold start
The Python dispatcher is not running when the container starts. The first AI request triggers two things in parallel: the dispatcher starts warming up in the background, and the request itself falls back to a one-off Python subprocess spawn. Once the dispatcher signals ready, all subsequent AI requests use it directly and skip the subprocess spawn cost.