ashim/apps/docs/api/ai.md
Siddharth Kumar Sah 80e536bcf8 chore: remove dead code, add test infrastructure, update docs
- Delete 3 dead files: use-batch-processor.ts, use-i18n.ts, smart-crop.ts (AI package)
- Remove dead getJobProgress function and unused runPythonScript wrapper
- Remove 6 unused imports across API and web apps
- Remove unused shared types (ImageFormat, AppConfig, ApiError, HealthResponse, JobProgress)
  and constants (SUPPORTED_INPUT_FORMATS/OUTPUT_FORMATS, DEFAULT_OUTPUT_FORMAT)
- Remove unused store method (setOriginalBlobUrl) and clean AI package re-exports
- Add test infrastructure: vitest config, unit/integration/e2e tests, fixtures, screenshots
- Add Docker test infrastructure: Dockerfile.test, docker-compose.test.yml
- Add download_models.py for pre-baking AI model weights in Docker
- Add filename sanitization utility (apps/api/src/lib/filename.ts)
- Update .gitignore to exclude coverage/, *.tsbuildinfo, .superpowers/, test artifacts
- Update .dockerignore to exclude test/coverage/IDE artifacts from builds
- Update docs: remove smart crop from AI docs (uses Sharp directly), update bridge docs
2026-03-23 11:46:45 +08:00

AI engine

The @stirling-image/ai package wraps Python ML models in TypeScript functions. Each operation spawns a Python subprocess, processes the image, and returns the result. The bridge layer handles serialization and error propagation.

All model weights are bundled in the Docker image during the build. No downloads happen at runtime.

Background removal

Removes the background from an image and returns a transparent PNG.

Model: BiRefNet-Lite via rembg

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | Model name. Default: birefnet-lite. Options include u2net, isnet-general-use, and others supported by rembg. |
| alphaMatting | boolean | Use alpha matting for finer edge detail |
| alphaMattingForegroundThreshold | number | Foreground threshold for alpha matting (0-255) |
| alphaMattingBackgroundThreshold | number | Background threshold for alpha matting (0-255) |

Python script: packages/ai/python/remove_bg.py
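
As a rough illustration of the parameters above, the following sketch checks an options object before it would be handed to the bridge. The interface name RemoveBackgroundOptions and the helper validateRemoveBackgroundOptions are hypothetical and not taken from the package; only the field names and the 0-255 range come from the parameter list.

```typescript
// Hypothetical option shape mirroring the documented parameters.
// This is NOT the package's exported type, only an illustration.
interface RemoveBackgroundOptions {
  model?: string;
  alphaMatting?: boolean;
  alphaMattingForegroundThreshold?: number;
  alphaMattingBackgroundThreshold?: number;
}

// Returns a list of problems; an empty list means the options look valid.
function validateRemoveBackgroundOptions(opts: RemoveBackgroundOptions): string[] {
  const problems: string[] = [];
  const inRange = (v: number): boolean => Number.isFinite(v) && v >= 0 && v <= 255;

  const thresholdKeys = [
    "alphaMattingForegroundThreshold",
    "alphaMattingBackgroundThreshold",
  ] as const;

  for (const key of thresholdKeys) {
    const value = opts[key];
    // Thresholds are optional; only check the ones that were supplied.
    if (value !== undefined && !inRange(value)) {
      problems.push(`${key} must be between 0 and 255, got ${value}`);
    }
  }
  return problems;
}
```

Validating on the TypeScript side would avoid spawning a Python subprocess for a request that cannot succeed, though whether the package does this is not stated here.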

Upscaling

Increases image resolution using AI super-resolution.

Model: RealESRGAN

| Parameter | Type | Description |
| --- | --- | --- |
| scale | number | Upscale factor: 2 or 4 |

Returns the upscaled image along with the original and new dimensions.

Python script: packages/ai/python/upscale.py
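
A small sketch of how a caller might restrict scale to the two documented values and predict the output size. It assumes the new dimensions are exactly the original dimensions multiplied by the factor; the type and function names are hypothetical.

```typescript
// The two documented upscale factors.
type UpscaleFactor = 2 | 4;

interface Dimensions {
  width: number;
  height: number;
}

// Type guard: narrows an arbitrary number to a supported factor.
function isUpscaleFactor(value: number): value is UpscaleFactor {
  return value === 2 || value === 4;
}

// Assumption: output size is original size times the factor.
function expectedUpscaledSize(original: Dimensions, scale: number): Dimensions {
  if (!isUpscaleFactor(scale)) {
    throw new RangeError(`scale must be 2 or 4, got ${scale}`);
  }
  return { width: original.width * scale, height: original.height * scale };
}
```

For example, expectedUpscaledSize({ width: 640, height: 480 }, 4) yields 2560 by 1920 under that assumption.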

OCR (text recognition)

Extracts text from images.

Model: PaddleOCR

| Parameter | Type | Description |
| --- | --- | --- |
| language | string | Language code (e.g. en, ch, fr, de) |

Returns structured results with text content, bounding boxes, and confidence scores for each detected text region.

Python script: packages/ai/python/ocr.py
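
The structured result could be consumed along these lines. The OcrRegion shape, its field names, the corner-point box format, and the 0-1 confidence scale are all assumptions made for illustration; the actual result type may differ.

```typescript
// Hypothetical shape for one detected text region.
interface OcrRegion {
  text: string;
  box: number[][];    // assumed: list of [x, y] corner points
  confidence: number; // assumed: value between 0 and 1
}

// Keeps regions at or above the confidence cutoff and joins their text,
// one region per line, in the order the regions were returned.
function extractConfidentText(regions: OcrRegion[], minConfidence = 0.5): string {
  return regions
    .filter((region) => region.confidence >= minConfidence)
    .map((region) => region.text.trim())
    .filter((text) => text.length > 0)
    .join("\n");
}
```

Filtering on confidence is a common way to drop spurious detections before showing text to a user; the cutoff of 0.5 here is arbitrary.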

Face detection and blurring

Detects faces in an image and applies a blur to each detected region.

Model: MediaPipe Face Detection

| Parameter | Type | Description |
| --- | --- | --- |
| blurStrength | number | How strongly to blur detected faces |

Returns the blurred image along with metadata about each detected face region (bounding box coordinates and confidence score).

Python script: packages/ai/python/detect_faces.py
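
As an illustration of working with the returned face metadata, the sketch below pads a detected region and clamps it to the image bounds, for example to highlight it in a UI. The FaceRegion shape (pixel-based x, y, width, height plus confidence) is an assumption, as is the helper name.

```typescript
// Hypothetical shape for one detected face; coordinates assumed to be pixels.
interface FaceRegion {
  x: number;
  y: number;
  width: number;
  height: number;
  confidence: number;
}

// Grows the region by `padding` pixels on every side without leaving the image.
function expandRegion(
  face: FaceRegion,
  padding: number,
  imageWidth: number,
  imageHeight: number,
): FaceRegion {
  const left = Math.max(0, face.x - padding);
  const top = Math.max(0, face.y - padding);
  const right = Math.min(imageWidth, face.x + face.width + padding);
  const bottom = Math.min(imageHeight, face.y + face.height + padding);
  return { ...face, x: left, y: top, width: right - left, height: bottom - top };
}
```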

Object erasing (inpainting)

Removes objects from images by filling in the area with generated content that matches the surroundings.

Model: LaMa (Large Mask Inpainting)

Takes an image and a mask (white = area to erase, black = keep). Returns the inpainted image.

Python script: packages/ai/python/inpaint.py
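
To make the mask convention concrete, here is a minimal sketch that builds a raw single-channel mask for a rectangular area, using 255 (white) for pixels to erase and 0 (black) for pixels to keep. The function name and the raw 8-bit buffer layout are assumptions; encoding the buffer into an image file is left out.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Builds a row-major, one-byte-per-pixel mask: 255 = erase, 0 = keep.
function buildRectMask(imageWidth: number, imageHeight: number, erase: Rect): Uint8Array {
  // Uint8Array is zero-filled, so every pixel starts as "keep".
  const mask = new Uint8Array(imageWidth * imageHeight);

  // Clamp the rectangle to the image bounds.
  const x0 = Math.max(0, erase.x);
  const y0 = Math.max(0, erase.y);
  const x1 = Math.min(imageWidth, erase.x + erase.width);
  const y1 = Math.min(imageHeight, erase.y + erase.height);

  for (let y = y0; y < y1; y++) {
    // Mark one row of the rectangle as "erase".
    mask.fill(255, y * imageWidth + x0, y * imageWidth + x1);
  }
  return mask;
}
```

A mask drawn by a user with a brush would follow the same convention, just with an arbitrary shape instead of a rectangle.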

How the bridge works

The TypeScript bridge (packages/ai/src/bridge.ts) exposes a single function, runPythonWithProgress, that does the following for each AI call:

  1. Writes the input image to a temp file in the workspace directory.
  2. Spawns a Python subprocess with the appropriate script and arguments.
  3. Parses JSON progress lines from stderr (e.g. {"progress": 50, "stage": "Processing..."}) and forwards them via an onProgress callback for real-time SSE streaming.
  4. Reads stdout for JSON output.
  5. Reads the output image from the filesystem.
  6. Cleans up temp files.
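
Step 3 can be sketched as follows. This is not the code in bridge.ts; it is an illustrative version that assumes each progress update arrives as one JSON object per stderr line, and the names BridgeProgress, parseProgressLine, and createStderrHandler are hypothetical.

```typescript
// Shape of a progress line such as {"progress": 50, "stage": "Processing..."}.
interface BridgeProgress {
  progress: number;
  stage: string;
}

// Returns the parsed progress update, or null for any other stderr output.
function parseProgressLine(line: string): BridgeProgress | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("{")) return null;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (typeof parsed === "object" && parsed !== null) {
      const { progress, stage } = parsed as Record<string, unknown>;
      if (typeof progress === "number" && typeof stage === "string") {
        return { progress, stage };
      }
    }
  } catch {
    // Not valid JSON: treat it as ordinary stderr output.
  }
  return null;
}

// Buffers stderr chunks, since a chunk may end in the middle of a line.
function createStderrHandler(onProgress: (update: BridgeProgress) => void) {
  let buffer = "";
  const otherLines: string[] = []; // kept for error reporting
  return {
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // last piece is an incomplete line
      for (const line of lines) {
        const update = parseProgressLine(line);
        if (update) onProgress(update);
        else if (line.trim().length > 0) otherLines.push(line);
      }
    },
    otherLines,
  };
}
```

In a real bridge, push would be called from the subprocess's stderr data event, and onProgress would forward each update to the SSE stream.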

If the Python process exits with a non-zero code, the bridge extracts a user-friendly error message from stderr or stdout and throws it. The default timeout is 5 minutes.
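
One possible way to derive a readable message from a failed run is sketched below. The heuristics are assumptions, not the bridge's actual logic: it first looks for a JSON object with an error field on stdout (a hypothetical convention), then falls back to the last non-JSON line of stderr, which for a Python traceback is usually the exception itself. All names are hypothetical; only the 5-minute default comes from the text.

```typescript
// Default timeout stated above: 5 minutes, in milliseconds.
const DEFAULT_TIMEOUT_MS = 5 * 60 * 1000;

function timeoutMessage(timeoutMs: number = DEFAULT_TIMEOUT_MS): string {
  return `Python process timed out after ${Math.round(timeoutMs / 1000)} seconds`;
}

function extractErrorMessage(exitCode: number, stdout: string, stderr: string): string {
  // 1. Assumed convention: the script prints {"error": "..."} on stdout.
  for (const line of stdout.split("\n").reverse()) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null) {
        const error = (parsed as Record<string, unknown>).error;
        if (typeof error === "string" && error.length > 0) return error;
      }
    } catch {
      // Not JSON: keep looking.
    }
  }

  // 2. Fall back to the last stderr line that is not a JSON progress line.
  const stderrLines = stderr
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("{"));
  const last = stderrLines[stderrLines.length - 1];
  if (last !== undefined) return last;

  // 3. Nothing usable: report the exit code.
  return `Python process exited with code ${exitCode}`;
}
```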