ring/dev-team/docs/standards/devops.md
Jefferson Rodrigues 675a4e3029
feat(standards): add TOC to all standards files and improve skill execution
- Add Table of Contents to devops.md, frontend.md, golang.md, sre.md, typescript.md
- Update CLAUDE.md with Four-File Update Rule and TOC maintenance guidelines
- Add checklist for adding/removing sections in standards files
- Improve dev-refactor skill to use Skill tool for dev-cycle handoff
- Add anti-rationalization patterns for gate execution shortcuts
- Add Execution Report sections to dev-cycle, dev-feedback-loop, dev-refactor
- Extract gap tracking rationalizations to shared-anti-rationalization.md

Generated-by: Claude
AI-Model: claude-opus-4-5-20251101
2025-12-17 12:47:03 -03:00

14 KiB

DevOps Standards

⚠️ MAINTENANCE: This file is indexed in dev-team/skills/shared-patterns/standards-coverage-table.md. When adding/removing ## sections, update the coverage table AND agent files per THREE-FILE UPDATE RULE in CLAUDE.md.

This file defines the specific standards for DevOps, SRE, and infrastructure.

Reference: Always consult docs/PROJECT_RULES.md for common project standards.


Table of Contents

# Section Description
1 Cloud Provider AWS, GCP, Azure services
2 Infrastructure as Code Terraform patterns and best practices
3 Containers Dockerfile, Docker Compose, .env
4 Helm Chart structure and configuration
5 Observability Logging and tracing standards
6 Security Secrets management, network policies
7 Makefile Standards Required commands and patterns

Meta-sections (not checked by agents):

  • Checklist - Self-verification before deploying

Cloud Provider

Provider Primary Services
AWS EKS, RDS, S3, Lambda, SQS
GCP GKE, Cloud SQL, Cloud Storage
Azure AKS, Azure SQL, Blob Storage

Infrastructure as Code

Terraform (Preferred)

Project Structure

/terraform
  /modules
    /vpc
      main.tf
      variables.tf
      outputs.tf
    /eks
    /rds
  /environments
    /dev
      main.tf
      terraform.tfvars
    /staging
    /prod
  backend.tf
  providers.tf
  versions.tf

State Management

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "env/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

Module Pattern

# modules/eks/main.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = var.cluster_name
  cluster_version = var.kubernetes_version

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_groups = {
    default = {
      min_size     = var.min_nodes
      max_size     = var.max_nodes
      desired_size = var.desired_nodes

      instance_types = var.instance_types
      capacity_type  = "ON_DEMAND"
    }
  }

  tags = var.tags
}

# modules/eks/variables.tf
variable "cluster_name" {
  type        = string
  description = "Name of the EKS cluster"
}

variable "kubernetes_version" {
  type        = string
  default     = "1.28"
  description = "Kubernetes version"
}

# modules/eks/outputs.tf
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

output "cluster_name" {
  value = module.eks.cluster_name
}

Best Practices

# Always use version constraints
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Use data sources for existing resources
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Use locals for computed values
locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
  name_prefix = "${var.project}-${var.environment}"
}

# Always tag resources
resource "aws_instance" "example" {
  # ...

  tags = merge(var.common_tags, {
    Name        = "${local.name_prefix}-instance"
    Environment = var.environment
    ManagedBy   = "terraform"
  })
}

Containers

Dockerfile Best Practices

# Multi-stage build for minimal images
FROM golang:1.22-alpine AS builder

WORKDIR /app

# Cache dependencies
COPY go.mod go.sum ./
RUN go mod download

# Build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /app/server ./cmd/api

# Production image
FROM gcr.io/distroless/static-debian12:nonroot

COPY --from=builder /app/server /server

USER nonroot:nonroot

EXPOSE 8080

ENTRYPOINT ["/server"]

Image Guidelines

Guideline Reason
Use multi-stage builds Smaller images
Use distroless/alpine Minimal attack surface
Run as non-root Security
Pin versions Reproducibility
Use .dockerignore Smaller context

Docker Compose (Local Dev)

MANDATORY: Use .env file for environment variables instead of inline definitions.

# docker-compose.yml
services:
  api:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "8080:8080"
    env_file:
      - .env
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started

  db:
    image: postgres:15-alpine
    env_file:
      - .env
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:

.env File Structure

# .env (add to .gitignore)

# Application
ENV_NAME=local
LOG_LEVEL=debug
SERVER_ADDRESS=:8080

# PostgreSQL
POSTGRES_USER=user
POSTGRES_PASSWORD=pass
POSTGRES_DB=app
DB_HOST=db
DB_PORT=5432

# Redis
REDIS_HOST=redis
REDIS_PORT=6379

# Telemetry
ENABLE_TELEMETRY=false
Guideline Reason
Use env_file directive Centralized configuration
Add .env to .gitignore Prevent secrets in version control
Provide .env.example Document required variables
Use consistent naming Match application config struct

Helm

Chart Structure

/charts/api
  Chart.yaml
  values.yaml
  values-dev.yaml
  values-prod.yaml
  /templates
    deployment.yaml
    service.yaml
    ingress.yaml
    configmap.yaml
    secret.yaml
    hpa.yaml
    _helpers.tpl

Chart.yaml

apiVersion: v2
name: api
description: API service Helm chart
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled

values.yaml

# values.yaml
replicaCount: 3

image:
  repository: company/api
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

postgresql:
  enabled: false  # Use external database

Observability

Logging (Structured JSON)

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "info",
  "message": "Request completed",
  "request_id": "abc-123",
  "user_id": "usr_456",
  "method": "POST",
  "path": "/api/v1/users",
  "status": 201,
  "duration_ms": 45,
  "trace_id": "def-789"
}

Tracing (OpenTelemetry)

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]

Security

Secrets Management

# Use External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: api-secrets
  data:
    - secretKey: database-url
      remoteRef:
        key: prod/api/database
        property: url

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-network-policy
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432

Makefile Standards

All projects MUST include a Makefile with standardized commands for consistent developer experience.

Required Commands

Command Purpose Category
make build Build all components Core
make lint Run linters (golangci-lint) Code Quality
make test Run all tests Testing
make cover Generate test coverage report Testing
make test-unit Run unit tests only Testing
make up Start all services with Docker Compose Docker
make down Stop all services Docker
make start Start existing containers Docker
make stop Stop running containers Docker
make restart Restart all containers Docker
make rebuild-up Rebuild and restart services Docker
make set-env Copy .env.example to .env Setup
make generate-docs Generate API documentation (Swagger) Documentation

Component Delegation Pattern (Monorepo)

For monorepo projects with multiple components:

Command Purpose
make infra COMMAND=<cmd> Run command in infra component
make onboarding COMMAND=<cmd> Run command in onboarding component
make all-components COMMAND=<cmd> Run command across all components

Root Makefile Example

# Project Root Makefile

# Component directories
INFRA_DIR := ./components/infra
ONBOARDING_DIR := ./components/onboarding
TRANSACTION_DIR := ./components/transaction

COMPONENTS := $(INFRA_DIR) $(ONBOARDING_DIR) $(TRANSACTION_DIR)

# Docker command detection
DOCKER_CMD := $(shell if docker compose version >/dev/null 2>&1; then echo "docker compose"; else echo "docker-compose"; fi)

#-------------------------------------------------------
# Core Commands
#-------------------------------------------------------

.PHONY: build
build:
	@for dir in $(COMPONENTS); do \
		echo "Building in $$dir..."; \
		(cd $$dir && $(MAKE) build) || exit 1; \
	done
	@echo "[ok] All components built successfully"

.PHONY: test
test:
	@for dir in $(COMPONENTS); do \
		(cd $$dir && $(MAKE) test) || exit 1; \
	done

.PHONY: test-unit
test-unit:
	@for dir in $(COMPONENTS); do \
		(cd $$dir && go test -v -short ./...) || exit 1; \
	done

.PHONY: cover
cover:
	@sh ./scripts/coverage.sh
	@go tool cover -html=coverage.out -o coverage.html
	@echo "Coverage report generated at coverage.html"

#-------------------------------------------------------
# Code Quality Commands
#-------------------------------------------------------

.PHONY: lint
lint:
	@for dir in $(COMPONENTS); do \
		if find "$$dir" -name "*.go" -type f | grep -q .; then \
			(cd $$dir && golangci-lint run --fix ./...) || exit 1; \
		fi; \
	done
	@echo "[ok] Linting completed successfully"

#-------------------------------------------------------
# Docker Commands
#-------------------------------------------------------

.PHONY: up
up:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/docker-compose.yml" ]; then \
			(cd $$dir && $(DOCKER_CMD) -f docker-compose.yml up -d) || exit 1; \
		fi; \
	done
	@echo "[ok] All services started successfully"

.PHONY: down
down:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/docker-compose.yml" ]; then \
			(cd $$dir && $(DOCKER_CMD) -f docker-compose.yml down) || exit 1; \
		fi; \
	done

.PHONY: start
start:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/docker-compose.yml" ]; then \
			(cd $$dir && $(DOCKER_CMD) -f docker-compose.yml start) || exit 1; \
		fi; \
	done

.PHONY: stop
stop:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/docker-compose.yml" ]; then \
			(cd $$dir && $(DOCKER_CMD) -f docker-compose.yml stop) || exit 1; \
		fi; \
	done

.PHONY: restart
restart:
	@make stop && make start

.PHONY: rebuild-up
rebuild-up:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/docker-compose.yml" ]; then \
			(cd $$dir && $(DOCKER_CMD) -f docker-compose.yml down && \
			 $(DOCKER_CMD) -f docker-compose.yml build && \
			 $(DOCKER_CMD) -f docker-compose.yml up -d) || exit 1; \
		fi; \
	done

#-------------------------------------------------------
# Setup Commands
#-------------------------------------------------------

.PHONY: set-env
set-env:
	@for dir in $(COMPONENTS); do \
		if [ -f "$$dir/.env.example" ] && [ ! -f "$$dir/.env" ]; then \
			cp "$$dir/.env.example" "$$dir/.env"; \
			echo "Created .env in $$dir"; \
		fi; \
	done

#-------------------------------------------------------
# Documentation Commands
#-------------------------------------------------------

.PHONY: generate-docs
generate-docs:
	@./scripts/generate-docs.sh

#-------------------------------------------------------
# Component Delegation
#-------------------------------------------------------

.PHONY: infra
infra:
	@if [ -z "$(COMMAND)" ]; then \
		echo "Error: Use COMMAND=<cmd>"; exit 1; \
	fi
	@cd $(INFRA_DIR) && $(MAKE) $(COMMAND)

.PHONY: onboarding
onboarding:
	@if [ -z "$(COMMAND)" ]; then \
		echo "Error: Use COMMAND=<cmd>"; exit 1; \
	fi
	@cd $(ONBOARDING_DIR) && $(MAKE) $(COMMAND)

.PHONY: all-components
all-components:
	@if [ -z "$(COMMAND)" ]; then \
		echo "Error: Use COMMAND=<cmd>"; exit 1; \
	fi
	@for dir in $(COMPONENTS); do \
		(cd $$dir && $(MAKE) $(COMMAND)) || exit 1; \
	done

Component Makefile Example

# Component Makefile (e.g., components/onboarding/Makefile)

SERVICE_NAME := onboarding-service
ARTIFACTS_DIR := ./artifacts

.PHONY: build test lint up down

build:
	@go build -o $(ARTIFACTS_DIR)/$(SERVICE_NAME) ./cmd/app

test:
	@go test -v ./...

lint:
	@golangci-lint run --fix ./...

up:
	@docker compose -f docker-compose.yml up -d

down:
	@docker compose -f docker-compose.yml down

Checklist

Before deploying infrastructure, verify:

  • Terraform state stored remotely with locking
  • All resources tagged appropriately
  • Docker images use multi-stage builds
  • Secrets managed via External Secrets or similar
  • Monitoring dashboards and alerts configured