fleet/docs/infrastructure/monitoring-alerting.md
Zach Wasserman 6cbd10965c
Add dev infrastructure and docs for Prometheus monitoring (#33)
- Set up a simple example of Prometheus monitoring in the development
  docker-compose.yml.
- Add documentation for configuring Prometheus.
2020-11-12 19:06:56 -08:00


Monitoring Fleet

Health Checks

Fleet exposes a basic health check at the /healthz endpoint. This is the interface to use for simple monitoring and load-balancer health checks.

The /healthz endpoint returns an HTTP 200 status if the server is running and has healthy connections to MySQL and Redis. If there are any problems, it returns an HTTP 500 status.
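
The status contract described above can be checked with a small probe. This is a minimal sketch using only the Python standard library; the function names and the example host are illustrative assumptions, not part of Fleet.

```python
import urllib.error
import urllib.request


def status_is_healthy(status_code):
    """Map a /healthz HTTP status to a boolean: only 200 counts as healthy."""
    return status_code == 200


def probe_healthz(base_url, timeout=5.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_is_healthy(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for error statuses, including the documented HTTP 500.
        return status_is_healthy(err.code)
    except OSError:
        # Connection refused, DNS failure, timeout, TLS error: treat as unhealthy.
        return False
```

A monitoring script might call `probe_healthz("https://fleet.example.com")`, where the hostname is a placeholder for your own Fleet server address. Treating an unreachable server as unhealthy matches how a load balancer would typically interpret a failed check.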

Metrics

Fleet exposes server metrics in a format compatible with Prometheus. A simple example Prometheus configuration is available in tools/app/prometheus.yml.

Prometheus can be configured to use a wide range of service discovery mechanisms within AWS, GCP, Azure, Kubernetes, and more. See the Prometheus configuration documentation for more information.
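
To make "a format compatible with Prometheus" concrete, the sketch below parses a few lines of the Prometheus text exposition format into (name, labels, value) tuples. It is a simplified illustration: it skips comment lines, ignores timestamps, and does not unescape label values. The metric names in the usage example are made up and are not necessarily ones that Fleet exports.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a "# HELP" / "# TYPE" comment
        match = _SAMPLE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For example, the line `example_requests_total{code="200"} 12` is parsed into the name `example_requests_total`, the labels `{"code": "200"}`, and the value `12.0`. In practice Prometheus does this parsing itself when it scrapes the server; the sketch is only meant to show what the scraped data looks like.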

Alerting

Prometheus has built-in support for alerting through Alertmanager.

Consider building alerts for:

  • Changes from expected levels of host enrollment
  • Increased latency on HTTP endpoints
  • Increased error levels on HTTP endpoints

TODO (Seeking Contributors): Add example alerting configurations.
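
The three alert conditions listed above can be sketched as plain threshold checks. This is not an Alertmanager or Prometheus rule configuration. In a real deployment, such conditions would normally be written as Prometheus alerting rules and routed through Alertmanager. All function names and default thresholds here are illustrative assumptions.

```python
def enrollment_deviates(enrolled, expected, tolerance=0.10):
    """True if the enrolled host count is off from expected by more than tolerance."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(enrolled - expected) / expected > tolerance


def error_ratio_too_high(error_count, request_count, max_ratio=0.05):
    """True if the share of failed HTTP requests exceeds max_ratio."""
    if request_count == 0:
        return False  # no traffic, nothing to alert on
    return error_count / request_count > max_ratio


def latency_too_high(latencies_s, max_p95_s=1.0):
    """True if the 95th-percentile latency (nearest-rank) exceeds max_p95_s."""
    if not latencies_s:
        return False
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1] > max_p95_s
```

Using a percentile rather than the mean for latency keeps a handful of slow requests from being hidden by many fast ones. The right thresholds depend on the deployment and are best chosen from observed baselines.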

Graphing

Prometheus provides basic graphing capabilities, and integrates tightly with Grafana for sophisticated visualizations.