chore: CLAUDE.md refactor (#1437)

Inspiration: https://www.humanlayer.dev/blog/writing-a-good-claude-md?utm_source=tldrdev
2026-04-21 13:37:15 +00:00 · 2025-12-03 13:35:46 -05:00 · 2025-12-03 13:35:46 -05:00 · bd96c98cbf
commit bd96c98cbf
parent b7789cedb7
6 changed files with 310 additions and 313 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,336 +1,56 @@
-# HyperDX Claude Agent Guide
+# HyperDX Development Guide

-This guide helps Claude AI agents understand and work effectively with the
-HyperDX codebase.
+## What is HyperDX?

-## 🏗️ Project Overview
+HyperDX is an observability platform that helps engineers search, visualize, and monitor logs, metrics, traces, and session replays. It's built on ClickHouse for blazing-fast queries and supports OpenTelemetry natively.

-HyperDX is an observability platform built on ClickHouse that helps engineers
-search, visualize, and monitor logs, metrics, traces, and session replays. It's
-designed as an alternative to tools like Kibana but optimized for ClickHouse's
-performance characteristics.
+**Core value**: Unified observability with ClickHouse performance, schema-agnostic design, and correlation across all telemetry types in one place.

-**Core Value Proposition:**
+## Architecture (WHAT)

- Unified observability: correlate logs, metrics, traces, and session replays in
-  one place
- ClickHouse-powered: blazing fast searches and visualizations
- OpenTelemetry native: works out of the box with OTEL instrumentation
- Schema agnostic: works on top of existing ClickHouse schemas
+This is a **monorepo** with three main packages:

-## 📁 Architecture Overview
+- `packages/app` - Next.js frontend (TypeScript, Mantine UI, TanStack Query)
+- `packages/api` - Express backend (Node.js 22+, MongoDB for metadata, ClickHouse for telemetry)
+- `packages/common-utils` - Shared TypeScript utilities for query parsing and validation

-HyperDX follows a microservices architecture with clear separation between
-components:
+**Data flow**: Apps → OpenTelemetry Collector → ClickHouse (telemetry data) / MongoDB (configuration/metadata)

-### Core Services
-
- **HyperDX UI (`packages/app`)**: Next.js frontend serving the user interface
- **HyperDX API (`packages/api`)**: Node.js/Express backend handling queries and
-  business logic
- **OpenTelemetry Collector**: Receives and processes telemetry data
- **ClickHouse**: Primary data store for all telemetry (logs, metrics, traces)
- **MongoDB**: Metadata storage (users, dashboards, alerts, saved searches)
-
-### Data Flow
-
-1. Applications send telemetry via OpenTelemetry → OTel Collector
-2. OTel Collector processes and forwards data → ClickHouse
-3. Users interact with UI → API queries ClickHouse
-4. Configuration/metadata stored in MongoDB
-
-## 🛠️ Technology Stack
-
-### Frontend (`packages/app`)
-
- **Framework**: Next.js 14 with TypeScript
- **UI Components**: Mantine UI library
- **State Management**: Jotai for global state, TanStack Query for server state
- **Charts/Visualization**: Recharts, uPlot
- **Code Editor**: CodeMirror (for SQL/JSON editing)
- **Styling**: SCSS + CSS Modules
-
-### Backend (`packages/api`)
-
- **Runtime**: Node.js 22+ with TypeScript
- **Framework**: Express.js
- **Database**:
-  - ClickHouse (primary telemetry data)
-  - MongoDB (metadata via Mongoose)
- **Authentication**: Passport.js with local strategy
- **Validation**: Zod schemas
- **OpenTelemetry**: Self-instrumented with `@hyperdx/node-opentelemetry`
-
-### Common Utilities (`packages/common-utils`)
-
- Shared TypeScript utilities for query parsing, ClickHouse operations
- Zod schemas for data validation
- SQL formatting and query building helpers
-
-## 🏛️ Key Architectural Patterns
-
-### Database Models (MongoDB)
-
-All models follow consistent patterns with:
-
- Team-based multi-tenancy (most entities belong to a `team`)
- ObjectId references between related entities
- Timestamps for audit trails
- Zod schema validation
-
-**Key Models:**
-
- `Team`: Multi-tenant organization unit
- `User`: Team members with authentication
- `Source`: ClickHouse data source configuration
- `Connection`: Database connection settings
- `SavedSearch`: Saved queries and filters
- `Dashboard`: Custom dashboard configurations
- `Alert`: Monitoring alerts with thresholds
-
-### Frontend Architecture
-
- **Page-level components**: Located in `pages/` (Next.js routing)
- **Reusable components**: Located in `src/` directory
- **State management**:
-  - Server state via TanStack Query
-  - Client state via Jotai atoms
-  - URL state via query parameters
- **API communication**: Custom hooks wrapping TanStack Query
-
-### Backend Architecture
-
- **Router-based organization**: Separate routers for different API domains
- **Middleware stack**: Authentication, CORS, error handling
- **Controller pattern**: Business logic separated from route handlers
- **Service layer**: Reusable business logic (e.g., `agentService`)
-
-## 🔧 Development Environment
-
-### Setup Commands
+## Development Setup (HOW)

 ```bash
-# Install dependencies and setup hooks
-yarn setup
-
-# Start full development stack (Docker + local services)
-yarn dev
+yarn setup          # Install dependencies
+yarn dev            # Start full stack (Docker + local services)
 ```

-### Key Development Scripts
+The project uses **Yarn 4.5.1** workspaces. Docker Compose manages ClickHouse, MongoDB, and the OTel Collector.

- `yarn app:dev`: Start API, frontend, alerts task, and common-utils in watch
-  mode
- `yarn lint`: Run linting across all packages
- `yarn dev:int`: Run integration tests in watch mode
- `yarn dev:unit`: Run unit tests in watch mode (per package)
+## Working on the Codebase (HOW)

-### ⚠️ BEFORE COMMITTING - Run Linting Commands
+**Before starting a task**, read relevant documentation from the `agent_docs/` directory:

-**Claude AI agents must run these commands before any commit:**
+- `agent_docs/architecture.md` - Detailed architecture patterns and data models
+- `agent_docs/tech_stack.md` - Technology stack details and component patterns
+- `agent_docs/development.md` - Development workflows, testing, and common tasks
+- `agent_docs/code_style.md` - Code patterns and best practices (read only when actively coding)

-```bash
-# 1. Fix linting issues in modified packages
-cd packages/app && yarn run lint:fix
-cd packages/api && yarn run lint:fix
-cd packages/common-utils && yarn lint:fix
+**Tools handle formatting and linting automatically** via pre-commit hooks. Focus on implementation; don't manually format code.

-# 2. Check for any remaining linting issues from the main directory
-yarn run lint
-```
+## Key Principles

-**If linting issues remain after running lint:fix**: Some linting errors cannot
-be automatically fixed and require manual intervention. If `yarn run lint` still
-shows errors:
+1. **Multi-tenancy**: All data is scoped to a `Team`; filter every query by team context
+2. **Type safety**: Use TypeScript strictly; Zod schemas for validation
+3. **Existing patterns**: Follow established patterns in the codebase - explore similar files before implementing
+4. **Component size**: Keep files under 300 lines; break down large components
+5. **Testing**: Tests live in `__tests__/` directories; use Jest for unit/integration tests

-1. Read the linting error messages carefully to understand the issue
-2. Manually fix the reported issues in the affected files
-3. Re-run `yarn run lint` to verify all issues are resolved
-4. Only commit once all linting errors are fixed
+## Important Context

-**Why this is necessary**: While the project has pre-commit hooks (`lint-staged`
-with Husky) that automatically fix linting issues on commit, Claude AI agents do
-not trigger these hooks. Therefore, you must manually run the lint:fix commands
-before committing.
-
-### Environment Configuration
-
- `.env.development`: Development environment variables
- Docker Compose manages ClickHouse, MongoDB, OTel Collector
- Hot reload enabled for all services in development
-
-## 📝 Code Style & Patterns
-
-### TypeScript Guidelines
-
- **Strict typing**: Avoid `any` type assertions (use proper typing instead)
- **Zod validation**: Use Zod schemas for runtime validation
- **Interface definitions**: Clear interfaces for all data structures
- **Error handling**: Proper error boundaries and serialization
-
-### Component Patterns
-
- **Functional components**: Use React hooks over class components
- **Custom hooks**: Extract reusable logic into custom hooks
- **Props interfaces**: Define clear TypeScript interfaces for component props
- **File organization**: Keep files under 300 lines, break down large components
-
-### UI Components & Styling
-
-**Prefer Mantine UI**: Use Mantine components as the primary UI library:
-
-```tsx
-// ✅ Good - Use Mantine components
-import { Button, TextInput, Modal, Select } from '@mantine/core';
-
-// ✅ Good - Mantine hooks for common functionality
-import { useDisclosure, useForm } from '@mantine/hooks';
-```
-
-**Component Hierarchy**:
-
-1. **First choice**: Mantine components (`@mantine/core`, `@mantine/dates`,
-   etc.)
-2. **Second choice**: Custom components built on Mantine primitives
-3. **Last resort**: Custom styling using CSS Modules and SCSS
-
-**Styling Approach**:
-
- Use Mantine's built-in styling system and theme
- SCSS modules for component-specific styles when needed
- Avoid inline styles unless absolutely necessary
- Leverage Mantine's responsive design utilities
-
-### API Patterns
-
- **RESTful design**: Clear HTTP methods and resource-based URLs
- **Middleware composition**: Reusable middleware for auth, validation, etc.
- **Error handling**: Consistent error response format
- **Input validation**: Zod schemas for request validation
-
-## 🧪 Testing Strategy
-
-### Testing Tools
-
- **Unit Tests**: Jest with TypeScript support
- **Integration Tests**: Jest with database fixtures
- **Frontend Testing**: React Testing Library + Jest
- **E2E Testing**: Custom smoke tests with BATS
-
-### Testing Patterns
-
- **TDD Approach**: Write tests before implementation for new features
- **Test organization**: Tests co-located with source files in `__tests__`
-  directories
- **Mocking**: MSW for API mocking in frontend tests
- **Database testing**: Isolated test databases with fixtures
-
-### CI Testing
-
-For integration testing in CI environments:
-
-```bash
-# Start CI testing stack (ClickHouse, MongoDB, etc.)
-docker compose -p int -f ./docker-compose.ci.yml up -d
-
-# Run integration tests
-yarn dev:int
-```
-
-**CI Testing Notes:**
-
- Uses separate Docker Compose configuration optimized for CI
- Isolated test environment with `-p int` project name
- Includes all necessary services (ClickHouse, MongoDB, OTel Collector)
- Tests run against real database instances for accurate integration testing
-
-## 🗄️ Data & Query Patterns
-
-### ClickHouse Integration
-
- **Query building**: Use `common-utils` for safe query construction
- **Schema flexibility**: Support for various telemetry schemas via `Source`
-  configuration
-
-### MongoDB Patterns
-
- **Multi-tenancy**: All queries filtered by team context
- **Relationships**: Use ObjectId references with proper population
- **Indexing**: Strategic indexes for query performance
- **Migrations**: Versioned migrations for schema changes
-
-## 🚀 Common Development Tasks
-
-### Adding New Features
-
-1. **API First**: Define API endpoints and data models
-2. **Database Models**: Create/update Mongoose schemas and ClickHouse queries
-3. **Frontend Integration**: Build UI components and integrate with API
-4. **Testing**: Add unit and integration tests
-5. **Documentation**: Update relevant docs
-
-### Performance Considerations
-
- **Frontend rendering**: Use virtualization for large datasets
- **API responses**: Implement pagination and caching where appropriate
- **Bundle size**: Monitor and optimize JavaScript bundle sizes
-
-## 🔍 Key Files & Directories
-
-### Configuration
-
- `packages/api/src/config.ts`: API configuration and environment variables
- `packages/app/next.config.js`: Next.js configuration
- `docker-compose.dev.yml`: Development environment setup
-
-### Core Business Logic
-
- `packages/api/src/models/`: MongoDB data models
- `packages/api/src/routers/`: API route definitions
- `packages/api/src/controllers/`: Business logic controllers
- `packages/common-utils/src/`: Shared utilities and query builders
-
-### Frontend Architecture
-
- `packages/app/pages/`: Next.js pages and routing
- `packages/app/src/`: Reusable components and utilities
- `packages/app/src/useUserPreferences.tsx`: Global user state management
-
-## 🚨 Common Pitfalls & Guidelines
-
-### Security
-
- **Server-side validation**: Always validate and sanitize on the backend
- **Team isolation**: Ensure proper team-based access control
- **API authentication**: Use proper authentication middleware
- **Environment variables**: Never commit secrets, use `.env` files
-
-### Performance
-
- **React rendering**: Use proper keys and memoization for large lists
- **API pagination**: Implement cursor-based pagination for large datasets
-
-### Code Quality
-
- **Component responsibility**: Single responsibility principle
- **Error boundaries**: Proper error handling at component boundaries
- **Type safety**: Prefer type-safe approaches over runtime checks
-
-## 🔗 Useful Resources
-
- **OpenTelemetry Docs**: Understanding telemetry data structures
- **ClickHouse Docs**: Query optimization and schema design
- **Mantine UI**: Component library documentation
- **TanStack Query**: Server state management patterns
-
-## 🤝 Contributing Guidelines
-
-1. **Follow existing patterns**: Maintain consistency with current codebase
-2. **Test coverage**: Add tests for new functionality
-3. **Documentation**: Update relevant documentation
-4. **Code review**: Ensure changes align with architectural principles
-5. **Performance impact**: Consider impact on query performance and bundle size
+- **Authentication**: Passport.js with team-based access control
+- **State management**: Jotai (client), TanStack Query (server), URL params (filters)
+- **UI library**: Mantine components are the standard (not custom UI)
+- **Database patterns**: MongoDB for metadata with Mongoose, ClickHouse for telemetry queries

 ---

-_This guide should be updated as the codebase evolves and new patterns emerge._
+*Need more details? Check the `agent_docs/` directory or ask which documentation to read.*
--- a/agent_docs/README.md
+++ b/agent_docs/README.md
@@ -0,0 +1,34 @@
+# Agent Documentation Directory
+
+This directory contains detailed documentation for AI coding agents working on the HyperDX codebase. These files use **progressive disclosure** - they're referenced from `CLAUDE.md` but only read when needed.
+
+## Purpose
+
+Instead of stuffing all instructions into `CLAUDE.md` (which goes into every conversation), we keep detailed, task-specific information here. This ensures:
+
+1. **Better focus**: Only relevant context gets loaded per task
+2. **Improved performance**: Less context loaded per conversation generally means better instruction following
+3. **Easier maintenance**: Update specific docs without bloating the main file
+
+## Files
+
+- **`architecture.md`** - System architecture, data models, service relationships, security patterns
+- **`tech_stack.md`** - Technology choices, UI component patterns, library usage
+- **`development.md`** - Development workflows, testing strategy, common tasks, debugging
+- **`code_style.md`** - Code patterns and best practices (read only when actively coding)
+
+## Usage Pattern
+
+When starting a task:
+1. Agent reads `CLAUDE.md` first (always included)
+2. Agent determines which (if any) docs from this directory are relevant
+3. Agent reads only the needed documentation
+4. Agent proceeds with focused, relevant context
+
+## Maintenance
+
+- Keep files focused on their specific domain
+- Use file/line references instead of code snippets when possible
+- Update when patterns or architecture change
+- Keep documentation current with the codebase
+
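Review note: the usage pattern in `agent_docs/README.md` can be sketched in code. This is only an illustration under stated assumptions. The file names come from the README, while `AgentDoc`, `AGENT_DOCS`, `selectDocs`, and the keyword lists are hypothetical and exist nowhere in the repository. An agent makes this choice by reading the task, not by calling a function.

```typescript
// Hypothetical sketch: choose which agent_docs files to load for a task.
// File names come from agent_docs/README.md; the keyword lists are assumptions.
interface AgentDoc {
  file: string;
  keywords: string[];
}

const AGENT_DOCS: AgentDoc[] = [
  { file: 'agent_docs/architecture.md', keywords: ['architecture', 'model', 'service', 'security'] },
  { file: 'agent_docs/tech_stack.md', keywords: ['library', 'component', 'mantine', 'stack'] },
  { file: 'agent_docs/development.md', keywords: ['test', 'debug', 'workflow', 'lint'] },
  { file: 'agent_docs/code_style.md', keywords: ['refactor', 'style', 'naming'] },
];

// CLAUDE.md is always included; other docs are added only when a keyword matches.
function selectDocs(task: string): string[] {
  const text = task.toLowerCase();
  const matched = AGENT_DOCS
    .filter((doc) => doc.keywords.some((kw) => text.includes(kw)))
    .map((doc) => doc.file);
  return ['CLAUDE.md', ...matched];
}
```

With these assumed keywords, a task that mentions a test pulls in `agent_docs/development.md` alongside `CLAUDE.md`, and a task that matches nothing loads `CLAUDE.md` alone.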
--- a/agent_docs/architecture.md
+++ b/agent_docs/architecture.md
@@ -0,0 +1,67 @@
+# HyperDX Architecture
+
+## Core Services
+
+- **HyperDX UI (`packages/app`)**: Next.js frontend serving the user interface
+- **HyperDX API (`packages/api`)**: Node.js/Express backend handling queries and business logic
+- **OpenTelemetry Collector**: Receives and processes telemetry data
+- **ClickHouse**: Primary data store for all telemetry (logs, metrics, traces)
+- **MongoDB**: Metadata storage (users, dashboards, alerts, saved searches)
+
+## Data Flow
+
+1. Applications send telemetry via OpenTelemetry → OTel Collector
+2. OTel Collector processes and forwards data → ClickHouse
+3. Users interact with UI → API queries ClickHouse
+4. Configuration/metadata stored in MongoDB
+
+## Key MongoDB Models
+
+All models follow consistent patterns with:
+- Team-based multi-tenancy (most entities belong to a `team`)
+- ObjectId references between related entities
+- Timestamps for audit trails
+- Zod schema validation
+
+**Key Models** (see `packages/api/src/models/`):
+- `Team`: Multi-tenant organization unit
+- `User`: Team members with authentication
+- `Source`: ClickHouse data source configuration
+- `Connection`: Database connection settings
+- `SavedSearch`: Saved queries and filters
+- `Dashboard`: Custom dashboard configurations
+- `Alert`: Monitoring alerts with thresholds
+
+## Frontend Architecture
+
+- **Pages**: `packages/app/pages/` (Next.js routing)
+- **Components**: `packages/app/src/` (reusable components)
+- **API communication**: Custom hooks wrapping TanStack Query
+- **State**: See `tech_stack.md` for state management details
+
+## Backend Architecture
+
+- **Routers**: `packages/api/src/routers/` - Domain-specific API routes
+- **Controllers**: `packages/api/src/controllers/` - Business logic separated from routes
+- **Middleware**: Authentication, CORS, error handling
+- **Services**: Reusable business logic (e.g., `agentService`)
+
+## Data & Query Patterns
+
+### ClickHouse Integration
+- **Query building**: Use `common-utils` for safe query construction
+- **Schema flexibility**: Support for various telemetry schemas via `Source` configuration
+
+### MongoDB Patterns
+- **Multi-tenancy**: All queries filtered by team context
+- **Relationships**: Use ObjectId references with proper population
+- **Indexing**: Strategic indexes for query performance
+- **Migrations**: Versioned migrations for schema changes (see `packages/api/migrations/`)
+
+## Security Requirements
+
+- **Server-side validation**: Always validate and sanitize on the backend
+- **Team isolation**: All data access must filter by team context
+- **API authentication**: Use authentication middleware on protected routes
+- **Secrets**: Never commit secrets; use `.env` files
+
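Review note: the team isolation requirement in `agent_docs/architecture.md` could be enforced with a small helper along these lines. This is a minimal sketch, not code from the repository: `Filter` and `withTeamScope` are hypothetical names, and the only detail taken from the document is that entities carry a `team` field.

```typescript
// Hypothetical helper; not part of the HyperDX codebase.
type Filter = Record<string, unknown>;

function withTeamScope(teamId: string, filter: Filter = {}): Filter {
  if (teamId.trim() === '') {
    throw new Error('teamId is required for team-scoped queries');
  }
  // Spread the caller's filter first so a caller-supplied `team` cannot win.
  return { ...filter, team: teamId };
}

// A caller-supplied `team` value is overwritten by the authenticated team.
const scoped = withTeamScope('team-123', { name: 'Errors', team: 'other-team' });
```

The resulting object could then be passed as the filter of a metadata query, so that forgetting the team condition requires bypassing the helper instead of simply omitting a field.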
--- a/agent_docs/code_style.md
+++ b/agent_docs/code_style.md
@@ -0,0 +1,37 @@
+# Code Style & Best Practices
+
+> **Note**: Pre-commit hooks handle formatting automatically. Focus on implementation patterns.
+
+## TypeScript
+
+- Avoid `any` - use proper typing
+- Use Zod schemas for runtime validation
+- Define clear interfaces for data structures
+- Implement proper error boundaries
+
+## Code Organization
+
+- **Single Responsibility**: One clear purpose per component/function
+- **File Size**: Max 300 lines - refactor when approaching limit
+- **DRY**: Reuse existing functionality; consolidate duplicates
+- **In-Context Learning**: Explore similar files before implementing
+
+## React Patterns
+
+- Functional components with hooks (not class components)
+- Extract reusable logic into custom hooks
+- Define TypeScript interfaces for props
+- Use proper keys for lists, memoization for expensive computations
+
+## Refactoring
+
+- Edit files directly - don't create `component-v2.tsx` copies
+- Look for duplicate code across the affected area
+- Verify all callers and integrations after changes
+- Refactor to improve clarity or reduce complexity, not just for the sake of change
+
+## File Naming
+
+- Clear, descriptive names following package conventions
+- Avoid "temp", "refactored", "improved" in permanent filenames
+
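Review note: the "avoid `any`" and runtime validation rules in `agent_docs/code_style.md` might look like the sketch below. To stay dependency-free it uses a hand-written type guard where the guide recommends a Zod schema, so treat it as a stand-in for that pattern. `SavedSearchInput`, `isSavedSearchInput`, and `parseSavedSearchInput` are hypothetical names and the payload shape is assumed.

```typescript
// Hypothetical payload shape, for illustration only.
interface SavedSearchInput {
  name: string;
  query: string;
}

// Hand-written type guard; in the codebase a Zod schema would play this role.
function isSavedSearchInput(value: unknown): value is SavedSearchInput {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return typeof record['name'] === 'string' && typeof record['query'] === 'string';
}

// Untrusted input is typed as `unknown` and narrowed before use, never cast to `any`.
function parseSavedSearchInput(raw: string): SavedSearchInput {
  const parsed: unknown = JSON.parse(raw);
  if (!isSavedSearchInput(parsed)) {
    throw new Error('Invalid saved search input');
  }
  return parsed;
}
```

Typing the parsed value as `unknown` forces the check to happen before any property access compiles, which is the benefit the guide is after when it discourages `any`.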
--- a/agent_docs/development.md
+++ b/agent_docs/development.md
@@ -0,0 +1,111 @@
+# Development Workflows
+
+## Setup Commands
+
+```bash
+# Install dependencies and setup hooks
+yarn setup
+
+# Start full development stack (Docker + local services)
+yarn dev
+```
+
+## Key Development Scripts
+
+- `yarn app:dev`: Start API, frontend, alerts task, and common-utils in watch mode
+- `yarn lint`: Run linting across all packages
+- `yarn dev:int`: Run integration tests in watch mode
+- `yarn dev:unit`: Run unit tests in watch mode (per package)
+
+## Environment Configuration
+
+- `.env.development`: Development environment variables
+- Docker Compose manages ClickHouse, MongoDB, OTel Collector
+- Hot reload enabled for all services in development
+
+## Testing Strategy
+
+### Testing Tools
+
+- **Unit Tests**: Jest with TypeScript support
+- **Integration Tests**: Jest with database fixtures
+- **Frontend Testing**: React Testing Library + Jest
+- **E2E Testing**: Custom smoke tests with BATS
+
+### Testing Patterns
+
+- **TDD Approach**: Write tests before implementation for new features
+- **Test organization**: Tests co-located with source files in `__tests__/` directories
+- **Mocking**: MSW for API mocking in frontend tests
+- **Database testing**: Isolated test databases with fixtures
+
+### CI Testing
+
+For integration testing in CI environments:
+
+```bash
+# Start CI testing stack (ClickHouse, MongoDB, etc.)
+docker compose -p int -f ./docker-compose.ci.yml up -d
+
+# Run integration tests
+yarn dev:int
+```
+
+**CI Testing Notes:**
+- Uses separate Docker Compose configuration optimized for CI
+- Isolated test environment with `-p int` project name
+- Includes all necessary services (ClickHouse, MongoDB, OTel Collector)
+- Tests run against real database instances for accurate integration testing
+
+## Common Development Tasks
+
+### Adding New Features
+
+1. **API First**: Define API endpoints and data models
+2. **Database Models**: Create/update Mongoose schemas and ClickHouse queries
+3. **Frontend Integration**: Build UI components and integrate with API
+4. **Testing**: Add unit and integration tests
+5. **Documentation**: Update relevant docs
+
+### Debugging
+
+- Check browser and server console output for errors, warnings, or relevant logs
+- Add targeted logging to trace execution and variable states
+- For persistent issues, check the `fixes/` directory for documented solutions
+- Document complex fixes in the `fixes/` directory with descriptive filenames
+
+## Code Quality
+
+### Pre-commit Hooks
+
+The project uses Husky + lint-staged to automatically run:
+- Prettier for formatting
+- ESLint for linting
+- API doc generation (for external API changes)
+
+These run automatically on `git commit` for staged files.
+
+### Manual Linting (if needed)
+
+If you need to manually lint:
+
+```bash
+# Per-package linting with auto-fix (subshells keep each cd relative to the repo root)
+(cd packages/app && yarn run lint:fix)
+(cd packages/api && yarn run lint:fix)
+(cd packages/common-utils && yarn run lint:fix)
+
+# Check all packages
+yarn run lint
+```
+
+## File Locations Quick Reference
+
+- **Config**: `packages/api/src/config.ts`, `packages/app/next.config.js`, `docker-compose.dev.yml`
+- **Models**: `packages/api/src/models/`
+- **API Routes**: `packages/api/src/routers/`
+- **Controllers**: `packages/api/src/controllers/`
+- **Pages**: `packages/app/pages/`
+- **Components**: `packages/app/src/`
+- **Shared Utils**: `packages/common-utils/src/`
+
--- a/agent_docs/tech_stack.md
+++ b/agent_docs/tech_stack.md
@@ -0,0 +1,28 @@
+# HyperDX Technology Stack
+
+## Frontend (`packages/app`)
+
+- **Framework**: Next.js 14 with TypeScript
+- **UI Components**: Mantine UI library (`@mantine/core`, `@mantine/dates`, `@mantine/hooks`)
+- **State Management**: Jotai (global client state), TanStack Query (server state), URL params (filters)
+- **Charts/Visualization**: Recharts, uPlot
+- **Code Editor**: CodeMirror (for SQL/JSON editing)
+- **Styling**: Mantine's built-in system, SCSS modules when needed
+
+**UI Component Priority**: Mantine components first → Custom components on Mantine primitives → Custom SCSS modules as last resort
+
+## Backend (`packages/api`)
+
+- **Runtime**: Node.js 22+ with TypeScript
+- **Framework**: Express.js
+- **Database**: ClickHouse (telemetry data), MongoDB via Mongoose (metadata)
+- **Authentication**: Passport.js with local strategy
+- **Validation**: Zod schemas
+- **Telemetry**: Self-instrumented with `@hyperdx/node-opentelemetry`
+
+## Common Utilities (`packages/common-utils`)
+
+- Shared TypeScript utilities for query parsing and ClickHouse operations
+- Zod schemas for data validation
+- SQL formatting and query building helpers
+