From bd96c98cbf02425cf7610b6d0bac814818c6ad16 Mon Sep 17 00:00:00 2001 From: Tom Alexander Date: Wed, 3 Dec 2025 13:35:46 -0500 Subject: [PATCH] chore: CLAUDE.md refactor (#1437) Inspiration: https://www.humanlayer.dev/blog/writing-a-good-claude-md?utm_source=tldrdev --- CLAUDE.md | 346 ++++--------------------------------- agent_docs/README.md | 34 ++++ agent_docs/architecture.md | 67 +++++++ agent_docs/code_style.md | 37 ++++ agent_docs/development.md | 111 ++++++++++++ agent_docs/tech_stack.md | 28 +++ 6 files changed, 310 insertions(+), 313 deletions(-) create mode 100644 agent_docs/README.md create mode 100644 agent_docs/architecture.md create mode 100644 agent_docs/code_style.md create mode 100644 agent_docs/development.md create mode 100644 agent_docs/tech_stack.md diff --git a/CLAUDE.md b/CLAUDE.md index ca617658..45ec90f4 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,336 +1,56 @@ -# HyperDX Claude Agent Guide +# HyperDX Development Guide -This guide helps Claude AI agents understand and work effectively with the -HyperDX codebase. +## What is HyperDX? -## 🏗️ Project Overview +HyperDX is an observability platform that helps engineers search, visualize, and monitor logs, metrics, traces, and session replays. It's built on ClickHouse for blazing-fast queries and supports OpenTelemetry natively. -HyperDX is an observability platform built on ClickHouse that helps engineers -search, visualize, and monitor logs, metrics, traces, and session replays. It's -designed as an alternative to tools like Kibana but optimized for ClickHouse's -performance characteristics. +**Core value**: Unified observability with ClickHouse performance, schema-agnostic design, and correlation across all telemetry types in one place.
-**Core Value Proposition:** +## Architecture (WHAT) -- Unified observability: correlate logs, metrics, traces, and session replays in - one place -- ClickHouse-powered: blazing fast searches and visualizations -- OpenTelemetry native: works out of the box with OTEL instrumentation -- Schema agnostic: works on top of existing ClickHouse schemas +This is a **monorepo** with three main packages: -## Architecture Overview +- `packages/app` - Next.js frontend (TypeScript, Mantine UI, TanStack Query) +- `packages/api` - Express backend (Node.js 22+, MongoDB for metadata, ClickHouse for telemetry) +- `packages/common-utils` - Shared TypeScript utilities for query parsing and validation -HyperDX follows a microservices architecture with clear separation between -components: +**Data flow**: Apps → OpenTelemetry Collector → ClickHouse (telemetry data) / MongoDB (configuration/metadata) -### Core Services - -- **HyperDX UI (`packages/app`)**: Next.js frontend serving the user interface -- **HyperDX API (`packages/api`)**: Node.js/Express backend handling queries and - business logic -- **OpenTelemetry Collector**: Receives and processes telemetry data -- **ClickHouse**: Primary data store for all telemetry (logs, metrics, traces) -- **MongoDB**: Metadata storage (users, dashboards, alerts, saved searches) - -### Data Flow - -1. Applications send telemetry via OpenTelemetry → OTel Collector -2. OTel Collector processes and forwards data → ClickHouse -3. Users interact with UI → API queries ClickHouse -4.
Configuration/metadata stored in MongoDB - -## 🛠️ Technology Stack - -### Frontend (`packages/app`) - -- **Framework**: Next.js 14 with TypeScript -- **UI Components**: Mantine UI library -- **State Management**: Jotai for global state, TanStack Query for server state -- **Charts/Visualization**: Recharts, uPlot -- **Code Editor**: CodeMirror (for SQL/JSON editing) -- **Styling**: SCSS + CSS Modules - -### Backend (`packages/api`) - -- **Runtime**: Node.js 22+ with TypeScript -- **Framework**: Express.js -- **Database**: - - ClickHouse (primary telemetry data) - - MongoDB (metadata via Mongoose) -- **Authentication**: Passport.js with local strategy -- **Validation**: Zod schemas -- **OpenTelemetry**: Self-instrumented with `@hyperdx/node-opentelemetry` - -### Common Utilities (`packages/common-utils`) - -- Shared TypeScript utilities for query parsing, ClickHouse operations -- Zod schemas for data validation -- SQL formatting and query building helpers - -## 🏛️ Key Architectural Patterns - -### Database Models (MongoDB) - -All models follow consistent patterns with: - -- Team-based multi-tenancy (most entities belong to a `team`) -- ObjectId references between related entities -- Timestamps for audit trails -- Zod schema validation - -**Key Models:** - -- `Team`: Multi-tenant organization unit -- `User`: Team members with authentication -- `Source`: ClickHouse data source configuration -- `Connection`: Database connection settings -- `SavedSearch`: Saved queries and filters -- `Dashboard`: Custom dashboard configurations -- `Alert`: Monitoring alerts with thresholds - -### Frontend Architecture - -- **Page-level components**: Located in `pages/` (Next.js routing) -- **Reusable components**: Located in `src/` directory -- **State management**: - - Server state via TanStack Query - - Client state via Jotai atoms - - URL state via query parameters -- **API communication**: Custom hooks wrapping TanStack Query - -### Backend Architecture - -- **Router-based
organization**: Separate routers for different API domains -- **Middleware stack**: Authentication, CORS, error handling -- **Controller pattern**: Business logic separated from route handlers -- **Service layer**: Reusable business logic (e.g., `agentService`) - -## 🔧 Development Environment - -### Setup Commands +## Development Setup (HOW) ```bash -# Install dependencies and setup hooks -yarn setup - -# Start full development stack (Docker + local services) -yarn dev +yarn setup # Install dependencies +yarn dev # Start full stack (Docker + local services) ``` -### Key Development Scripts +The project uses **Yarn 4.5.1** workspaces. Docker Compose manages ClickHouse, MongoDB, and the OTel Collector. -- `yarn app:dev`: Start API, frontend, alerts task, and common-utils in watch - mode -- `yarn lint`: Run linting across all packages -- `yarn dev:int`: Run integration tests in watch mode -- `yarn dev:unit`: Run unit tests in watch mode (per package) +## Working on the Codebase (HOW) -### ⚠️ BEFORE COMMITTING - Run Linting Commands +**Before starting a task**, read relevant documentation from the `agent_docs/` directory: -**Claude AI agents must run these commands before any commit:** +- `agent_docs/architecture.md` - Detailed architecture patterns and data models +- `agent_docs/tech_stack.md` - Technology stack details and component patterns +- `agent_docs/development.md` - Development workflows, testing, and common tasks +- `agent_docs/code_style.md` - Code patterns and best practices (read only when actively coding) -```bash -# 1. Fix linting issues in modified packages -cd packages/app && yarn run lint:fix -cd packages/api && yarn run lint:fix -cd packages/common-utils && yarn lint:fix +**Tools handle formatting and linting automatically** via pre-commit hooks. Focus on implementation; don't manually format code. -# 2.
Check for any remaining linting issues from the main directory -yarn run lint -``` +## Key Principles -**If linting issues remain after running lint:fix**: Some linting errors cannot -be automatically fixed and require manual intervention. If `yarn run lint` still -shows errors: +1. **Multi-tenancy**: All data is scoped to `Team` - ensure proper filtering +2. **Type safety**: Use TypeScript strictly; Zod schemas for validation +3. **Existing patterns**: Follow established patterns in the codebase - explore similar files before implementing +4. **Component size**: Keep files under 300 lines; break down large components +5. **Testing**: Tests live in `__tests__/` directories; use Jest for unit/integration tests -1. Read the linting error messages carefully to understand the issue -2. Manually fix the reported issues in the affected files -3. Re-run `yarn run lint` to verify all issues are resolved -4. Only commit once all linting errors are fixed +## Important Context -**Why this is necessary**: While the project has pre-commit hooks (`lint-staged` -with Husky) that automatically fix linting issues on commit, Claude AI agents do -not trigger these hooks. Therefore, you must manually run the lint:fix commands -before committing. 
- -### Environment Configuration - -- `.env.development`: Development environment variables -- Docker Compose manages ClickHouse, MongoDB, OTel Collector -- Hot reload enabled for all services in development - -## 📝 Code Style & Patterns - -### TypeScript Guidelines - -- **Strict typing**: Avoid `any` type assertions (use proper typing instead) -- **Zod validation**: Use Zod schemas for runtime validation -- **Interface definitions**: Clear interfaces for all data structures -- **Error handling**: Proper error boundaries and serialization - -### Component Patterns - -- **Functional components**: Use React hooks over class components -- **Custom hooks**: Extract reusable logic into custom hooks -- **Props interfaces**: Define clear TypeScript interfaces for component props -- **File organization**: Keep files under 300 lines, break down large components - -### UI Components & Styling - -**Prefer Mantine UI**: Use Mantine components as the primary UI library: - -```tsx -// ✅ Good - Use Mantine components -import { Button, TextInput, Modal, Select } from '@mantine/core'; - -// ✅ Good - Mantine hooks for common functionality -import { useDisclosure, useForm } from '@mantine/hooks'; -``` - -**Component Hierarchy**: - -1. **First choice**: Mantine components (`@mantine/core`, `@mantine/dates`, - etc.) -2. **Second choice**: Custom components built on Mantine primitives -3. **Last resort**: Custom styling using CSS Modules and SCSS - -**Styling Approach**: - -- Use Mantine's built-in styling system and theme -- SCSS modules for component-specific styles when needed -- Avoid inline styles unless absolutely necessary -- Leverage Mantine's responsive design utilities - -### API Patterns - -- **RESTful design**: Clear HTTP methods and resource-based URLs -- **Middleware composition**: Reusable middleware for auth, validation, etc.
-- **Error handling**: Consistent error response format -- **Input validation**: Zod schemas for request validation - -## 🧪 Testing Strategy - -### Testing Tools - -- **Unit Tests**: Jest with TypeScript support -- **Integration Tests**: Jest with database fixtures -- **Frontend Testing**: React Testing Library + Jest -- **E2E Testing**: Custom smoke tests with BATS - -### Testing Patterns - -- **TDD Approach**: Write tests before implementation for new features -- **Test organization**: Tests co-located with source files in `__tests__` - directories -- **Mocking**: MSW for API mocking in frontend tests -- **Database testing**: Isolated test databases with fixtures - -### CI Testing - -For integration testing in CI environments: - -```bash -# Start CI testing stack (ClickHouse, MongoDB, etc.) -docker compose -p int -f ./docker-compose.ci.yml up -d - -# Run integration tests -yarn dev:int -``` - -**CI Testing Notes:** - -- Uses separate Docker Compose configuration optimized for CI -- Isolated test environment with `-p int` project name -- Includes all necessary services (ClickHouse, MongoDB, OTel Collector) -- Tests run against real database instances for accurate integration testing - -## 🗄️ Data & Query Patterns - -### ClickHouse Integration - -- **Query building**: Use `common-utils` for safe query construction -- **Schema flexibility**: Support for various telemetry schemas via `Source` - configuration - -### MongoDB Patterns - -- **Multi-tenancy**: All queries filtered by team context -- **Relationships**: Use ObjectId references with proper population -- **Indexing**: Strategic indexes for query performance -- **Migrations**: Versioned migrations for schema changes - -## 🚀 Common Development Tasks - -### Adding New Features - -1. **API First**: Define API endpoints and data models -2. **Database Models**: Create/update Mongoose schemas and ClickHouse queries -3. **Frontend Integration**: Build UI components and integrate with API -4.
**Testing**: Add unit and integration tests -5. **Documentation**: Update relevant docs - -### Performance Considerations - -- **Frontend rendering**: Use virtualization for large datasets -- **API responses**: Implement pagination and caching where appropriate -- **Bundle size**: Monitor and optimize JavaScript bundle sizes - -## 🔍 Key Files & Directories - -### Configuration - -- `packages/api/src/config.ts`: API configuration and environment variables -- `packages/app/next.config.js`: Next.js configuration -- `docker-compose.dev.yml`: Development environment setup - -### Core Business Logic - -- `packages/api/src/models/`: MongoDB data models -- `packages/api/src/routers/`: API route definitions -- `packages/api/src/controllers/`: Business logic controllers -- `packages/common-utils/src/`: Shared utilities and query builders - -### Frontend Architecture - -- `packages/app/pages/`: Next.js pages and routing -- `packages/app/src/`: Reusable components and utilities -- `packages/app/src/useUserPreferences.tsx`: Global user state management - -## 🚨 Common Pitfalls & Guidelines - -### Security - -- **Server-side validation**: Always validate and sanitize on the backend -- **Team isolation**: Ensure proper team-based access control -- **API authentication**: Use proper authentication middleware -- **Environment variables**: Never commit secrets, use `.env` files - -### Performance - -- **React rendering**: Use proper keys and memoization for large lists -- **API pagination**: Implement cursor-based pagination for large datasets - -### Code Quality - -- **Component responsibility**: Single responsibility principle -- **Error boundaries**: Proper error handling at component boundaries -- **Type safety**: Prefer type-safe approaches over runtime checks - -## 🔗 Useful Resources - -- **OpenTelemetry Docs**: Understanding telemetry data structures -- **ClickHouse Docs**: Query optimization and schema design -- **Mantine UI**: Component library documentation -
**TanStack Query**: Server state management patterns - -## 🤝 Contributing Guidelines - -1. **Follow existing patterns**: Maintain consistency with current codebase -2. **Test coverage**: Add tests for new functionality -3. **Documentation**: Update relevant documentation -4. **Code review**: Ensure changes align with architectural principles -5. **Performance impact**: Consider impact on query performance and bundle size +- **Authentication**: Passport.js with team-based access control +- **State management**: Jotai (client), TanStack Query (server), URL params (filters) +- **UI library**: Mantine components are the standard (not custom UI) +- **Database patterns**: MongoDB for metadata with Mongoose, ClickHouse for telemetry queries --- -_This guide should be updated as the codebase evolves and new patterns emerge._ +*Need more details? Check the `agent_docs/` directory or ask which documentation to read.* diff --git a/agent_docs/README.md b/agent_docs/README.md new file mode 100644 index 00000000..f31316be --- /dev/null +++ b/agent_docs/README.md @@ -0,0 +1,34 @@ +# Agent Documentation Directory + +This directory contains detailed documentation for AI coding agents working on the HyperDX codebase. These files use **progressive disclosure** - they're referenced from `CLAUDE.md` but only read when needed. + +## Purpose + +Instead of stuffing all instructions into `CLAUDE.md` (which goes into every conversation), we keep detailed, task-specific information here. This ensures: + +1. **Better focus**: Only relevant context gets loaded per task +2. **Improved performance**: Smaller context window = better instruction following +3.
**Easier maintenance**: Update specific docs without bloating the main file + +## Files + +- **`architecture.md`** - System architecture, data models, service relationships, security patterns +- **`tech_stack.md`** - Technology choices, UI component patterns, library usage +- **`development.md`** - Development workflows, testing strategy, common tasks, debugging +- **`code_style.md`** - Code patterns and best practices (read only when actively coding) + +## Usage Pattern + +When starting a task: +1. Agent reads `CLAUDE.md` first (always included) +2. Agent determines which (if any) docs from this directory are relevant +3. Agent reads only the needed documentation +4. Agent proceeds with focused, relevant context + +## Maintenance + +- Keep files focused on their specific domain +- Use file/line references instead of code snippets when possible +- Update when patterns or architecture change +- Keep documentation current with the codebase + diff --git a/agent_docs/architecture.md b/agent_docs/architecture.md new file mode 100644 index 00000000..503943de --- /dev/null +++ b/agent_docs/architecture.md @@ -0,0 +1,67 @@ +# HyperDX Architecture + +## Core Services + +- **HyperDX UI (`packages/app`)**: Next.js frontend serving the user interface +- **HyperDX API (`packages/api`)**: Node.js/Express backend handling queries and business logic +- **OpenTelemetry Collector**: Receives and processes telemetry data +- **ClickHouse**: Primary data store for all telemetry (logs, metrics, traces) +- **MongoDB**: Metadata storage (users, dashboards, alerts, saved searches) + +## Data Flow + +1. Applications send telemetry via OpenTelemetry → OTel Collector +2. OTel Collector processes and forwards data → ClickHouse +3. Users interact with UI → API queries ClickHouse +4.
Configuration/metadata stored in MongoDB + +## Key MongoDB Models + +All models follow consistent patterns with: +- Team-based multi-tenancy (most entities belong to a `team`) +- ObjectId references between related entities +- Timestamps for audit trails +- Zod schema validation + +**Key Models** (see `packages/api/src/models/`): +- `Team`: Multi-tenant organization unit +- `User`: Team members with authentication +- `Source`: ClickHouse data source configuration +- `Connection`: Database connection settings +- `SavedSearch`: Saved queries and filters +- `Dashboard`: Custom dashboard configurations +- `Alert`: Monitoring alerts with thresholds + +## Frontend Architecture + +- **Pages**: `packages/app/pages/` (Next.js routing) +- **Components**: `packages/app/src/` (reusable components) +- **API communication**: Custom hooks wrapping TanStack Query +- **State**: See tech_stack.md for state management details + +## Backend Architecture + +- **Routers**: `packages/api/src/routers/` - Domain-specific API routes +- **Controllers**: `packages/api/src/controllers/` - Business logic separated from routes +- **Middleware**: Authentication, CORS, error handling +- **Services**: Reusable business logic (e.g., `agentService`) + +## Data & Query Patterns + +### ClickHouse Integration +- **Query building**: Use `common-utils` for safe query construction +- **Schema flexibility**: Support for various telemetry schemas via `Source` configuration + +### MongoDB Patterns +- **Multi-tenancy**: All queries filtered by team context +- **Relationships**: Use ObjectId references with proper population +- **Indexing**: Strategic indexes for query performance +- **Migrations**: Versioned migrations for schema changes (see `packages/api/migrations/`) + +## Security Requirements + +- **Server-side validation**: Always validate and sanitize on the backend +- **Team isolation**: All data access must filter by team context +- **API authentication**: Use authentication middleware on protected 
routes +- **Secrets**: Never commit secrets; use `.env` files + diff --git a/agent_docs/code_style.md b/agent_docs/code_style.md new file mode 100644 index 00000000..43bc9cac --- /dev/null +++ b/agent_docs/code_style.md @@ -0,0 +1,37 @@ +# Code Style & Best Practices + +> **Note**: Pre-commit hooks handle formatting automatically. Focus on implementation patterns. + +## TypeScript + +- Avoid `any` - use proper typing +- Use Zod schemas for runtime validation +- Define clear interfaces for data structures +- Implement proper error boundaries + +## Code Organization + +- **Single Responsibility**: One clear purpose per component/function +- **File Size**: Max 300 lines - refactor when approaching limit +- **DRY**: Reuse existing functionality; consolidate duplicates +- **In-Context Learning**: Explore similar files before implementing + +## React Patterns + +- Functional components with hooks (not class components) +- Extract reusable logic into custom hooks +- Define TypeScript interfaces for props +- Use proper keys for lists, memoization for expensive computations + +## Refactoring + +- Edit files directly - don't create `component-v2.tsx` copies +- Look for duplicate code across the affected area +- Verify all callers and integrations after changes +- Refactor to improve clarity or reduce complexity, not just to change + +## File Naming + +- Clear, descriptive names following package conventions +- Avoid "temp", "refactored", "improved" in permanent filenames + diff --git a/agent_docs/development.md b/agent_docs/development.md new file mode 100644 index 00000000..4f2a61cf --- /dev/null +++ b/agent_docs/development.md @@ -0,0 +1,111 @@ +# Development Workflows + +## Setup Commands + +```bash +# Install dependencies and setup hooks +yarn setup + +# Start full development stack (Docker + local services) +yarn dev +``` + +## Key Development Scripts + +- `yarn app:dev`: Start API, frontend, alerts task, and common-utils in watch mode +- `yarn lint`: Run linting across 
all packages +- `yarn dev:int`: Run integration tests in watch mode +- `yarn dev:unit`: Run unit tests in watch mode (per package) + +## Environment Configuration + +- `.env.development`: Development environment variables +- Docker Compose manages ClickHouse, MongoDB, OTel Collector +- Hot reload enabled for all services in development + +## Testing Strategy + +### Testing Tools + +- **Unit Tests**: Jest with TypeScript support +- **Integration Tests**: Jest with database fixtures +- **Frontend Testing**: React Testing Library + Jest +- **E2E Testing**: Custom smoke tests with BATS + +### Testing Patterns + +- **TDD Approach**: Write tests before implementation for new features +- **Test organization**: Tests co-located with source files in `__tests__/` directories +- **Mocking**: MSW for API mocking in frontend tests +- **Database testing**: Isolated test databases with fixtures + +### CI Testing + +For integration testing in CI environments: + +```bash +# Start CI testing stack (ClickHouse, MongoDB, etc.) +docker compose -p int -f ./docker-compose.ci.yml up -d + +# Run integration tests +yarn dev:int +``` + +**CI Testing Notes:** +- Uses separate Docker Compose configuration optimized for CI +- Isolated test environment with `-p int` project name +- Includes all necessary services (ClickHouse, MongoDB, OTel Collector) +- Tests run against real database instances for accurate integration testing + +## Common Development Tasks + +### Adding New Features + +1. **API First**: Define API endpoints and data models +2. **Database Models**: Create/update Mongoose schemas and ClickHouse queries +3. **Frontend Integration**: Build UI components and integrate with API +4. **Testing**: Add unit and integration tests +5. 
**Documentation**: Update relevant docs + +### Debugging + +- Check browser and server console output for errors, warnings, or relevant logs +- Add targeted logging to trace execution and variable states +- For persistent issues, check `fixes/` directory for documented solutions +- Document complex fixes in `fixes/` directory with descriptive filenames + +## Code Quality + +### Pre-commit Hooks + +The project uses Husky + lint-staged to automatically run: +- Prettier for formatting +- ESLint for linting +- API doc generation (for external API changes) + +These run automatically on `git commit` for staged files. + +### Manual Linting (if needed) + +If you need to manually lint: + +```bash +# Per-package linting with auto-fix +cd packages/app && yarn run lint:fix +cd packages/api && yarn run lint:fix +cd packages/common-utils && yarn lint:fix + +# Check all packages +yarn run lint +``` + +## File Locations Quick Reference + +- **Config**: `packages/api/src/config.ts`, `packages/app/next.config.js`, `docker-compose.dev.yml` +- **Models**: `packages/api/src/models/` +- **API Routes**: `packages/api/src/routers/` +- **Controllers**: `packages/api/src/controllers/` +- **Pages**: `packages/app/pages/` +- **Components**: `packages/app/src/` +- **Shared Utils**: `packages/common-utils/src/` + diff --git a/agent_docs/tech_stack.md b/agent_docs/tech_stack.md new file mode 100644 index 00000000..9540a44c --- /dev/null +++ b/agent_docs/tech_stack.md @@ -0,0 +1,28 @@ +# HyperDX Technology Stack + +## Frontend (`packages/app`) + +- **Framework**: Next.js 14 with TypeScript +- **UI Components**: Mantine UI library (`@mantine/core`, `@mantine/dates`, `@mantine/hooks`) +- **State Management**: Jotai (global client state), TanStack Query (server state), URL params (filters) +- **Charts/Visualization**: Recharts, uPlot +- **Code Editor**: CodeMirror (for SQL/JSON editing) +- **Styling**: Mantine's built-in system, SCSS modules when needed + +**UI Component Priority**: Mantine 
components first → Custom components on Mantine primitives → Custom SCSS modules as last resort + +## Backend (`packages/api`) + +- **Runtime**: Node.js 22+ with TypeScript +- **Framework**: Express.js +- **Database**: ClickHouse (telemetry data), MongoDB via Mongoose (metadata) +- **Authentication**: Passport.js with local strategy +- **Validation**: Zod schemas +- **Telemetry**: Self-instrumented with `@hyperdx/node-opentelemetry` + +## Common Utilities (`packages/common-utils`) + +- Shared TypeScript utilities for query parsing and ClickHouse operations +- Zod schemas for data validation +- SQL formatting and query building helpers +