Software library for osquery-perf

This directory contains the software database and tools used by osquery-perf for load testing.

Quick start

Initial setup

  1. Create the database:
sqlite3 software.db < software.sql
  2. Verify the database (optional):
sqlite3 software.db "SELECT COUNT(*) FROM software;"

sqlite3 software.db "SELECT source, COUNT(*) FROM software GROUP BY source;"
# Shows distribution across sources

Running osquery-perf

Once the database exists, osquery-perf will automatically use it:

cd ..
./osquery-perf --host-count 1000

Each simulated host will get random platform-specific software from the database.

Directory structure

software-library/
├── README.md              # This file
├── software.db            # SQLite database (created from software.sql)
├── software.sql           # SQL dump with schema + data (source of truth)
├── tools/                 # Import and maintenance tools
│   ├── import-data/       # Import server data from CSV
│   └── generate-sql/      # Generate software.sql from database
└── source-data/           # Source CSV files (all gitignored)
    └── .gitignore

Tools

import-data

Imports software data from CSV files, validates entries, and optionally filters out internal/proprietary software.

Usage:

cd tools/import-data

# Import CSV file (no filtering)
go run . --input ../../source-data/server_export.csv

# Import with pattern filtering
go run . --input ../../source-data/server_export.csv --filter "numa-internal,numa-,corp-"

# Import with vendor filtering
go run . --input ../../source-data/server_export.csv --filter-vendor "numa"

# Dry run (validate without importing)
go run . --input ../../source-data/server_export.csv --dry-run

# Verbose output
go run . --input ../../source-data/server_export.csv --verbose

What it does:

  • Reads software entries from CSV files
  • Optional filtering (disabled by default):
    • --filter: Skip entries whose name contains any of the specified patterns (comma-separated)
    • --filter-vendor: Skip software from the specified vendor, except well-known public software
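The two filters can be pictured roughly as follows. This is a hedged Go sketch, not import-data's actual code: it assumes case-insensitive substring matching for both --filter and --filter-vendor, and the allowlist of well-known public software (publicAllowlist) is a hypothetical placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// publicAllowlist is a HYPOTHETICAL set of lowercased names of well-known
// public software that --filter-vendor should keep even if the vendor matches.
var publicAllowlist = map[string]bool{
	"example public app": true,
}

// splitPatterns turns a comma-separated flag value into trimmed, lowercased patterns.
func splitPatterns(flagValue string) []string {
	var out []string
	for _, p := range strings.Split(flagValue, ",") {
		if p = strings.ToLower(strings.TrimSpace(p)); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// shouldSkip reports whether an entry would be filtered out (not imported).
// Assumption: both filters match case-insensitively on substrings.
func shouldSkip(name, vendor string, namePatterns []string, filterVendor string) bool {
	lowerName := strings.ToLower(name)
	for _, p := range namePatterns {
		if strings.Contains(lowerName, p) {
			return true // --filter: name contains an internal pattern
		}
	}
	fv := strings.ToLower(strings.TrimSpace(filterVendor))
	if fv != "" && strings.Contains(strings.ToLower(vendor), fv) {
		return !publicAllowlist[lowerName] // --filter-vendor, unless allowlisted
	}
	return false
}

func main() {
	patterns := splitPatterns("numa-internal,numa-,corp-")
	// Sample entries for illustration only.
	entries := []struct{ name, vendor string }{
		{"numa-internal-agent", "Numa"},
		{"corp-vpn", "Acme"},
		{"Firefox", "Mozilla"},
		{"Numa Dashboard", "Numa Inc."},
	}
	for _, e := range entries {
		fmt.Printf("%-20s skip=%v\n", e.name, shouldSkip(e.name, e.vendor, patterns, "numa"))
	}
}
```

Both filters are off when their flag value is empty, matching the "disabled by default" behavior described above.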

generate-sql

Generates the software.sql file from the populated database.

Usage:

cd tools/generate-sql

# Generate software.sql
go run .

# Specify custom paths
go run . --db ../../software.db --output ../../software.sql

# Verbose output (shows progress)
go run . --verbose

What it does:

  • Reads all data from software.db
  • Generates SQL INSERT statements
  • Includes schema definition
  • Creates reproducible SQL dump

Database setup workflow

Here's the typical workflow:

Step 1: Initialize database from software.sql

sqlite3 software.db < software.sql

This creates the database with schema and initial data.

Step 2: Export server data

Export software from Fleet's MySQL database to CSV:

mysql -h <host> -u <user> -p <database> --batch --raw -e "
SELECT
    'name', 'version', 'source', 'bundle_identifier', 'vendor', 'arch', 'release', 'extension_id', 'extension_for', 'application_id', 'upgrade_code'
UNION ALL
SELECT
    IFNULL(name, ''),
    IFNULL(version, ''),
    IFNULL(source, ''),
    IFNULL(bundle_identifier, ''),
    IFNULL(vendor, ''),
    IFNULL(arch, ''),
    IFNULL(\`release\`, ''),
    IFNULL(extension_id, ''),
    IFNULL(extension_for, ''),
    IFNULL(application_id, ''),
    IFNULL(upgrade_code, '')
FROM software
" 2>&1 | sed 's/\t/","/g' | sed 's/^/"/' | sed 's/$/"/' | tail -n +3 > source-data/server_export.csv

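Note: The sed stages wrap every field in double quotes so that values containing commas (e.g., "Red Hat, Inc.") stay in a single field; double quotes inside a value are not escaped. The tail -n +3 drops the first two lines of output: the MySQL password warning and the column-name line that --batch prints, which duplicates the header row produced by the first SELECT.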

This creates a CSV with the following columns:

  • name, version, source - Required fields
  • bundle_identifier - macOS bundle ID
  • vendor - Software vendor
  • arch - Architecture (x86_64, arm64, etc.)
  • release - Package release (e.g., the RPM release string)
  • extension_id - Browser/IDE extension ID
  • extension_for - Host software for extensions (Chrome, Firefox, VS Code, etc.)
  • application_id - Android application ID
  • upgrade_code - Windows upgrade GUID

Optional filtering:

  • Add a WHERE clause after FROM software to filter by date, team, or other criteria
  • Example: WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)

Step 3: Import server data

cd tools/import-data

# Import with filtering for internal software
go run . --input ../../source-data/server_export.csv \
  --filter "numa-internal,numa-,corp-,internal-" \
  --filter-vendor "numa" \
  --verbose

This imports and validates server data, optionally filtering out internal software.

Step 4: Generate software.sql

cd ../generate-sql

# Generate SQL dump
go run . --verbose

This creates software.sql that can recreate the entire database.

Step 5: Verify

# Check counts by source
sqlite3 software.db "
  SELECT
    source,
    COUNT(*) as count
  FROM software
  GROUP BY source
  ORDER BY count DESC
"