mirror of
https://github.com/Rohithgilla12/data-peek
synced 2026-04-21 12:57:16 +00:00
Add technical blog posts documenting data-peek features (#153)
Covers previously unblogged features: the Connection Health Monitor (pg_stat_activity dashboard + kill queries), the data masking toolbar, the Postgres LISTEN/NOTIFY panel, benchmark mode with p90/p95/p99, and the FK-aware data generator. Each post follows the existing notes/ voice, references real code paths, and is ready for cross-posting to dev.to with a canonical URL back to data-peek.app/blog. https://claude.ai/code/session_018GVvk8S82Qy9VVK4eibzaW Co-authored-by: Claude <noreply@anthropic.com>
This commit is contained in:
parent
f880b13cee
commit
f388e75834
6 changed files with 1146 additions and 0 deletions

@@ -9,6 +9,11 @@ This folder is the single source of truth for technical notes and blog posts. Fi

| [building-ai-sql-assistant.mdx](./building-ai-sql-assistant.mdx) | Building the AI SQL Assistant |
| [ai-assistant-deep-dive.mdx](./ai-assistant-deep-dive.mdx) | Technical deep dive into AI components |
| [query-performance-analyzer.mdx](./query-performance-analyzer.mdx) | Query Performance Analyzer with EXPLAIN |
| [connection-health-monitor-in-a-sql-client.mdx](./connection-health-monitor-in-a-sql-client.mdx) | pg_stat_activity dashboard with one-click kill |
| [blurring-pii-in-your-sql-client.mdx](./blurring-pii-in-your-sql-client.mdx) | Data masking toolbar for screen-shares and demos |
| [listen-notify-without-tears.mdx](./listen-notify-without-tears.mdx) | Postgres LISTEN/NOTIFY debugger with SQLite history |
| [benchmark-mode-p50-p90-p99.mdx](./benchmark-mode-p50-p90-p99.mdx) | Benchmark mode with p90/p95/p99 percentiles |
| [fk-aware-fake-data-generator.mdx](./fk-aware-fake-data-generator.mdx) | FK-aware fake data generator with Faker.js |

## Creating a New Post

240 notes/benchmark-mode-p50-p90-p99.mdx Normal file

@@ -0,0 +1,240 @@

---
title: "Stop Pasting \\timing — Run Your SQL 100 Times and Get p99"
description: "A single-run query timing is a lie. How data-peek's Benchmark Mode runs your query up to 500 times, computes p90/p95/p99 latencies, breaks down per-phase timing, and stops you from shipping a query based on a lucky first execution."
date: "2026-04-11"
author: "Rohith Gilla"
tags: ["postgres", "performance", "sql", "database"]
published: true
---

Here is a trap I have fallen into more times than I can count.

I write a query. I run it. It takes 48ms. I nod, satisfied, and deploy it. In production it p99s at 1.8 seconds during peak traffic, the on-call engineer pages me, and I spend the next hour explaining how "on my machine it was fine."

A single `EXPLAIN ANALYZE` run is not a benchmark. It is an anecdote. The first run pays the cost of cold caches, parse, plan, and whatever else the database had queued. The second run is suspiciously fast because everything is now in the buffer cache. Somewhere between "ran it once in psql" and "ran it under real load" lives the actual distribution of latencies — and that distribution is what matters.

The traditional fix is an ad-hoc bash loop:

```bash
for i in {1..100}; do
  psql -c "\timing" -c "SELECT ... your query ..." \
    | grep 'Time:'
done | awk '{print $2}' | sort -n | ...
```

And then you remember you do not know the awk incantation for p99 off the top of your head, you open Stack Overflow, you copy something, it gives you the wrong percentile, you mutter, and you give up.

data-peek has a Benchmark button. Click it, pick how many runs, wait, read the percentiles. That is the whole interaction.

## What you get

The button (`src/renderer/src/components/benchmark-button.tsx`) is a dropdown with four presets:

```ts
const RUN_OPTIONS = [
  { count: 10, label: '10 runs', description: 'Quick test' },
  { count: 50, label: '50 runs', description: 'Standard benchmark' },
  { count: 100, label: '100 runs', description: 'Detailed analysis' },
  { count: 500, label: '500 runs', description: 'Statistical precision' }
]
```

Pick one, the currently open query runs that many times, and the results panel updates with:

- **Average, min, max** latency
- **p90, p95, p99** latency
- **Standard deviation** (the shape of the distribution matters; a query with 40ms average and 200ms std dev is a different beast than one with 40ms average and 2ms std dev)
- **Per-phase breakdown** — connect, plan, execute, fetch — each with its own percentiles

That last one is the difference between "the query is slow" and "the query is fast but the planner is slow." I would not have guessed how often the answer is the second one until I could actually see it.

## How the runs happen

The renderer calls `db:benchmark` over IPC with a run count, and the main process loops through the query that many times, collecting telemetry on each pass. The important bits from `src/main/ipc/query-handlers.ts`:

```ts
ipcMain.handle(
  'db:benchmark',
  async (_, { config, query, runCount }) => {
    // Validate run count
    if (runCount < 1 || runCount > 1000) {
      return { success: false, error: 'Run count must be between 1 and 1000' }
    }

    const adapter = getAdapter(config)
    const telemetryRuns: QueryTelemetry[] = []

    for (let i = 0; i < runCount; i++) {
      const executionId = `benchmark-${Date.now()}-${i}`

      try {
        const result = await adapter.queryMultiple(config, query, {
          executionId,
          collectTelemetry: true
        })

        if (result.telemetry) {
          telemetryRuns.push(result.telemetry)
        }

        // Small delay between runs to avoid overwhelming the database
        if (i < runCount - 1) {
          await new Promise((resolve) => setTimeout(resolve, 10))
        }
      } catch (runError) {
        // If a run fails, log it but continue
        log.warn(`Benchmark run ${i + 1} failed:`, runError)
      }
    }

    if (telemetryRuns.length === 0) {
      return { success: false, error: 'All benchmark runs failed' }
    }

    const benchmarkResult = telemetryCollector.aggregateBenchmark(telemetryRuns)
    return { success: true, data: benchmarkResult }
  }
)
```

Three things I want to point out.

**Failed runs do not abort the benchmark.** If run 37 out of 100 hits a deadlock and errors, we log it and keep going. At the end, as long as at least one run succeeded, we aggregate. This matters because in a real database, transient failures happen, and throwing away 99 successful runs because one hit a lock is worse than reporting on 99 successes.

**There is a 10ms delay between runs.** I went back and forth on this. Without it, 500 runs slam the server and you get a warped picture — the later runs start queuing because the server is busy serving the earlier runs. With it, you give the database a breath between iterations and the distribution starts to look like what you would see under spaced-out production traffic. 10ms is a compromise; the correct answer depends on your workload.

**`collectTelemetry: true` is the hook into the per-phase breakdown.** The adapter instruments connect/plan/execute/fetch and returns a `QueryTelemetry` per run, which is how we end up with phase-level percentiles instead of just end-to-end numbers.

## The percentile math

The percentile function is deliberately unfancy. It lives in `packages/shared/src/index.ts` so both main and renderer can use the same implementation:

```ts
export function calcPercentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0
  if (sorted.length === 1) return sorted[0]
  const idx = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, Math.min(idx, sorted.length - 1))]
}
```

This is the "nearest-rank" percentile, the definition you learn in stats 101. It is not the interpolated version you get from numpy by default, and it is not what Postgres's `percentile_cont` computes. It is the version that matches how most developers intuit percentiles: "p99 is a real value from the dataset, not an interpolated one between two samples." With 500 runs, the difference between nearest-rank and interpolated is in the noise.
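
To make the nearest-rank behavior concrete, here is the same function exercised on a tiny dataset (a standalone sketch; the real implementation lives in `packages/shared/src/index.ts`):

```typescript
// Nearest-rank percentile, mirroring the shared implementation:
// the result is always an actual sample, never an interpolated value.
function calcPercentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return 0
  if (sorted.length === 1) return sorted[0]
  const idx = Math.ceil((p / 100) * sorted.length) - 1
  return sorted[Math.max(0, Math.min(idx, sorted.length - 1))]
}

// Five runs, already sorted ascending (milliseconds).
const runs = [10, 20, 30, 40, 50]

console.log(calcPercentile(runs, 50)) // 30 — the literal median sample
console.log(calcPercentile(runs, 90)) // 50 — rank ceil(4.5) = 5, i.e. the max
console.log(calcPercentile(runs, 99)) // 50 — with 5 samples, p90 and up all collapse to the max
```

With only five samples, every high percentile collapses to the maximum — which is exactly why the presets go up to 500 runs.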

The aggregation:

```ts
aggregateBenchmark(runs: QueryTelemetry[]): BenchmarkResult {
  const durations = runs
    .map((r) => r.totalDurationMs)
    .sort((a, b) => a - b)
  const sum = durations.reduce((a, b) => a + b, 0)
  const avg = sum / durations.length

  const stats = {
    avg,
    min: durations[0],
    max: durations[durations.length - 1],
    p90: calcPercentile(durations, 90),
    p95: calcPercentile(durations, 95),
    p99: calcPercentile(durations, 99),
    stdDev: calcStdDev(durations, avg)
  }
  // ... then the same treatment for each phase
}
```

Sort once, pull percentiles by index, done. The stddev lives in `shared` too and is the textbook `sqrt(mean((x - mean)^2))`. No external stats library; I refuse to pull in `simple-statistics` or `d3-array` for 30 lines of arithmetic.
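
The stddev helper is small enough to sketch straight from that formula (the name and signature match how `aggregateBenchmark` calls it; treat the body as an illustration of the math rather than the exact shipped code):

```typescript
// Population standard deviation: sqrt(mean((x - mean)^2)).
// Passing the precomputed mean in avoids a second full pass over the data.
function calcStdDev(values: number[], mean: number): number {
  if (values.length === 0) return 0
  const sumSq = values.reduce((acc, x) => acc + (x - mean) ** 2, 0)
  return Math.sqrt(sumSq / values.length)
}

console.log(calcStdDev([40, 40, 40], 40)) // 0 — no spread at all
console.log(calcStdDev([2, 4, 4, 4, 5, 5, 7, 9], 5)) // 2 — the classic textbook dataset
```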

## What the per-phase breakdown taught me

Running benchmarks on my own queries, I learned three things that surprised me.

**Connect time is not zero.** Even with pooling in a desktop client, the first few runs of a benchmark pay a noticeable reconnect cost. By run 10 it has settled. If your app creates a fresh connection on every query (hi, serverless), your production p99 is going to be dominated by that connection step and no amount of index-tuning will save you.

**Plan time varies more than execute time for small, simple queries.** If your query executes in 2ms but the planner takes 1–4ms of variable time, your query is effectively plan-bound. Prepared statements stop being "nice to have" and become the actual fix.

**Fetch is where network latency hides.** If you are running data-peek on your laptop against a remote Postgres, the fetch phase is where the round-trip tax shows up. Running the same benchmark on the same server against the same database will show you an entirely different fetch percentile. That is the real cost of remote development.

## What I'd do differently

**I'd add a "warmup" option.** Right now the first few runs are always slower than the rest because caches are cold. I report them as part of the distribution, which is honest but not always useful. A "discard first N" flag would make it easy to ask "what does the steady-state p99 look like?" without manually trimming the data.
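
The trimming itself would be a one-liner before aggregation (a sketch of a hypothetical `warmupRuns` option — nothing like this is in the codebase today):

```typescript
// Drop the first `warmupRuns` samples before computing percentiles,
// so cold-cache runs do not skew the steady-state distribution.
function discardWarmup(durations: number[], warmupRuns: number): number[] {
  // Clamp so we never discard everything: keep at least one sample.
  const n = Math.min(warmupRuns, Math.max(0, durations.length - 1))
  return durations.slice(n)
}

console.log(discardWarmup([120, 95, 52, 50, 51], 2)) // [52, 50, 51]
console.log(discardWarmup([120], 5)) // [120] — clamped so the result is never empty
```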

**I'd add a run histogram.** p50/p90/p99 are summaries. A 20-bucket histogram would show you the shape directly — bimodal distributions (which almost always mean "cache hit vs cache miss") become obvious instantly. The BenchmarkResult has the raw durations, so this is a render-side change, not a data change.
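
Bucketing the raw durations is the easy half of that feature (a sketch; the bucket count and the equal-width strategy are my assumptions, not shipped code):

```typescript
// Equal-width histogram: count how many durations fall into each bucket
// between min and max. A bimodal latency profile shows up as two peaks.
function histogram(durations: number[], buckets = 20): number[] {
  const counts = new Array<number>(buckets).fill(0)
  if (durations.length === 0) return counts
  const min = Math.min(...durations)
  const max = Math.max(...durations)
  const width = (max - min) / buckets || 1 // avoid division by zero when all values are equal
  for (const d of durations) {
    // Clamp so the maximum value lands in the last bucket, not one past it.
    const idx = Math.min(buckets - 1, Math.floor((d - min) / width))
    counts[idx]++
  }
  return counts
}

// Two clusters: fast cache hits around 5ms, slow misses around 95ms.
const counts = histogram([4, 5, 5, 6, 94, 95, 96], 10)
console.log(counts) // [4, 0, 0, 0, 0, 0, 0, 0, 0, 3] — two peaks, nothing between
```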

**I'd let you compare two benchmarks.** "Did my new index help?" is the question. Right now you screenshot the before, run the benchmark again after, and eyeball the difference. A stored-comparison view would be the feature that actually ships index changes with confidence.

## Try it

Write a query. Click the Benchmark button. Pick 100 runs. Look at the distribution, not the average. That is the one-sentence pitch.

data-peek lives at [data-peek.app](https://data-peek.app). The benchmark path is `src/main/ipc/query-handlers.ts` (`db:benchmark`) and `src/main/telemetry-collector.ts` (`aggregateBenchmark`) if you want to read how it is wired. MIT source, free for personal use.

186 notes/blurring-pii-in-your-sql-client.mdx Normal file

@@ -0,0 +1,186 @@

---
title: "I Can Finally Screen-Share My SQL Client Without Leaking Prod Data"
description: "How data-peek auto-masks PII columns with regex rules, a CSS blur, and an Alt-to-peek escape hatch — so you can demo, record, or pair on production data without the pre-flight panic."
date: "2026-04-11"
author: "Rohith Gilla"
tags: ["privacy", "security", "database", "webdev"]
published: true
---

We were halfway through a customer demo when I remembered I was connected to staging, not to the demo seed database. I had just typed `SELECT * FROM users LIMIT 20` and hit Cmd+Enter. Twenty real email addresses appeared on my screen, which was mirrored to a conference room of people who were not supposed to see them.

I alt-tabbed to my Zoom window so fast I think I pulled a tendon.

There was no harm done — staging data is obfuscated, the emails were fake, the customer is still a customer. But the adrenaline was real, and the sheer avoidable stupidity of the situation stuck with me. Every SQL client I have ever used will happily render `hunter2` in plaintext to whoever is pointing a camera at your laptop. That is not a sensible default in 2026.

So I added a data masking layer to data-peek.

## What it does

Two things, really.

**It auto-masks columns by name.** Out of the box, every column whose name matches `email`, `password`, `passwd`, `pwd`, `ssn`, `social_security`, `token`, `secret`, `api_key`, or `apikey` is blurred. Phone and address patterns are in the rule list but disabled by default because they create too many false positives on arbitrary schemas. The rules are just regexes, so you can add your own (for my team: a `stripe_` rule that catches `stripe_customer_id` and friends).

**You can manually mask any column, any time.** Click a column header, hit "Mask Column," done. The masked state is scoped per tab, so masking `users.full_name` in one query does not affect a different query.

Masked cells render with `filter: blur(5px)` and `user-select: none`. You can see the *shape* of the data — same row height, same column width, no layout shift — but not the contents. When you actually need to see a value, hold `Alt` and hover. The cell reveals for as long as you hold it and re-blurs when you let go.

That hover-to-peek mode is the best part. It keeps you *in* the flow: "checking a single email address to verify an account" no longer means revealing twenty of them.

## The rules

These are the defaults, straight from `src/renderer/src/stores/masking-store.ts`:

```ts
const DEFAULT_RULES: AutoMaskRule[] = [
  { id: 'email', pattern: 'email', enabled: true },
  { id: 'password', pattern: 'password|passwd|pwd', enabled: true },
  { id: 'ssn', pattern: 'ssn|social_security', enabled: true },
  { id: 'token', pattern: 'token|secret|api_key|apikey', enabled: true },
  { id: 'phone', pattern: 'phone|mobile|cell', enabled: false },
  { id: 'address', pattern: 'address|street', enabled: false }
]
```

Two things I learned writing the matcher.

First, **case insensitivity is mandatory, not optional.** Different ORMs and naming conventions will give you `email`, `Email`, `EMAIL`, `emailAddress`, `email_addr`. The matcher compiles each pattern as `new RegExp(rule.pattern, 'i')` so one rule catches them all. Without that flag, `Email` slips through every time and you have a false sense of security.
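
The flag is easy to convince yourself of in isolation (a standalone sketch of just the matching step, not the store code itself):

```typescript
// One case-insensitive rule catches every naming convention.
const emailRule = new RegExp('email', 'i')

const columns = ['email', 'Email', 'EMAIL', 'emailAddress', 'email_addr', 'username']
const masked = columns.filter((col) => emailRule.test(col))

console.log(masked) // everything except 'username'
// Without the 'i' flag, 'Email' and 'EMAIL' would slip through:
console.log(new RegExp('email').test('Email')) // false
```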

Second, **the effective mask is the union of manual and auto.** If you manually unmask a column but the auto-rule still matches, the auto-rule wins on the next render. This was a deliberate choice: the whole point is to fail *closed*. If you want to permanently exclude a column, you edit the rule, not the cell.

Here is the resolver that combines both sources:

```ts
getEffectiveMaskedColumns: (tabId, allColumns) => {
  const { maskedColumns, autoMaskRules, autoMaskEnabled } = get()
  const manualMasked = new Set(maskedColumns[tabId] ?? [])

  if (!autoMaskEnabled) return manualMasked

  const effective = new Set(manualMasked)
  for (const col of allColumns) {
    for (const rule of autoMaskRules) {
      if (!rule.enabled) continue
      try {
        const regex = new RegExp(rule.pattern, 'i')
        if (regex.test(col)) {
          effective.add(col)
          break
        }
      } catch {
        // Invalid regex — skip
      }
    }
  }
  return effective
}
```

The `try/catch` around the regex is there because the rule list is user-editable. If someone adds `user[` as a pattern, I do not want the entire results grid to crash. The invalid rule silently no-ops. A production-grade version would surface a red squiggle in the rule editor; I did not do that yet.
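
Extracted as a pure function, the fail-closed union (and the invalid-regex no-op) is easy to unit test. This is a standalone restatement of the store logic above, not the store itself:

```typescript
interface AutoMaskRule { id: string; pattern: string; enabled: boolean }

// Union of manually masked columns and auto-rule matches.
// Invalid user-supplied patterns no-op instead of crashing the grid.
function effectiveMask(
  manual: string[],
  rules: AutoMaskRule[],
  allColumns: string[]
): Set<string> {
  const effective = new Set(manual)
  for (const col of allColumns) {
    for (const rule of rules) {
      if (!rule.enabled) continue
      try {
        if (new RegExp(rule.pattern, 'i').test(col)) {
          effective.add(col)
          break
        }
      } catch {
        // Invalid regex — skip this rule
      }
    }
  }
  return effective
}

const rules: AutoMaskRule[] = [
  { id: 'email', pattern: 'email', enabled: true },
  { id: 'broken', pattern: 'user[', enabled: true } // invalid on purpose
]
const out = effectiveMask(['full_name'], rules, ['id', 'Email', 'full_name'])
console.log([...out].sort()) // ['Email', 'full_name'] — union holds, the bad rule no-ops
```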

## The render path

The masking logic lives in Zustand; the render logic lives in one small cell component. The meaningful lines from `data-table.tsx`:

```tsx
function MaskedCell({ isMasked, hoverToPeek, children }: MaskedCellProps) {
  const [peeking, setPeeking] = useState(false)

  const onMouseEnter = (e: React.MouseEvent) => {
    if (hoverToPeek && e.altKey) setPeeking(true)
  }

  return (
    <span
      onMouseEnter={onMouseEnter}
      onMouseLeave={() => setPeeking(false)}
      style={peeking ? undefined : { filter: 'blur(5px)', userSelect: 'none' }}
    >
      {children}
    </span>
  )
}
```

That is the whole thing. No canvas trickery, no data-URL sleight of hand, no modified result set — the *raw value* is still in the DOM. If someone is smart enough to open devtools on your SQL client during a demo, they can dig it out. The threat model here is "accidentally revealing data to a camera or screen-share," not "malicious insider with inspector access." I think that is the right level to aim at. Perfect is the enemy of *I will actually use it every day*.

The `userSelect: 'none'` matters more than the blur: it means you cannot double-click and copy a masked value into the clipboard by reflex. One of the quiet ways PII leaks is not from someone reading your screen, it is from you pasting a blurred value into Slack thinking "surely the blur meant that wasn't the real thing."

## What I'd do differently

**I wish I had done it the other way around.** The blur is a presentation trick. A truly paranoid version would mask at the IPC boundary — have the main process redact values before they ever hit the renderer, based on the same rules. That way a devtools inspector genuinely cannot see the original. The tradeoff is that hover-to-peek becomes a round-trip through IPC, which adds latency to the interaction. I chose the fast UX and a weaker threat model. I still think it is the right call, but I want the IPC-redaction option as a toggle for security-conscious users.

**The rules should be repo-shareable.** Right now each user's rules live in their own Zustand-persisted local storage. But a team would reasonably want a shared rule set — "our customers table has a `pci_encrypted_pan` column, mask that everywhere" — and right now there is no way to distribute that short of everyone copy-pasting it manually. A `.data-peek-masks.json` at the repo root would solve it. Queued up.

**The phone pattern should probably be on by default.** It is off because `mobile_application_id` and friends match the pattern and create noise. But "noise from too much masking" is a strictly better failure mode than "leaking a phone number in a Loom." I will flip the default.

## The honest pitch

If you have ever had an "I was sharing my screen" moment, this feature is for you. If you record Loom demos, pair on production bugs, or stream your coding, this feature is *definitely* for you.

data-peek is at [data-peek.app](https://data-peek.app). The masking code is open source — `src/renderer/src/stores/masking-store.ts` if you want to read it or port the idea to your own tool. Copy it, improve it, tell me what you did differently. The goal is fewer adrenaline spikes in conference rooms.

229 notes/connection-health-monitor-in-a-sql-client.mdx Normal file

@@ -0,0 +1,229 @@

---
title: "I Put pg_stat_activity in My SQL Client — And Added a Kill Button"
description: "How data-peek's Connection Health Monitor turns Postgres system views into a two-second refresh dashboard with one-click query cancellation, live lock detection, and cache-hit ratios."
date: "2026-04-11"
author: "Rohith Gilla"
tags: ["postgres", "database", "devops", "opensource"]
published: true
---

It was 11:47 PM when the Datadog alert came in. API latency on `/checkout` had tripled. I SSH'd into the bastion, opened psql, and started typing the query I have typed a thousand times:

```sql
SELECT pid, state, EXTRACT(EPOCH FROM (now() - query_start)) AS seconds, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_start;
```

Then I ran it. Then I ran it again six seconds later. Then again. Then I opened a second tab and started typing `SELECT * FROM pg_locks` because I was pretty sure one of the long-running queries was blocking something. Three terminals, two SSH sessions, one increasingly tired human.

Every Postgres operator has been here. The data you need lives in system views — `pg_stat_activity`, `pg_locks`, `pg_statio_user_tables`, `pg_stat_user_tables`. Reaching for them through a shell at midnight is a productivity tax I got tired of paying.

So I built the Connection Health Monitor into data-peek.

## What it does

The Health Monitor is a dedicated tab — not a modal, not a sidebar — that refreshes every 2, 5, 10, or 30 seconds (configurable), and shows four panels:

1. **Active Queries** — everything in `pg_stat_activity` that isn't idle, with duration, wait events, and a kill button next to each row.
2. **Table Sizes** — the top 50 tables by total size, including heap and index bytes, with a row-count estimate from `pg_stat_user_tables.n_live_tup`.
3. **Cache Hit Ratios** — buffer and index cache hit percentages, plus a per-table breakdown of seq-scan vs index-scan counts.
4. **Locks & Blocking** — the classic blocked/blocking join from `pg_locks`, with both the blocked query and the query holding the lock visible side by side.

The kill button calls `pg_cancel_backend(pid)`. No confirmation dialog. If you hit it by accident, the worst that happens is a query fails and you run it again. That is the right tradeoff at midnight.

Every panel has a "Share" button that generates a clean screenshot suitable for pasting into an incident Slack channel. That last bit came from a real incident where I wanted to show the on-call DBA what I was seeing and ended up cropping and masking a terminal screenshot for ten minutes. Now it is a single click.

## How it's wired

The Health Monitor is an honest, boring pipeline, and I think that is the point.

```
┌────────────────────────────┐
│ HealthMonitor.tsx (React)  │ ← refresh interval, kill buttons, share UI
└──────────────┬─────────────┘
               │ window.api.db.*
┌──────────────┴─────────────┐
│ health-handlers.ts (IPC)   │ ← ipcMain.handle('db:active-queries', …)
└──────────────┬─────────────┘
               │ getAdapter(config)
┌──────────────┴─────────────┐
│ postgres-adapter.ts        │ ← the actual SQL against pg_stat_*
└────────────────────────────┘
```

The IPC handlers (`src/main/ipc/health-handlers.ts`) are wafer-thin — each one is about ten lines, dispatches to the adapter, and wraps the result in the project's standard `IpcResponse<T>` shape:

```ts
ipcMain.handle('db:active-queries', async (_, config: ConnectionConfig) => {
  try {
    const adapter = getAdapter(config)
    const queries = await adapter.getActiveQueries(config)
    return { success: true, data: queries } as IpcResponse<typeof queries>
  } catch (error) {
    log.error('Failed to get active queries:', error)
    return { success: false, error: String(error) } as IpcResponse<never>
  }
})
```

That pattern ensures every health panel can fail independently. If `pg_locks` is slow because something dramatic is happening, the Active Queries panel still refreshes on its own schedule.

### The SQL, unvarnished

The thing dev.to tutorials usually hide — the actual SQL — is the part I find most useful. Here is the query powering the Active Queries panel, straight from `src/main/adapters/postgres-adapter.ts`:

```sql
SELECT
  pid,
  usename AS user,
  datname AS database,
  state,
  COALESCE(
    EXTRACT(EPOCH FROM (now() - query_start))::text || 's',
    '0s'
  ) AS duration,
  COALESCE(EXTRACT(EPOCH FROM (now() - query_start)) * 1000, 0)::bigint AS duration_ms,
  query,
  wait_event_type || ':' || wait_event AS wait_event,
  application_name
FROM pg_stat_activity
WHERE state != 'idle'
  AND pid != pg_backend_pid()
  AND query NOT LIKE '%pg_stat_activity%'
ORDER BY query_start ASC NULLS LAST
```

Two details worth calling out:

- `pid != pg_backend_pid()` filters out the monitoring query itself. Without this, you spend ten minutes wondering why there is always a query running.
- `query NOT LIKE '%pg_stat_activity%'` is a belt-and-braces filter for the case where another client is ALSO polling this view. When I skipped it, the dashboard kept showing "the monitoring client is monitoring itself."

The cache-hit ratio query is the one people usually get wrong when they copy-paste it:

```sql
SELECT
  CASE WHEN SUM(heap_blks_hit) + SUM(heap_blks_read) = 0 THEN 0
       ELSE ROUND(SUM(heap_blks_hit)::numeric / (SUM(heap_blks_hit) + SUM(heap_blks_read)) * 100, 2)
  END AS buffer_cache_hit_ratio,
  CASE WHEN SUM(idx_blks_hit) + SUM(idx_blks_read) = 0 THEN 0
       ELSE ROUND(SUM(idx_blks_hit)::numeric / (SUM(idx_blks_hit) + SUM(idx_blks_read)) * 100, 2)
  END AS index_hit_ratio
FROM pg_statio_user_tables
```

The `CASE` guards against division by zero on freshly restarted databases. Without them, the panel shows `NaN%` for the first few minutes after a restart, which is exactly when you are most likely to be looking at it.

### The lock detection join

The Locks panel is the only query I am genuinely proud of. The canonical "blocked by whom" join in Postgres is notoriously ugly because `pg_locks` does not give you a single "blocker pid" column — you have to self-join on every distinguishing column:

```sql
FROM pg_locks blocked
JOIN pg_stat_activity blocked_activity ON blocked.pid = blocked_activity.pid
JOIN pg_locks blocking ON (
  blocked.locktype = blocking.locktype
  AND blocked.database IS NOT DISTINCT FROM blocking.database
  AND blocked.relation IS NOT DISTINCT FROM blocking.relation
  AND blocked.page IS NOT DISTINCT FROM blocking.page
  AND blocked.tuple IS NOT DISTINCT FROM blocking.tuple
  AND blocked.virtualxid IS NOT DISTINCT FROM blocking.virtualxid
  AND blocked.transactionid IS NOT DISTINCT FROM blocking.transactionid
  AND blocked.classid IS NOT DISTINCT FROM blocking.classid
  AND blocked.objid IS NOT DISTINCT FROM blocking.objid
  AND blocked.objsubid IS NOT DISTINCT FROM blocking.objsubid
  AND blocked.pid != blocking.pid
)
JOIN pg_stat_activity blocking_activity ON blocking.pid = blocking_activity.pid
WHERE NOT blocked.granted AND blocking.granted
```

`IS NOT DISTINCT FROM` is the operator that treats `NULL = NULL` as true, which matters because most of these columns are NULL for any given lock type. If you use plain `=` here, the join silently returns zero rows and you conclude that nothing is blocking anything. Ask me how I know.
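
In TypeScript terms, `IS NOT DISTINCT FROM` is just equality where two NULLs compare equal instead of unknown (an illustrative analogy, not data-peek code):

```typescript
// SQL's IS NOT DISTINCT FROM, as a TypeScript analogy: plain equality,
// except two nulls count as equal — plain SQL `=` yields NULL (unknown)
// whenever either side is NULL, which is why the join loses rows.
function isNotDistinctFrom(a: number | null, b: number | null): boolean {
  if (a === null && b === null) return true
  if (a === null || b === null) return false
  return a === b
}

console.log(isNotDistinctFrom(null, null)) // true — this is why the join finds rows
console.log(isNotDistinctFrom(null, 7))    // false
console.log(isNotDistinctFrom(7, 7))       // true
```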

## The kill button

```ts
async killQuery(config, pid) {
  const client = new Client(buildClientConfig(config, tunnelOverrides))
  await client.connect()
  const result = await client.query('SELECT pg_cancel_backend($1) AS cancelled', [pid])
  const cancelled = result.rows[0]?.cancelled === true
  return cancelled
    ? { success: true }
    : { success: false, error: 'Failed to cancel query - process may have already completed' }
}
```

`pg_cancel_backend` sends SIGINT to the backend process. It is the *polite* version — `pg_terminate_backend` is the hammer, and I deliberately do not expose it from the UI because killing a backend mid-transaction is a foot-gun that no amount of confirm dialogs can save you from. If `pg_cancel_backend` fails to stop the query, the assumption is that you want to keep looking before escalating.
## What I'd do differently
|
||||
|
||||
Two things I regret.
|
||||
|
||||
First, I hard-coded `LIMIT 50` in the Table Sizes query. It was fine for my
|
||||
own databases. Then someone with 12,000 tables opened the panel and their
|
||||
disk groaned for nine seconds before anything appeared. A parameterized limit
|
||||
with a default would have saved that. I will get to it.
|
||||
|
||||
Second, the refresh interval is per-panel but the polling is not staggered.
|
||||
When all four panels refresh at 2s, they all fire at the same tick, and a
|
||||
slow database sees four simultaneous connections every two seconds. Staggering
|
||||
them by 500ms each would be gentler on the server. Classic premature
|
||||
optimization trap — I built the simple version, shipped it, and only noticed
|
||||
when a coworker complained.
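
The staggered version is small. A sketch under my own assumptions (hypothetical names, not the shipped panel code), offsetting each panel's first tick by 500ms:

```typescript
// One offset per panel so that panels sharing a refresh interval do not
// all hit the database on the same tick: 0ms, 500ms, 1000ms, 1500ms.
function staggerOffsets(panelCount: number, stepMs = 500): number[] {
  return Array.from({ length: panelCount }, (_, i) => i * stepMs)
}

// Wiring: each panel's setInterval only starts after its offset elapses.
function startPolling(panels: Array<() => void>, intervalMs: number): void {
  staggerOffsets(panels.length).forEach((offset, i) => {
    setTimeout(() => setInterval(panels[i], intervalMs), offset)
  })
}
```

The load on the server is the same per minute; it just stops arriving in bursts of four.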

## Try it

If you want to see the Health Monitor without downloading anything, the
screenshots are in the [data-peek README](https://github.com/Rohithgilla12/data-peek).
The code for this feature lives in `src/renderer/src/components/health-monitor.tsx`
and `src/main/adapters/postgres-adapter.ts`.

The app itself is at [data-peek.app](https://data-peek.app) — MIT source,
free for personal use, one-time license for commercial. I would rather you
read the SQL than click the download button, honestly. If it is the kind of
thing you'd want in your toolbox, the download is one click from there.
notes/fk-aware-fake-data-generator.mdx — new file, 258 lines
---
title: "Generating Realistic Seed Data That Respects Foreign Keys, in 20 Seconds"
description: "How data-peek's built-in data generator uses column-name heuristics, Faker.js, and a live FK sampler to produce 10,000 rows of believable test data across related tables — without writing a seed script."
date: "2026-04-11"
author: "Rohith Gilla"
tags: ["database", "postgres", "testing", "webdev"]
published: true
---

Someone asks for a demo. You need 10,000 users, 30,000 orders, a handful of
products, and enough variety that the UI does not look fake. You have
twenty minutes.

If you have been here before, you know the options:

1. **Write a seed script.** Open your editor, import Faker, write the
   loops, get the foreign keys wrong twice, rerun, get them right, run
   into a `FOREIGN KEY constraint violation` on line 847, swear.
2. **Use a CLI tool.** Install something, read its YAML schema format,
   configure it, discover that it does not handle your vendor-specific
   column type, give up.
3. **Copy a SQL file from Stack Overflow.** Hope it does not have
   `DROP DATABASE` in it somewhere.

I went through option 1 enough times that I built option 4 into data-peek:
a Data Generator tab that reads your table's schema, guesses how each
column should be filled, samples existing foreign key values from the real
database, and batch-inserts. No configuration required for the common case.

## What it does from the outside

Open a table. Click "Generate Data." A new tab opens with a row per
column. Each column is pre-filled with a sensible generator based on its
name and type:

- `email` → `faker.internet.email`
- `first_name` → `faker.person.firstName`
- `created_at` → `faker.date.recent`
- `uuid`, `guid` → `faker.string.uuid`
- `user_id` with a foreign key → `fk-reference` to `users.id`
- A `status` enum column → `random-enum` with the discovered values
- Anything unrecognized → `lorem.word` (clearly useless, easy to spot and
  replace)

You can override any of these, add a null percentage (for "15% of rows
should have NULL in this column"), set a seed for reproducibility, and
preview the first five rows before committing. Then you set the row count
and hit Generate.

## The heuristic table

The whole "it just works" impression comes from one lookup table in
`src/main/data-generator.ts`:

```ts
const HEURISTICS: Heuristic[] = [
  { pattern: /^email$/i, generator: { generatorType: 'faker', fakerMethod: 'internet.email' } },
  { pattern: /^(first_?name|fname)$/i, generator: { generatorType: 'faker', fakerMethod: 'person.firstName' } },
  { pattern: /^(last_?name|lname|surname)$/i, generator: { generatorType: 'faker', fakerMethod: 'person.lastName' } },
  { pattern: /^(name|full_?name)$/i, generator: { generatorType: 'faker', fakerMethod: 'person.fullName' } },
  { pattern: /^(phone|mobile|cell)$/i, generator: { generatorType: 'faker', fakerMethod: 'phone.number' } },
  { pattern: /^(city)$/i, generator: { generatorType: 'faker', fakerMethod: 'location.city' } },
  { pattern: /^(country)$/i, generator: { generatorType: 'faker', fakerMethod: 'location.country' } },
  { pattern: /^(url|website)$/i, generator: { generatorType: 'faker', fakerMethod: 'internet.url' } },
  { pattern: /^(bio|description|about)$/i, generator: { generatorType: 'faker', fakerMethod: 'lorem.paragraph' } },
  { pattern: /^(title|subject)$/i, generator: { generatorType: 'faker', fakerMethod: 'lorem.sentence' } },
  { pattern: /^(company|organization)$/i, generator: { generatorType: 'faker', fakerMethod: 'company.name' } },
  { pattern: /^(created|updated|deleted)_?(at|on|date)?$/i,
    generator: { generatorType: 'faker', fakerMethod: 'date.recent' } },
  { pattern: /^(uuid|guid)$/i, generator: { generatorType: 'uuid' } }
]
```

This is boring and I am proud of it. Every single entry was added the
first time I opened a new table and saw a generator make a wrong guess.
"Oh, it filled `bio` with lorem.word, that should be lorem.paragraph" —
and then I added the rule. The heuristic is 40 lines and handles the
column names I have seen on every CRUD schema I have built in the last
decade.

Anything not in the table falls through a data-type-based fallback
(integers get `random-int`, booleans get `random-boolean`, dates get
`random-date`), and everything else defaults to `faker.lorem.word` — a
deliberate "this is clearly wrong, go fix it" placeholder.
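
The lookup itself is a first-match-wins scan over that array. A reduced sketch with simplified types (illustrative, not the exact code):

```typescript
interface Heuristic {
  pattern: RegExp
  fakerMethod: string
}

// A slice of the real table, enough to show the mechanism.
const RULES: Heuristic[] = [
  { pattern: /^email$/i, fakerMethod: 'internet.email' },
  { pattern: /^(first_?name|fname)$/i, fakerMethod: 'person.firstName' },
  { pattern: /^(bio|description|about)$/i, fakerMethod: 'lorem.paragraph' }
]

// First matching rule wins; anything unmatched falls through to the
// deliberately-wrong lorem.word placeholder.
function guessGenerator(columnName: string): string {
  const hit = RULES.find((r) => r.pattern.test(columnName))
  return hit ? hit.fakerMethod : 'lorem.word'
}
```

Order matters only when patterns overlap, which is why the more specific patterns sit higher in the real table.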

## The FK sampler

This is the part that turns it from a toy into something you would
actually use.

When you mark a column as `fk-reference`, you point it at the parent table
and column. Before any rows are generated, the main process samples up to
1000 real values from that referenced column:

```ts
export async function resolveFK(
  adapter, connectionConfig, schema, fkTable, fkColumn
): Promise<unknown[]> {
  const dbType = connectionConfig.dbType
  const quotedTable = quoteId(fkTable, dbType)
  const tableRef =
    schema && schema !== 'public' && schema !== 'main' && schema !== 'dbo'
      ? `${quoteId(schema, dbType)}.${quotedTable}`
      : quotedTable
  const sql =
    dbType === 'mssql'
      ? `SELECT TOP 1000 ${quoteId(fkColumn, dbType)} FROM ${tableRef}`
      : `SELECT ${quoteId(fkColumn, dbType)} FROM ${tableRef} LIMIT 1000`

  try {
    const result = await adapter.query(connectionConfig, sql)
    return result.rows.map((row) => {
      const r = row as Record<string, unknown>
      return r[fkColumn]
    })
  } catch {
    return []
  }
}
```

Then row generation just picks randomly from that sampled pool:

```ts
case 'fk-reference': {
  const fkKey = `${col.fkTable}.${col.fkColumn}`
  const ids = fkData.get(fkKey) ?? []
  if (ids.length === 0) return null
  return ids[Math.floor(Math.random() * ids.length)]
}
```

Three design calls worth defending.

**It samples 1000, not all.** On a 5-million-row `users` table, reading
every ID to pick from takes minutes. Sampling a thousand gives you enough
variety that your 10,000 generated `orders` rows will reference a
reasonable spread of users without being a perfect distribution. Perfect
distributions are for statisticians; believable demos are for everyone
else.

**It returns an empty array on error, silently.** If the parent table
does not exist, or the column has been renamed, or you do not have
SELECT on it, we fall back to NULL in the generated column. I go back and
forth on whether this should be a hard error instead. In practice it is
the right default for demos — you can still generate the rest of the
columns and fix the FK column after — but I plan to add a visible warning
indicator for it.

**The generator is one table at a time, not the whole database.** A "seed
the whole DB in dependency order" mode would require a topological sort
of the foreign-key graph, and the right UX for it is not obvious. Right
now the workflow is: generate the parent tables first (users, products),
then the child tables (orders, line_items) with FK-references pointing
back. It is an extra step but it keeps the mental model tiny.
## Guarding against prototype pollution

Here is a thing I did not expect to care about when I started. The
`fakerMethod` string looks like `internet.email` and I call it
dynamically:

```ts
function callFakerMethod(method: string): unknown {
  const parts = method.split('.')
  if (parts.length !== 2) return faker.lorem.word()

  const [ns, fn] = parts
  if (ns === '__proto__' || ns === 'constructor' || ns === 'prototype') return faker.lorem.word()
  if (fn === '__proto__' || fn === 'constructor' || fn === 'prototype') return faker.lorem.word()

  const fakerAny = faker as unknown as Record<string, unknown>
  const namespace = fakerAny[ns]
  if (!namespace || typeof namespace !== 'object') return faker.lorem.word()

  const func = (namespace as Record<string, unknown>)[fn]
  if (typeof func !== 'function') return faker.lorem.word()

  const result = (func as () => unknown).call(namespace)
  // ...
}
```

The `__proto__` / `constructor` / `prototype` checks are there because the
`fakerMethod` value comes from the renderer, which means it ultimately
comes from user input in the generator UI. Without the guards, someone
could enter `__proto__.valueOf` as their method name and get, at best, a
crash and, at worst, prototype pollution across the whole main process.
Is it exploitable in a single-user desktop app? Probably not. Did I add
it anyway? Yes — because the code looked dangerous in review and "probably
not exploitable" is not a principle I want the codebase to live by.
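
The guard is easy to unit-test without Faker at all. A stripped-down sketch of the same path validation, returning null where the real function falls back to `lorem.word`:

```typescript
const BLOCKED = new Set(['__proto__', 'constructor', 'prototype'])

// Accept only plain two-part paths like "internet.email" with safe segments;
// the caller treats null as "use the lorem.word fallback instead".
function parseFakerPath(method: string): [string, string] | null {
  const parts = method.split('.')
  if (parts.length !== 2) return null
  const [ns, fn] = parts
  if (BLOCKED.has(ns) || BLOCKED.has(fn)) return null
  return [ns, fn]
}
```

Validate-then-dispatch like this keeps the dangerous dynamic property access behind a check that is trivial to exercise in tests.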

## Batching and cancellation

Ten thousand rows is nothing. A hundred thousand starts to hurt. The
batch inserter (`src/main/batch-insert.ts`) chunks the rows into
batches the user configures, sends progress back over IPC after each
batch, and honors a cancel flag:

```ts
ipcMain.handle('db:generate-cancel', async () => {
  cancelDataGen = true
  requestCancelBatchInsert()
  return { success: true }
})
```

The progress callback (`sendProgress`) updates a progress bar in the
renderer between batches. "Cancel" sets the flag, the current batch
finishes, and then the loop bails out before starting the next one.
Nothing magical, but it means you can start a 500,000-row generation,
realize you picked the wrong column mapping, and stop without waiting.
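
The loop shape is the standard check-the-flag-between-batches pattern. A minimal self-contained sketch (names are illustrative, not the real batch-insert module):

```typescript
let cancelRequested = false

// Insert rows in fixed-size batches; check the cancel flag between batches
// so the current batch always completes, then report how far we got.
async function insertInBatches<T>(
  rows: T[],
  batchSize: number,
  insertBatch: (batch: T[]) => Promise<void>,
  onProgress: (done: number, total: number) => void
): Promise<number> {
  let done = 0
  for (let i = 0; i < rows.length; i += batchSize) {
    if (cancelRequested) break // skip remaining batches, keep what landed
    await insertBatch(rows.slice(i, i + batchSize))
    done = Math.min(i + batchSize, rows.length)
    onProgress(done, rows.length)
  }
  return done
}
```

The flag never interrupts an in-flight batch, so you never leave a half-written batch behind; you only give up on the ones that have not started.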

## Preview mode

Before committing, the same pipeline runs with `rowCount: 5` and returns
the preview rows instead of inserting:

```ts
const previewConfig = { ...genConfig, rowCount: 5 }
const rows = generateRows(previewConfig, fkData)
return { success: true, data: { rows } }
```

This alone has saved me from maybe twenty bad seed runs. "Oh, the email
column is getting lorem.word because I forgot to override it" — caught in
the preview, fixed, re-previewed, then committed.
## What I'd do differently

**A topological-sort mode for seeding a whole schema.** The current
table-at-a-time model is fine for small datasets; for end-to-end test
fixtures it is annoying. A mode that takes a schema, orders the tables
by FK dependency, and seeds them all with sensible defaults is the
obvious next step.

**Better heuristics for numeric foreign keys.** If a column is named
`owner_id` and there is no declared FK but there is a `users.id` column
in the same schema, we could offer a suggestion. Right now we only use
declared foreign keys, so schemas without formal FK constraints (hello,
legacy MySQL) miss out.

**Locales.** Faker supports locales; data-peek just uses the default.
Generating data for a Japanese demo app and getting all-American
addresses is a dead giveaway. Adding a locale picker is a small change I
keep forgetting to do.

## Try it

Open a table in data-peek, click Generate Data, hit Preview, then
Generate. The whole thing is at [data-peek.app](https://data-peek.app).
The generator code is in `src/main/data-generator.ts` and
`src/main/batch-insert.ts`, and the UI is
`src/renderer/src/components/data-generator.tsx`. MIT source, free for
personal use.

The pitch: the next time someone asks you for a demo dataset in twenty
minutes, you do not have to open a fresh `seed.ts` file.
notes/listen-notify-without-tears.mdx — new file, 228 lines
---
title: "Debugging Postgres LISTEN/NOTIFY Is Finally Pleasant"
description: "A dedicated pub/sub panel for Postgres LISTEN/NOTIFY: SQLite-backed event history, exponential backoff reconnects, multi-channel subscriptions, and a send button that replaces the throwaway Node script you keep rewriting."
date: "2026-04-11"
author: "Rohith Gilla"
tags: ["postgres", "realtime", "node", "tutorial"]
published: true
---
I have written the same 40-line Node script to debug a Postgres `LISTEN`
channel at least six times. You know the one:

```js
const { Client } = require('pg')

// The whole script, minus error handling: connect, LISTEN, log forever.
async function main() {
  const c = new Client({ connectionString: process.env.DB })
  await c.connect()
  await c.query('LISTEN order_events')
  c.on('notification', (m) => console.log(new Date(), m.channel, m.payload))
  console.log('listening...')
}
main()
```

(The `async` wrapper is the fix you make on attempt two, after top-level
`await` in a CommonJS file throws a `SyntaxError`.)

Save it as `listen.js`, `node listen.js`, stare at the terminal, trigger the
thing you are debugging, and pray the connection does not drop before the
event arrives — because if it does, you scroll back up, Ctrl+C, re-run, and
now the event has already happened and nobody is listening.

I rewrote this script on at least four different laptops. On two of them, I
rewrote it twice because I forgot where I put it. At some point you have to
stop.

The Postgres LISTEN/NOTIFY panel in data-peek is the version I wish I had had
the first time.
## What it does

- Subscribe to one or many channels on a connection with a single input.
- See every event that arrives, live, with timestamp, channel, and payload.
- Keep the last 10,000 events *per connection* in a local SQLite database,
  so if you walk away and come back an hour later, the history is still
  there.
- Survive a dropped connection. If the Postgres server restarts, or your
  laptop's WiFi blips, the listener reconnects with exponential backoff and
  re-subscribes to everything you had open.
- `NOTIFY` back with a "Send" button, so you can smoke-test your own channel
  without leaving the app.

The whole thing is one `pg.Client` per connection held open in the Electron
main process. The renderer just sends subscribe/unsubscribe IPC messages and
receives events through a webContents channel.

## The reconnect loop

If you have written a long-lived pg listener before, you know the three
things that go wrong: the client drops, the tunnel dies, or your backoff is
too aggressive and you hammer a recovering server. Here is the relevant slice
from `src/main/pg-notification-listener.ts`:
|
||||
const MAX_BACKOFF_MS = 30_000
|
||||
const MAX_EVENTS_PER_CONNECTION = 10000
|
||||
|
||||
function scheduleReconnect(connectionId, config, channels, backoffMs) {
|
||||
const entry = listeners.get(connectionId)
|
||||
if (entry?.destroyed) return
|
||||
|
||||
const nextBackoff = Math.min(backoffMs * 2, MAX_BACKOFF_MS)
|
||||
|
||||
const timer = setTimeout(() => {
|
||||
const current = listeners.get(connectionId)
|
||||
if (current?.destroyed) return
|
||||
connectListener(connectionId, config, channels, nextBackoff)
|
||||
}, backoffMs)
|
||||
|
||||
if (entry) entry.reconnectTimer = timer
|
||||
}
|
||||
```
|
||||
|
||||

The base delay is 1000ms, we double on every failure, and we cap at 30
seconds. That cap matters more than you think — without it, after enough
failures you end up with a listener that tries once an hour and you swear
the panel "doesn't work anymore" when actually it just happens to be in
a two-hour retry gap.
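
The schedule is easy to reason about in isolation. A sketch of the delays a 1s base with doubling and a 30s cap produces (the constants match the snippet above; the function is illustrative):

```typescript
const BASE_MS = 1_000
const MAX_BACKOFF_MS = 30_000

// Delay before the nth reconnect attempt (0-indexed):
// 1s, 2s, 4s, 8s, 16s, then pinned at 30s forever.
function backoffDelay(attempt: number): number {
  return Math.min(BASE_MS * 2 ** attempt, MAX_BACKOFF_MS)
}
```

After five failures every retry is 30 seconds apart, which is the worst-case wait a user ever sees between reconnect attempts.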

The `destroyed` flag is the quiet hero. Every `ListenerEntry` has one:

```ts
interface ListenerEntry {
  client: Client
  tunnelSession: TunnelSession | null
  channels: Set<string>
  connectedSince: number
  reconnectTimer?: ReturnType<typeof setTimeout>
  destroyed: boolean
}
```

When the user closes the panel or switches connections, I set `destroyed =
true` before calling `client.end()`. That matters because `client.end()`
triggers the `'end'` event, which would otherwise kick off a reconnect
attempt three seconds later — a stubborn zombie listener that refuses to die.
The check `if (entry.destroyed) return` at the top of the reconnect branch
is what makes the cleanup actually clean up.
Both the `'error'` and `'end'` handlers route through the same reconnect
path:

```ts
client.on('error', (err) => {
  if (entry.destroyed) return
  log.error(`pg notification client error for ${connectionId}:`, err)
  scheduleReconnect(connectionId, config, entry.channels, backoffMs)
})

client.on('end', () => {
  if (entry.destroyed) return
  log.warn(`pg notification client disconnected for ${connectionId}, reconnecting...`)
  scheduleReconnect(connectionId, config, entry.channels, backoffMs)
})
```
Note that I re-subscribe to `entry.channels`, not to the original `channels`
parameter. If the user added a channel since the initial connect, this makes
sure the reconnect picks up the new set. I did not have this right the first
time; I lost a channel on every reconnect until I noticed.
## Event history, in SQLite

Events are not just forwarded to the renderer — they also go into a local
SQLite database in the user's Electron data dir:

```ts
sqliteDb.exec(`
  CREATE TABLE IF NOT EXISTS pg_notification_events (
    id TEXT PRIMARY KEY,
    connection_id TEXT NOT NULL,
    channel TEXT NOT NULL,
    payload TEXT NOT NULL,
    received_at INTEGER NOT NULL
  )
`)

sqliteDb.exec(`
  CREATE INDEX IF NOT EXISTS idx_pne_connection_received
  ON pg_notification_events (connection_id, received_at DESC)
`)
```
Three reasons this matters:

1. **Close the panel, come back later.** Your events are still there. This
   alone is worth the whole feature — I used to screenshot my terminal
   before closing the throwaway script.
2. **The ring-buffer cap is enforced in SQL, not in JS.** On every insert I
   count the rows for that connection and DELETE the oldest if we exceed
   10,000. That keeps the JS side completely stateless.
3. **It survives app restarts.** A SQLite file in `app.getPath('userData')`
   is the simplest possible durable store.

The trim query is worth sharing because it is the one where I almost used a
`LIMIT` on a DELETE (which SQLite supports but is a footgun):
```ts
if (count > MAX_EVENTS_PER_CONNECTION) {
  const excess = count - MAX_EVENTS_PER_CONNECTION
  db.prepare(`
    DELETE FROM pg_notification_events
    WHERE id IN (
      SELECT id FROM pg_notification_events
      WHERE connection_id = ?
      ORDER BY received_at ASC
      LIMIT ?
    )
  `).run(event.connectionId, excess)
}
```

The subquery guarantees we delete the oldest `excess` rows for *this
connection*, not the oldest rows overall. On a laptop with two databases
open, the global ordering would let one chatty channel evict events from
the quiet one.
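
The same invariant, restated in memory for readers who want to see it without SQLite (a sketch, not the shipped code): trim only the oldest events belonging to one connection, leave everyone else alone.

```typescript
interface Ev {
  connectionId: string
  receivedAt: number
}

// Drop the oldest events for `connectionId` until it is back under `max`;
// events for other connections are never candidates for eviction.
function trimPerConnection(events: Ev[], connectionId: string, max: number): Ev[] {
  const mine = events
    .filter((e) => e.connectionId === connectionId)
    .sort((a, b) => a.receivedAt - b.receivedAt)
  const excess = mine.length - max
  if (excess <= 0) return events
  const doomed = new Set(mine.slice(0, excess)) // oldest `excess` for this connection
  return events.filter((e) => !doomed.has(e))
}
```

This is exactly what the `WHERE connection_id = ?` inside the subquery buys: the eviction candidates are scoped before ordering, not after.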

## Identifier quoting for channel names

This one is easy to miss. `LISTEN` takes an identifier, not a string literal,
so you cannot parameterize it with a `$1` placeholder. You have to interpolate
— carefully:

```ts
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`
}

await client.query(`LISTEN ${quoteIdent(channel)}`)
```

Postgres identifier quoting uses `""` to escape a literal double-quote inside
a quoted identifier. If your channel name is `"foo"bar"`, that becomes
`"""foo""bar"""` after quoting, which Postgres parses back to `"foo"bar"`.
It is ugly but it is the only correct way. A channel named `order; DROP TABLE
users;` would otherwise be a live SQL injection.
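
Worth a unit test, since off-by-one quoting bugs here are injection bugs. The same one-liner again, standalone, with the worked example from above as an assertion:

```typescript
// Postgres identifier quoting: wrap in double quotes and double any
// embedded double-quote, so the whole name is always one identifier.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`
}
```

With the quotes in place, the hostile channel name becomes a single (ridiculous but harmless) identifier instead of a statement separator.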

## What I'd do differently

**The per-connection ring buffer should be per-channel.** Right now one
chatty channel on a shared connection can push a quiet channel's history out
the back of the buffer. Partitioning the cap by `(connection_id, channel)`
would fix it. I did not because it would complicate the trim query and
nobody has hit the limit in practice yet.

**I should persist subscriptions across app restarts.** Today, if you close
data-peek and re-open it, your subscribed channels are gone. The SQLite DB
has the *events* but not the *subscriptions*. A small `pg_subscriptions`
table keyed by connection_id would let the app restore on startup. It is
on the list.

**Reconnect should jitter.** 1s, 2s, 4s, 8s is a textbook exponential
backoff, but if two clients lose their connection at the same instant they
reconnect in lockstep. Adding a random 0–500ms jitter would be three lines
and would be the right thing to do.
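
For the record, those three lines would look something like this (a sketch of the fix, not code that is in the repo yet):

```typescript
const MAX_BACKOFF_MS = 30_000
const JITTER_MS = 500

// Doubling with a cap, as before, plus 0-500ms of random jitter so clients
// that disconnected at the same instant do not reconnect in lockstep.
function nextBackoff(backoffMs: number): number {
  return Math.min(backoffMs * 2, MAX_BACKOFF_MS) + Math.floor(Math.random() * JITTER_MS)
}
```

The jitter is added after the cap on purpose: the cap bounds the deterministic part of the schedule, and the jitter only spreads clients within a half-second window.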

## If you want to try it

It lives in `src/main/pg-notification-listener.ts` if you want to read the
full 300 lines. The panel UI is `src/renderer/src/components/pg-notifications-panel.tsx`.

data-peek is at [data-peek.app](https://data-peek.app). Free for personal
use, MIT source. If you have ever rewritten the listen.js script, you will
know exactly why this exists.