DriftGate · Visual regression as a CI gate

Catch visual drift
before it ships.

Off-brand colours, broken spacing, drifted typography — visual bugs slip past code review and reach production, where they erode trust and turn into costly hotfixes. DriftGate renders every front-end change in a real browser, scores it against your design system, and runs a bounded fix loop until it conforms. Pixel diff and design-token checks are the hard gate; the Claude-vision critic is advisory, so a non-deterministic model can never wrongly fail a build.

~15–30 min → < 60 s
manual visual QA per PR, replaced by an automated gate
Pre-merge
drift is caught in CI — not in production, where fixes cost the most
~$0.01 / run
Haiku + prompt caching keep each check effectively free

The hybrid gate

Two deterministic checks decide pass/fail. The model explains why — it never holds the gate alone.

Hard gate

Pixel diff

Screenshot vs. a committed baseline (Pillow/numpy). Fast, free, reproducible. A size change is itself a regression.

Hard gate

Token assertions

Computed styles checked against design tokens — colour, 8 px spacing, type scale, radius. Exact and explainable.

Advisory

Claude-vision critic

Scores the screenshot against DESIGN_SYSTEM.md (prompt-cached) and proposes CSS fixes. Informs; never blocks alone.

Architecture

One engine, two drivers, two surfaces.

Front-end changeURL or .html
CapturePlaywright — MCP driver (local) · library driver (CI & demo)
Deterministic hard gatepixel diff · token assertions → blocks the build
Claude-vision criticcached design system · forced tool-use → ConformanceReport (advisory)
Bounded fix loopscore → propose CSS patch → apply → re-render → re-score · guardrails: max-iters · threshold · no-improvement
CI gateposts report on the PR · fails on hard-gate
This demoreplays the loop in your browser

Watch it fix a real page

Loading…

samples/saas-landing.html
Baseline

Pre-recorded playback — each step replays the loop’s real CSS patch into the live frame. Zero backend, zero cost. Regenerate from real runs with scripts/record_demo.py.

Run it live with Claude gated

Runs the real loop — Playwright renders, Claude vision scores, the loop proposes CSS — using the API key in your backend .env. Access-code gated so only people you share the code with can spend your tokens. Start the backend with uvicorn visual_qe_loop.api.app:app.