Seed data that actually fits your schema.
Point at a database or a schema file. Get realistic, constraint-aware rows. Offline, deterministic, no API key.
Run it however you work.
The CLI is still the default. v0.1.8 adds a Terminal UI for live exploration and a Web Studio for review and handoff. All three run the same generation engine: same specs, same determinism, same MIT license.
Scriptable CLI.
The original surface. Pipe-friendly. Lives in CI and in your shell history.
Interactive Terminal UI.
Schema browser, per-table progress, live quality panel. Works over SSH.
Reviewable Web Studio.
FastAPI dashboard with an ERD, a spec grid, SSE progress, and a fidelity and detection report.
Schema in. Seed SQL out.
dbsprout reads your real schema and grows rows that fit it — referential integrity included.
```sql
-- schema.sql (input)
CREATE TABLE authors (
  id      INTEGER PRIMARY KEY,
  name    TEXT NOT NULL,
  country TEXT
);

CREATE TABLE books (
  id           INTEGER PRIMARY KEY,
  author_id    INTEGER NOT NULL REFERENCES authors(id),
  title        TEXT NOT NULL,
  price        NUMERIC(6,2),
  published_on DATE
);
```
```sql
-- seeds/002_books.sql (generated)
INSERT INTO books (id, author_id, title, price, published_on) VALUES
  (1, 3, 'The Salt Graves', 14.99, '2021-06-02'),
  (2, 1, 'Northwind',        9.50, '2019-11-15'),
  (3, 3, 'A Lantern Year',  22.00, '2023-02-28'),
  (4, 2, 'Ledger of Tides', 18.75, '2024-04-09'),
  (5, 1, 'Quiet Apparatus', 11.25, '2022-09-21');
-- author_id values sampled from real authors PKs
```
Fake seed data lies. This doesn't.
Read the real schema.
No config to start. Point at a live database — SQLite, Postgres, MySQL, SQL Server — or a schema file: SQL DDL, DBML, Mermaid, PlantUML, Prisma.
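For a live SQLite database, the standard library already exposes enough metadata to recover tables, columns, and foreign keys. The sketch below illustrates what "reading the real schema" involves. It uses SQLite's PRAGMA table_info and PRAGMA foreign_key_list. The read_schema helper and its output shape are hypothetical; this is not dbsprout's actual reader.

```python
import sqlite3

# Hypothetical helper: introspect a live SQLite database into a plain dict.
# Not dbsprout's reader; it only shows the metadata a schema reader needs.
def read_schema(conn: sqlite3.Connection) -> dict:
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [
            {"name": name, "type": ctype, "not_null": bool(notnull), "pk": pk > 0}
            for _cid, name, ctype, notnull, _default, pk in conn.execute(
                f'PRAGMA table_info("{table}")'
            )
        ]
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [
            {"column": row[3], "ref_table": row[2], "ref_column": row[4]}
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT);
        CREATE TABLE books (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
    """)
    print(read_schema(conn)["books"]["foreign_keys"])
```

Postgres, MySQL, and SQL Server expose the same information through information_schema. The file formats (DBML, Mermaid, and the rest) need a parser instead, but they produce the same kind of table, column, and FK map.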
Foreign keys that resolve.
Tables are topologically ordered, FK columns sample from real parent keys, and cycles and self-references are resolved automatically. Output ships only after validation passes.
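The ordering-and-sampling idea is sketched below. It assumes a toy FK map (FKS) and two hypothetical helpers, insertion_order and generate. It uses Python's graphlib, not anything from dbsprout. A self-referencing column points at an earlier row of the same table. A genuine multi-table cycle would make static_order() raise CycleError, and a real tool has to break that cycle some other way, for example by inserting one nullable FK as NULL and back-filling it.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical schema: table -> {fk_column: parent_table}.
FKS = {
    "authors": {},
    "books": {"author_id": "authors"},
    "reviews": {"book_id": "books"},
    "employees": {"manager_id": "employees"},  # self-reference
}

def insertion_order(fks: dict) -> list:
    """Parents before children. Self-edges are dropped so they cannot form a cycle."""
    graph = {
        # sorted() keeps the order independent of set/hash iteration order
        table: sorted({parent for parent in deps.values() if parent != table})
        for table, deps in fks.items()
    }
    # Raises graphlib.CycleError for real multi-table cycles (not handled here).
    return list(TopologicalSorter(graph).static_order())

def generate(fks: dict, rows: int, seed: int = 42) -> dict:
    data = {}
    for table in insertion_order(fks):
        # One RNG per table, seeded from a string, so adding a table does not shift others.
        rng = random.Random(f"{seed}:{table}")
        parent_keys = {
            p: [r["id"] for r in data[p]] for p in fks[table].values() if p != table
        }
        out = []
        for pk in range(1, rows + 1):
            row = {"id": pk}
            for col, parent in fks[table].items():
                if parent == table:
                    # Self-reference: an earlier row of this table, or NULL for the first row.
                    row[col] = rng.randint(1, pk - 1) if pk > 1 else None
                else:
                    # FK column samples only from keys that already exist in the parent.
                    row[col] = rng.choice(parent_keys[parent])
            out.append(row)
        data[table] = out
    return data
```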
Same seed, same rows. Always.
The same --seed (default 42) produces identical output across machines. No internet, API key, or account required. Embed it in CI without secrets.
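Determinism comes down to one discipline: every random draw comes from an explicitly seeded generator, never from global state, the clock, or the OS entropy pool. The sketch below shows that discipline. Its generate_authors and fingerprint helpers are hypothetical, and 42 simply mirrors the documented default seed. One caveat applies to Python specifically. The random module only guarantees that random() itself stays stable across Python versions for a given seed. Helpers such as choice() may change, so a tool that promises identical output across machines has to pin its sampling code.

```python
import hashlib
import random

FIRST = ["Ada", "Grace", "Linus", "Barbara", "Ken"]
COUNTRIES = ["NZ", "PT", "KE", "CA", "JP"]

# Hypothetical generator: all randomness flows from one seeded RNG instance.
def generate_authors(rows: int, seed: int = 42) -> list:
    rng = random.Random(seed)
    return [(i, rng.choice(FIRST), rng.choice(COUNTRIES)) for i in range(1, rows + 1)]

def fingerprint(rows: list) -> str:
    """Stable digest of the output; a CI job can compare it against a stored value."""
    return hashlib.sha256(repr(rows).encode()).hexdigest()

if __name__ == "__main__":
    print(fingerprint(generate_authors(100)) == fingerprint(generate_authors(100)))  # True
```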
Four ways to grow.
Start with the heuristic engine: fast, no model required. Bring an LLM when you need column-aware accuracy. Same CLI; just swap the flag.
```text
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Engine      ┃ Speed                  ┃ Quality                ┃ Use when                             ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ heuristic   │ 100K+ rows / sec       │ ~80% semantic          │ Default. Fast fixtures, no model.    │
│ spec        │ cached after first run │ high semantic          │ Column-aware accuracy via LLM.       │
│ statistical │ fast                   │ distribution-faithful  │ You have a real data sample.         │
│ finetuned   │ cached                 │ highest                │ You trained a LoRA adapter.          │
└─────────────┴────────────────────────┴────────────────────────┴──────────────────────────────────────┘
```
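To make the "heuristic" row concrete, here is a minimal sketch of name-and-type matching. A column name pattern picks a generator, and the declared SQL type is the fallback. The rules, the patterns, and the guess_value helper are all hypothetical; this is not dbsprout's rule set. It only shows why this approach needs no model and runs fast, and why it misses columns whose names carry no obvious meaning.

```python
import random
import re
from datetime import date, timedelta

# Hypothetical rules: (column-name pattern, generator). First match wins.
NAME_RULES = [
    (re.compile(r"(^|_)email$"), lambda rng: f"user{rng.randint(1, 9999)}@example.com"),
    (re.compile(r"(^|_)country$"), lambda rng: rng.choice(["NZ", "PT", "KE", "CA"])),
    (re.compile(r"(^|_)(price|amount|total)$"), lambda rng: round(rng.uniform(1, 100), 2)),
    (re.compile(r"(_on|_at|_date)$"),
     lambda rng: (date(2019, 1, 1) + timedelta(days=rng.randint(0, 1825))).isoformat()),
]

# Fallbacks keyed on the base SQL type when no name rule matches.
TYPE_RULES = {
    "INTEGER": lambda rng: rng.randint(1, 1000),
    "TEXT": lambda rng: rng.choice(["alpha", "beta", "gamma"]),
    "DATE": lambda rng: date(2020, 1, 1).isoformat(),
}

def guess_value(column: str, sql_type: str, rng: random.Random):
    for pattern, gen in NAME_RULES:
        if pattern.search(column.lower()):
            return gen(rng)
    base_type = sql_type.upper().split("(")[0]  # NUMERIC(6,2) -> NUMERIC
    return TYPE_RULES.get(base_type, lambda rng: None)(rng)
```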
Recipes.
Seed CI test fixtures
Deterministic, fast, reproducible. Drop dbsprout generate into your pipeline.
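In CI, generated fixtures only help if they load cleanly. The sketch below applies seed files in order to an in-memory SQLite database and fails loudly on any dangling foreign key. The books rows are the sample output shown earlier on this page. The authors rows and the load_fixture helper are hypothetical. Note that SQLite leaves FK enforcement off unless PRAGMA foreign_keys = ON is set on the connection.

```python
import sqlite3

SCHEMA = """
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT);
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title TEXT NOT NULL,
    price NUMERIC(6,2),
    published_on DATE
);
"""

SEEDS = [
    # 001_authors.sql (hypothetical contents)
    "INSERT INTO authors (id, name, country) VALUES "
    "(1, 'R. Okafor', 'NG'), (2, 'M. Lind', 'SE'), (3, 'T. Arai', 'JP');",
    # 002_books.sql (the sample output on this page)
    "INSERT INTO books (id, author_id, title, price, published_on) VALUES "
    "(1, 3, 'The Salt Graves', 14.99, '2021-06-02'), (2, 1, 'Northwind', 9.50, '2019-11-15'), "
    "(3, 3, 'A Lantern Year', 22.00, '2023-02-28'), (4, 2, 'Ledger of Tides', 18.75, '2024-04-09'), "
    "(5, 1, 'Quiet Apparatus', 11.25, '2022-09-21');",
]

def load_fixture(seeds: list) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    for sql in seeds:
        conn.executescript(sql)  # an orphaned FK raises sqlite3.IntegrityError here
    # Belt and braces: foreign_key_check returns one row per violation.
    violations = conn.execute("PRAGMA foreign_key_check").fetchall()
    assert not violations, violations
    return conn
```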
Re-seed after a migration
Diff the schema and regenerate only what changed. Existing rows stay stable.
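At its core, "update only what changed" starts with a diff between two schema snapshots. The sketch below assumes a hypothetical snapshot format (table mapped to {column: declared type}) and hypothetical diff_schema and column_changes helpers. It is not dbsprout's migration logic. A fuller version would also flag child tables whose parent keys changed, since their FK values may need resampling.

```python
Schema = dict[str, dict[str, str]]  # hypothetical: table -> {column: declared type}

def diff_schema(old: Schema, new: Schema) -> dict:
    """Classify tables so only the affected ones need regenerating."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "dropped": sorted(old.keys() - new.keys()),
        "changed": sorted(t for t in shared if old[t] != new[t]),
        "unchanged": sorted(t for t in shared if old[t] == new[t]),  # rows can be kept as-is
    }

def column_changes(old_cols: dict, new_cols: dict) -> dict:
    """Column-level detail for one changed table."""
    shared = old_cols.keys() & new_cols.keys()
    return {
        "added": sorted(new_cols.keys() - old_cols.keys()),
        "dropped": sorted(old_cols.keys() - new_cols.keys()),
        "retyped": sorted(c for c in shared if old_cols[c] != new_cols[c]),
    }

OLD = {"authors": {"id": "INTEGER", "name": "TEXT"}, "books": {"id": "INTEGER", "title": "TEXT"}}
NEW = {
    "authors": {"id": "INTEGER", "name": "TEXT"},
    "books": {"id": "INTEGER", "title": "TEXT", "isbn": "TEXT"},
    "reviews": {"id": "INTEGER"},
}
```

Here only books gains a column and reviews is new, so authors rows would be left untouched.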
Match production distributions
Fit a statistical model to a real sample. No PII in your fixtures.
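A deliberately small illustration of "fit, then sample" follows. It fits category frequencies and a clamped normal distribution to a sample, then draws new rows from the fitted parameters with a seeded RNG. The SAMPLE data and the fit and sample helpers are hypothetical, and this is not the statistical engine's actual model. One caution applies: the sketch reuses observed category values verbatim. That is fine for low-cardinality columns like country, but identifying columns such as names or emails need synthetic generators to keep PII out.

```python
import random
import statistics
from collections import Counter

# Hypothetical "real" sample; in practice this would come from a production extract.
SAMPLE = [
    {"country": "NZ", "price": 12.0},
    {"country": "NZ", "price": 15.5},
    {"country": "PT", "price": 9.0},
    {"country": "NZ", "price": 14.0},
    {"country": "KE", "price": 11.5},
]

def fit(rows, categorical, numeric):
    model = {}
    for col in categorical:
        counts = Counter(r[col] for r in rows)  # keeps first-seen order
        model[col] = ("categorical", list(counts), list(counts.values()))
    for col in numeric:
        values = [r[col] for r in rows]
        model[col] = ("normal", statistics.mean(values), statistics.stdev(values),
                      min(values), max(values))
    return model

def sample(model, n, seed=42):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = {}
        for col, params in model.items():
            if params[0] == "categorical":
                _, values, weights = params
                row[col] = rng.choices(values, weights=weights)[0]
            else:
                _, mu, sigma, lo, hi = params
                # Clamp so draws stay within the observed range.
                row[col] = round(min(max(rng.gauss(mu, sigma), lo), hi), 2)
        out.append(row)
    return out
```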
Bulk-load millions of rows
Native COPY (PostgreSQL) and LOAD DATA (MySQL) throughput. Stream rows directly into the database.
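Bulk loading stays fast and memory-flat when rows are produced lazily and encoded in batches, rather than rendered as one giant INSERT. The sketch below turns a row generator into CSV chunks in the shape PostgreSQL's COPY ... FROM STDIN WITH (FORMAT csv) reads. book_rows and csv_chunks are hypothetical helpers, not dbsprout's loader.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical row source: yields rows lazily so millions never sit in memory at once.
def book_rows(n: int) -> Iterator[tuple]:
    for i in range(1, n + 1):
        yield (i, (i % 3) + 1, f"Title {i}", None if i % 10 == 0 else 9.99)

def csv_chunks(rows: Iterable[tuple], batch: int = 10_000) -> Iterator[str]:
    """Encode rows as CSV text, one string per batch of rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for count, row in enumerate(rows, start=1):
        # csv.writer writes None as an empty unquoted field, which COPY's CSV format reads as NULL.
        writer.writerow(row)
        if count % batch == 0:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate()
    if buf.tell():  # flush the final partial batch
        yield buf.getvalue()
```

With psycopg 3, for example, each chunk could be passed to copy.write() inside a cursor.copy("COPY books (id, author_id, title, price) FROM STDIN WITH (FORMAT csv)") block. For MySQL, the chunks would be written to a file for LOAD DATA.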
Works with what you already use.
Four live databases. Five schema-file formats. Same two commands.
```shell
dbsprout init --db postgresql://user:pass@localhost:5432/mydb
dbsprout generate --rows 500 --dialect postgresql
```
.sql  .dbml  .mmd  .puml  .prisma
The honest questions.
Stop seeding lies.
Install once. Init once. Generate forever — deterministically.