Training Pipeline
DBSprout can fine-tune a small local model on a sample of your real data, then generate from the resulting adapter with the spec engine, all without sending data to a cloud provider.
Install
pip install "dbsprout[llm]"
# plus a training backend: Unsloth (CUDA) or MLX (Apple Silicon)
End-to-End
The top-level command runs all three stages in sequence:
dbsprout train --db postgresql://localhost/myapp --sample-rows 1000 --epochs 3 --output .dbsprout
| Option | Description | Default |
|---|---|---|
| --db | Live database URL to sample (env: DBSPROUT_TARGET_DB) | — |
| --sample-rows | Rows to sample (≥ 1) | 1000 |
| --epochs | Training epochs | — |
| --output, -o | Base directory for artifacts | .dbsprout |
| --seed | Sampling seed | 0 |
| --no-pii-redaction | Disable PII redaction before serialization | false |
| --quiet | Suppress progress output | false |
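If you drive the pipeline from a script rather than a shell, the options above can be assembled into an argument list. The following is a minimal sketch that uses only the flags listed in the table; `build_train_argv` is a hypothetical helper for illustration, not part of DBSprout.

```python
import shlex
from typing import List, Optional


def build_train_argv(
    db_url: str,
    sample_rows: int = 1000,
    epochs: Optional[int] = None,
    output: str = ".dbsprout",
    seed: int = 0,
    pii_redaction: bool = True,
    quiet: bool = False,
) -> List[str]:
    """Assemble an argv list for `dbsprout train` (hypothetical helper)."""
    # The options table states that --sample-rows must be at least 1.
    if sample_rows < 1:
        raise ValueError("--sample-rows must be >= 1")
    argv = [
        "dbsprout", "train",
        "--db", db_url,
        "--sample-rows", str(sample_rows),
        "--output", output,
        "--seed", str(seed),
    ]
    # No default is documented for --epochs, so pass it only when set.
    if epochs is not None:
        argv += ["--epochs", str(epochs)]
    if not pii_redaction:
        argv.append("--no-pii-redaction")
    if quiet:
        argv.append("--quiet")
    return argv


argv = build_train_argv("postgresql://localhost/myapp", epochs=3, quiet=True)
print(shlex.join(argv))  # shell-safe rendering of the command
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where DBSprout is installed.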
Stages
Run a single stage when you need finer control:
# 1. Stratified sample from a live DB into Parquet
dbsprout train extract --db postgresql://localhost/myapp --sample-rows 1000
# 2. Serialize Parquet samples into GReaT-style JSONL
dbsprout train serialize
# 3. Fine-tune a QLoRA adapter on the serialized corpus
dbsprout train run --epochs 3
- CUDA path uses Unsloth; Apple Silicon uses MLX (auto-detected).
- Output includes a merged GGUF (Q4_K_M) adapter usable by the spec engine.
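To make the serialize stage more concrete, here is a rough sketch of GReaT-style encoding, where each row becomes a sentence of "column is value" clauses in a shuffled column order. The JSONL field names (`table`, `text`) and the helper functions are assumptions for illustration; the exact schema DBSprout writes may differ.

```python
import json
import random
from typing import Any, Dict, Iterable, List


def serialize_row(row: Dict[str, Any], rng: random.Random) -> str:
    """Encode one row as 'column is value' clauses in a shuffled order."""
    columns = list(row)
    # GReaT-style encodings permute the column order so that the model
    # does not learn to depend on one fixed ordering.
    rng.shuffle(columns)
    return ", ".join(f"{col} is {row[col]}" for col in columns)


def to_jsonl(table: str, rows: Iterable[Dict[str, Any]], seed: int = 0) -> List[str]:
    """Return one JSON line per row (assumed schema: table + text)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    return [
        json.dumps({"table": table, "text": serialize_row(row, rng)})
        for row in rows
    ]


rows = [
    {"id": 1, "plan": "pro", "seats": 5},
    {"id": 2, "plan": "free", "seats": 1},
]
for line in to_jsonl("accounts", rows):
    print(line)
```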
Privacy
PII values are redacted before serialization by default. Pass
--no-pii-redaction only for non-sensitive data. Differential-privacy SGD
(Opacus) is available on the CUDA backend; the pipeline summary reports the
achieved (epsilon, delta).
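As an illustration of what redaction before serialization can look like, the sketch below masks email addresses and phone-like strings in string columns. The regular expressions and the `<EMAIL>` / `<PHONE>` placeholders are assumptions chosen for this example and are not DBSprout's actual detection rules.

```python
import re
from typing import Any, Dict

# Deliberately simple patterns for illustration; production PII
# detection typically needs far broader coverage than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_value(value: Any) -> Any:
    """Mask emails and phone-like substrings; leave non-strings untouched."""
    if not isinstance(value, str):
        return value
    value = EMAIL_RE.sub("<EMAIL>", value)
    return PHONE_RE.sub("<PHONE>", value)


def redact_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with every string value redacted."""
    return {col: redact_value(val) for col, val in row.items()}


row = {
    "id": 7,
    "contact": "ada@example.com",
    "note": "call +1 (555) 010-9999 after 5pm",
}
print(redact_row(row))
```

Redacting before the text encoding step means the raw values never reach the training corpus, which is why the default is safer for sensitive tables.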
Generating From the Adapter
dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf
Adapters can be hot-swapped without a restart; see Data Generation.