Keep everything on a local LLM

DBSprout’s --engine spec sends column names and types to an LLM to produce a DataSpec. By default it uses the embedded Qwen2.5-1.5B model, which runs entirely on your machine. This recipe shows how to explicitly pin to a local model — either the embedded one or your own Ollama instance — and explains the privacy guarantees each mode provides.

Problem: you want the semantic accuracy of --engine spec, but your organisation's policy prohibits sending schema metadata to any external API, or you simply have no internet connection.

Understanding the privacy gradient

DBSprout has three privacy tiers:

Tier	What can leave the machine	Requires
`local`	Nothing	Default — no config needed
`redacted`	Column names and types only (no values)	Cloud LLM + API key
`cloud`	Column names, types, and sample values	Cloud LLM + API key + explicit opt-in

Running with --privacy local (the default) locks the tier to local and refuses to use any provider that sends data off the machine: you get a hard error rather than a silent leak.

Prerequisites

dbsprout installed
For the embedded model: nothing extra — it ships inside the package
For Ollama: Ollama installed and running, with at least one model pulled (e.g. ollama pull mistral)

Steps

Option A: Embedded Qwen2.5-1.5B (zero config)

1. Generate — the embedded model is the default

dbsprout generate --engine spec --privacy local

The --privacy local flag is technically redundant (it is already the default), but it makes your intent explicit and keeps the command safe if the default ever changes.

2. Confirm which model ran

dbsprout generate --engine spec --privacy local --verbose

Output will include:

LLM provider  : embedded (qwen2.5-1.5b-instruct-q4_k_m.gguf)
Privacy tier  : local
Network calls : 0
Spec cached   : .dbsprout/cache/spec_a3f9c12.json
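
For scripted or CI checks, the summary above can be asserted rather than read by eye. The following is a minimal sketch, assuming the --verbose summary is written to standard output in the "Key : value" layout shown above; assert_local_run is a hypothetical helper name, not part of DBSprout.

```shell
# Hypothetical helper: reads a --verbose summary on stdin and succeeds only
# if it reports the local privacy tier and zero network calls.
assert_local_run() {
  awk -F' *: *' '
    { sub(/[ \t\r]+$/, "") }                 # drop trailing whitespace
    $1 == "Privacy tier"  { tier = $2 }
    $1 == "Network calls" { calls = $2 }
    END { exit (tier == "local" && calls == "0") ? 0 : 1 }
  '
}
```

It could be used as `dbsprout generate --engine spec --privacy local --verbose | assert_local_run || echo "run was not fully local"`. If the summary turns out to be printed on standard error, redirect it with `2>&1` first.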

Option B: Your own Ollama instance

1. Start Ollama and pull a model

ollama serve          # if not already running as a service
ollama pull mistral   # or any model you prefer
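
Before pointing DBSprout at Ollama, it can help to confirm that the model really is available locally. The sketch below parses the table printed by `ollama list` (a header row followed by one row per model, with the name in the first column); has_model is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: succeeds if the given model appears in `ollama list`
# output read from stdin. Matches an exact "name:tag" or a bare "name".
has_model() {
  awk -v want="$1" '
    NR > 1 {                        # skip the header row
      split($1, parts, ":")         # "mistral:latest" -> parts[1] = "mistral"
      if ($1 == want || parts[1] == want) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

For example, `ollama list | has_model mistral || ollama pull mistral` pulls the model only when it is missing.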

2. Configure DBSprout to use Ollama

Either pass flags:

dbsprout generate --engine spec \
  --llm-provider ollama \
  --llm-model mistral \
  --privacy local

Or persist in dbsprout.toml:

[llm]
provider = "ollama"
model    = "mistral"
base_url = "http://localhost:11434"

[privacy]
tier = "local"

Then just run:

dbsprout generate --engine spec

3. Verify no external calls

dbsprout generate --engine spec --dry-run

--dry-run produces the spec without generating rows and prints the provider summary. If you see any provider other than embedded or ollama, double-check your dbsprout.toml.
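
With Option B, schema metadata stays on the machine only if base_url points at the machine itself; an Ollama endpoint on another host would receive the column names and types. Whether DBSprout checks this is not stated here, so the following is a hedged sketch of a manual check. is_loopback_url is a hypothetical helper; it performs a purely textual test (no DNS resolution) and does not handle URLs with embedded credentials.

```shell
# Hypothetical helper: succeeds if the URL's host is localhost, a 127.x.x.x
# address, or the IPv6 loopback ::1. Textual check only.
is_loopback_url() {
  _host=${1#*://}                  # strip the scheme
  _host=${_host%%/*}               # strip any path
  case $_host in
    \[*\]*) _host=${_host#\[}; _host=${_host%%\]*} ;;   # bracketed IPv6 literal
    *)      _host=${_host%%:*} ;;                       # strip the port
  esac
  case $_host in
    localhost|::1) return 0 ;;
    127.*)
      case $_host in
        *[!0-9.]*) return 1 ;;     # e.g. 127.example.com is not loopback
        *)         return 0 ;;
      esac ;;
    *) return 1 ;;
  esac
}
```

One way to apply it to the config shown earlier is `is_loopback_url "$(sed -n 's/^base_url *= *"\(.*\)"/\1/p' dbsprout.toml)" || echo "base_url is not local"`.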

Invalidating the cache after a model change

The spec is keyed on schema hash + provider. Switching from embedded to Ollama (or vice versa) generates a fresh spec automatically — no manual cache clear needed.
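
To see why a provider switch yields a new cache entry, consider a key derived from both inputs. This is an illustrative sketch only: DBSprout's actual key derivation is not documented here, and spec_cache_key, the use of SHA-256, and the 7-character truncation are all assumptions. It relies on sha256sum from GNU coreutils (on macOS, `shasum -a 256` is the usual substitute).

```shell
# Hypothetical helper: derive a short cache key from a schema file's
# contents plus the provider name. Not DBSprout's real algorithm.
spec_cache_key() {
  # $1: path to a schema file, $2: provider name (e.g. embedded, ollama)
  { cat "$1"; printf '\n%s' "$2"; } | sha256sum | cut -c1-7
}
```

Under this scheme the same schema with the same provider always maps to the same key, while changing either input produces a different key and therefore a cache miss.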

To force regeneration without changing the provider:

dbsprout generate --engine spec --refresh-spec

Result

Column names and types stay on your machine — the LLM sees only schema metadata, never row values. The spec is generated once, cached, and reused on every subsequent run. You get semantically rich values for domain-specific columns with zero network exposure.

For the full privacy model, see /docs/core-concepts. For provider configuration details, see /docs/guides/llm-configuration.