The golden dataset is the working-paper layer behind answer correctness, prompt optimization, drift gates, and the audit memorandum.
A golden dataset for NL-to-SQL is a versioned corpus of representative finance questions with expected answers, persona, question tier, materiality threshold, tenant context, role context, data dependencies, and failure mode metadata.
A flat list of questions is too weak.
The dataset has to preserve the business context that makes the question material. Without metadata, the score becomes a flat average that hides the failures that matter.
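To make the flat-average problem concrete, here is a minimal sketch that scores the same eval run two ways. The result tuples and the tier numbering (tier 1 as the most material) are illustrative assumptions, not values from a real run.

```python
from collections import defaultdict

# Hypothetical eval results as (tier, passed) pairs.
# Assumption: tier 1 holds the most material questions.
results = [
    (1, False), (1, True),
    (3, True), (3, True), (3, True), (3, True),
    (3, True), (3, True), (3, True), (3, True),
]


def flat_accuracy(results):
    """Single average over every question, ignoring tier."""
    return sum(passed for _, passed in results) / len(results)


def accuracy_by_tier(results):
    """Pass rate per tier, so a weak tier cannot hide behind a strong one."""
    totals = defaultdict(lambda: [0, 0])  # tier -> [passed, total]
    for tier, passed in results:
        totals[tier][0] += int(passed)
        totals[tier][1] += 1
    return {tier: p / n for tier, (p, n) in sorted(totals.items())}


print(flat_accuracy(results))     # 0.9
print(accuracy_by_tier(results))  # {1: 0.5, 3: 1.0}
```

The flat score reads as 90 percent, while half of the tier-1 questions failed. Tier metadata on each record is what makes the second view possible.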
User question
The natural-language question in the language a finance user would actually ask.
Expected answer
The scalar value, table shape, chart shape, caveat, or refusal the system should produce.
Persona and role
CFO, FP&A analyst, controller, budget owner, internal auditor, and the permissions slice being evaluated.
Tier and materiality
The threshold that determines whether a failure becomes an exception or an appendix note.
Data dependencies
The source tables, metric definitions, time window, currency treatment, and reconciliation expectations.
Failure mode if wrong
The business consequence: wrong board number, incorrect budget read, bad vendor decision, or audit limitation.
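The six fields above can be sketched as a typed record. This is one possible shape, not the schema from the YAML template: every field name, the `GoldenRecord` class, and the sample values are hypothetical and chosen only to mirror the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenRecord:
    """One golden-set entry. All field names here are illustrative assumptions."""
    record_id: str
    question: str                 # user question, in the finance user's own words
    expected_answer: dict         # scalar, table shape, chart shape, caveat, or refusal
    persona: str                  # e.g. "CFO", "FP&A analyst", "controller"
    role: str                     # permissions slice being evaluated
    tier: int                     # assumption: 1 is the most material tier
    materiality_threshold: float  # tolerance before a miss becomes an exception
    tenant_id: str
    data_dependencies: dict       # source tables, metric definitions, window, currency
    failure_mode: str             # business consequence if the answer is wrong

    def __post_init__(self):
        # Reject records that would silently weaken the score.
        if not self.question.strip():
            raise ValueError("question must not be empty")
        if self.tier < 1:
            raise ValueError("tier must be 1 or greater")
        if self.materiality_threshold < 0:
            raise ValueError("materiality_threshold must be non-negative")


# Hypothetical example record; the values are placeholders, not real data.
example = GoldenRecord(
    record_id="fin-0001",
    question="What was total operating expense last quarter?",
    expected_answer={"kind": "scalar", "unit": "USD"},
    persona="CFO",
    role="finance_admin",
    tier=1,
    materiality_threshold=0.01,
    tenant_id="tenant-a",
    data_dependencies={"tables": ["gl_entries"], "time_window": "last_quarter"},
    failure_mode="wrong board number",
)
```

Making the record immutable and validating it on construction is a design choice in this sketch: a golden record that changes silently after review would undermine the evidence it is meant to provide.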
Start manual before synthetic.
Synthetic variants are useful after ground truth exists. They are dangerous when they become the ground truth. The first seed set should come from domain experts and production-like questions.
Write tier-1 examples with the people who know the finance workflow.
Use synthetic paraphrases only after a known-good answer exists.
Review tenant-specific business logic before marking an example production-ready.
Promote production questions into the golden set after human review, not automatically.
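The rules above can be expressed as a small promotion gate. The sketch below assumes a hypothetical `Candidate` type with three origins ("expert", "synthetic", "production") and boolean review flags; none of these names come from the template.

```python
from dataclasses import dataclass

ALLOWED_ORIGINS = {"expert", "synthetic", "production"}


@dataclass
class Candidate:
    """A question proposed for the golden set (hypothetical structure)."""
    question: str
    origin: str
    has_verified_answer: bool = False    # a known-good answer exists
    human_reviewed: bool = False         # a person approved the example
    tenant_logic_reviewed: bool = False  # tenant-specific business logic was checked


def promotion_blockers(c: Candidate) -> list[str]:
    """Return the reasons a candidate cannot be marked production-ready."""
    blockers = []
    if c.origin not in ALLOWED_ORIGINS:
        blockers.append("unknown origin")
    if c.origin == "synthetic" and not c.has_verified_answer:
        blockers.append("synthetic variant has no known-good answer")
    if c.origin == "production" and not c.human_reviewed:
        blockers.append("production question lacks human review")
    if not c.tenant_logic_reviewed:
        blockers.append("tenant-specific business logic not reviewed")
    return blockers


paraphrase = Candidate("Total opex for Q3?", origin="synthetic", tenant_logic_reviewed=True)
print(promotion_blockers(paraphrase))
# ['synthetic variant has no known-good answer']
```

Returning a list of blockers rather than a single boolean keeps the reason for each rejection available for the review record.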
The finance NL-to-SQL record needs audit fields.
The example below extends a generic golden-set pattern with the fields a finance deployment needs for evidence.
Build the golden dataset for NL-to-SQL, answered plainly.
Keep the evidence trail connected.
Golden Set YAML Template
The reusable multi-tenant schema this finance-specific page extends.
Golden record vs golden dataset
The distinction teams need before they build AI evaluation data.
NL-to-SQL evaluation checklist
A readiness checklist for the eval layer around the golden set.
AI Evals
TrustEvals service work for production eval layers, golden sets, and release gates.