Evaluate the semantic layer before SQL.

Most NL-to-SQL errors are not purely SQL errors. They start when the system retrieves the wrong business meaning for the question.

Semantic layer evaluation measures whether an AI system maps finance schema objects to the right business meaning: metric definitions, synonyms, sample values, tenant overlays, time logic, role context, and permitted data slices.

Why it matters

Schema accuracy is not business accuracy.

A model can select the right table and still use the wrong meaning. Finance metrics vary by tenant, workflow, close state, policy, and role.

Revenue can mean bookings, recognized revenue, or cash received depending on context.
Headcount can include contractors in one workflow and exclude them in another.
Vendor spend can depend on mapping tables that are not visible in the raw schema.
A metric definition can be right globally and wrong for one tenant overlay.
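
The overlay case above can be made concrete with a small sketch. It assumes definitions live in simple in-memory lookups; `MetricDefinition`, `GLOBAL_DEFINITIONS`, `TENANT_OVERLAYS`, `resolve_definition`, and the tenant names are hypothetical and only illustrate the resolution order (tenant overlay first, global definition second).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    metric: str
    meaning: str
    version: int


# Hypothetical global definitions; a real system would load these
# from the semantic layer rather than hard-code them.
GLOBAL_DEFINITIONS = {
    "revenue": MetricDefinition("revenue", "recognized revenue", 3),
    "headcount": MetricDefinition("headcount", "employees excluding contractors", 1),
}

# Hypothetical tenant overlays keyed by (tenant, metric).
TENANT_OVERLAYS = {
    ("tenant_a", "revenue"): MetricDefinition("revenue", "bookings", 1),
}


def resolve_definition(metric: str, tenant: Optional[str] = None) -> MetricDefinition:
    """Return the tenant overlay if one exists, otherwise the global definition."""
    if tenant is not None and (tenant, metric) in TENANT_OVERLAYS:
        return TENANT_OVERLAYS[(tenant, metric)]
    return GLOBAL_DEFINITIONS[metric]
```

Under these assumptions the same question about "revenue" resolves to bookings for `tenant_a` and to recognized revenue for every other tenant, which is the difference a schema-only check would miss.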

Eval dimensions

The semantic layer needs its own scorecard.

Do not wait for the final SQL answer to discover semantic drift. Evaluate retrieval quality before the SQL writer gets the context.

Dimension | What to check | Evidence
Metric meaning | Does the system retrieve the right definition? | Definition ID, version, owner, and expected usage.
Synonyms | Does natural language map to the right field? | Accepted terms, rejected terms, and examples.
Tenant overlay | Does the tenant-specific logic override the generic definition? | Overlay version and tenant-specific examples.
Role context | Does the retrieved meaning match the user's permission slice? | Role replay and filtered result expectation.
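
One way to turn the four dimensions into a scorecard is to compare the context that retrieval returned against the context a golden example expects, one dimension at a time. The sketch below is an assumption-laden illustration: `SemanticContext`, its field names, the identifiers in the example values, and `score_retrieval` are all hypothetical, and exact-match comparison is the simplest possible scoring rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticContext:
    definition_id: str    # metric meaning that was retrieved
    field: str            # column the natural-language term mapped to
    overlay_version: str  # tenant overlay applied ("" if none)
    role_slice: str       # permission slice the meaning was retrieved for


# Scorecard dimension -> attribute compared for that dimension.
DIMENSIONS = {
    "metric_meaning": "definition_id",
    "synonyms": "field",
    "tenant_overlay": "overlay_version",
    "role_context": "role_slice",
}


def score_retrieval(expected: SemanticContext, retrieved: SemanticContext) -> dict:
    """Return pass/fail per dimension, before any SQL is generated."""
    return {
        dimension: getattr(expected, attr) == getattr(retrieved, attr)
        for dimension, attr in DIMENSIONS.items()
    }


# Hypothetical golden expectation and retrieval result.
expected = SemanticContext("rev_recognized_v3", "gl.recognized_amount", "tenant_a_v2", "fpna_analyst")
retrieved = SemanticContext("rev_recognized_v3", "gl.recognized_amount", "", "fpna_analyst")

scorecard = score_retrieval(expected, retrieved)
failed = [dimension for dimension, passed in scorecard.items() if not passed]
```

Here retrieval found the right definition and field but missed the tenant overlay, so only `tenant_overlay` fails. Reporting per dimension, rather than a single pass/fail on the final SQL, shows where the drift started.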

Operating model

Treat definitions as controlled evidence.

The semantic layer should be editable by the right domain owner, versioned like code, and replayed against the golden set after every material change.

Owner

Every material metric needs a named business owner, not only a column description.
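
A simple check can flag material metrics that lack a named owner. This is a minimal sketch; the catalog shape, the `material` and `owner` keys, and `metrics_missing_owner` are assumptions made for illustration.

```python
# Hypothetical metric catalog entries; the field names are assumptions.
CATALOG = [
    {"metric": "recognized_revenue", "material": True, "owner": "Revenue Accounting"},
    {"metric": "vendor_spend", "material": True, "owner": ""},
    {"metric": "page_views", "material": False, "owner": None},
]


def metrics_missing_owner(catalog):
    """Return material metrics that have no named business owner."""
    missing = []
    for entry in catalog:
        owner = (entry.get("owner") or "").strip()
        if entry.get("material") and not owner:
            missing.append(entry["metric"])
    return missing


unowned = metrics_missing_owner(CATALOG)
```

Non-material metrics are skipped on purpose, so the check only blocks on definitions that can move a finance answer.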

Version

Metric definitions should carry version history so old answers can be reconstructed.
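
Reconstructing an old answer means looking up the definition that was in effect on the date the answer was given. The sketch below assumes each version carries an effective date and that the history is sorted by that date; `REVENUE_HISTORY`, its contents, and `definition_as_of` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical version history for one metric, sorted by effective date.
REVENUE_HISTORY = [
    (date(2023, 1, 1), "v1: cash received"),
    (date(2024, 1, 1), "v2: recognized revenue"),
]


def definition_as_of(history, as_of):
    """Return the definition in effect on `as_of`.

    `history` is a list of (effective_date, definition) tuples
    sorted by effective_date in ascending order.
    """
    effective_dates = [effective for effective, _ in history]
    # Index of the last version whose effective date is <= as_of.
    index = bisect_right(effective_dates, as_of) - 1
    if index < 0:
        raise LookupError(f"no definition in effect on {as_of.isoformat()}")
    return history[index][1]
```

With this history, an answer given in mid-2023 is reconstructed against the cash-received definition, while one given on or after 1 January 2024 uses recognized revenue.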

Replay

Semantic edits should trigger golden-set replay for affected personas, tenants, and question tiers.
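
Selecting what to replay can be as simple as filtering the golden set by the metric and tenant an edit touches. The sketch below is one possible shape, not a prescribed design: `GoldenCase`, `SemanticEdit`, `cases_to_replay`, and the sample cases are hypothetical, and it treats an edit with no tenant as a global change that affects every tenant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    metric: str
    tenant: str
    persona: str
    tier: int


@dataclass(frozen=True)
class SemanticEdit:
    metric: str
    tenant: Optional[str] = None  # None means the global definition changed


def cases_to_replay(edit, golden_set):
    """Return golden cases whose expected context the edit could change."""
    return [
        case
        for case in golden_set
        if case.metric == edit.metric
        and (edit.tenant is None or case.tenant == edit.tenant)
    ]


# Hypothetical golden set.
GOLDEN_SET = [
    GoldenCase("g1", "revenue", "tenant_a", "controller", 1),
    GoldenCase("g2", "revenue", "tenant_b", "fpna_analyst", 2),
    GoldenCase("g3", "headcount", "tenant_a", "hr_partner", 1),
]

overlay_edit_cases = cases_to_replay(SemanticEdit("revenue", "tenant_a"), GOLDEN_SET)
global_edit_cases = cases_to_replay(SemanticEdit("revenue"), GOLDEN_SET)
```

The selected cases keep their persona and tier, so the replay report can still be grouped by affected persona, tenant, and question tier.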

FAQ

Evaluate the semantic layer before SQL, answered plainly.

The semantic layer is not the same as the schema. The schema names tables and columns; the semantic layer maps those objects to business meaning, metric definitions, synonyms, tenant logic, and role context.

Tenant overlays matter because finance metrics often vary by business model or customer. A generic definition can be directionally right and still wrong for a specific tenant.

To test the semantic layer, create examples where the correct answer depends on a specific metric definition, synonym, tenant overlay, or role slice, then evaluate whether retrieval selects the expected context before SQL generation.

If a finance AI answer can move an operating decision, the evidence behind it needs to be readable after the answer is gone.
