The semantic layer is for humans, not machines
Every major data platform is racing to build semantic layer capabilities. Snowflake launched Semantic View Autopilot. Databricks is rolling out Unity Catalog Metrics. Microsoft opened Fabric IQ via MCP. The pitch is the same everywhere: AI agents need structured context to query your data accurately.
They're not wrong. But that's only one side of the story.
The semantic layer doesn't only become important because AI agents need it. It becomes important because AI agents make it impossible for humans to operate without one. The more you delegate to agents, the more you need a structured, governed record of what your data means, so you can see what definitions they used, discover what questions are even possible, and change the rules when the business changes.
LLMs are improving fast. Six months from now, agents will probably be powerful enough to fetch whatever context they need on the fly: pull schemas, scan Slack threads, piece together how your team measures churn. And the answer might be accurate. But accuracy is only one of the things at stake when you hand data queries to machines.
What used to absorb the problem, and why agents break it
Before analytics agents, there was always a human in the path. Not by design, but by necessity. For recurring questions, a data engineer built a dashboard, encoding business logic they carried in their head: which table to use, which filters to apply, what "active customer" actually means. For new questions, someone on the data team answered directly, translating between what the requester wanted to know and what the data could actually say. Either way, a person who understood the business meaning of the data sat between the question and the answer. Messy, but functional, because ambiguity got caught before it reached a decision.
Agents remove that person from the path. An agent doesn't Slack someone to check which table is current. It doesn't know that the team decided last quarter to exclude partner-referred accounts from the churn metric. It doesn't know the difference unless someone has made that distinction explicit.
The workarounds that absorbed these gaps operated at human speed, on a human number of questions per day. Agents operate at machine speed, on every question anyone thinks to ask. They don't make ambiguity newly dangerous. They make it more frequent and harder to catch.
Decisions need a home
The current conversation around semantic layers focuses on one thing: how do you give agents the right context to generate correct SQL? That's a valid technical question. But it misses the organizational one.
The human in the path was doing more than looking up column names. They were applying judgment. They knew that "active customer" excludes trial accounts, but also that the product team and the finance team define "active" differently, and which definition applies in which context. They aggregated context scattered across dbt models, Slack threads, and meeting notes. And they enforced alignment: they knew who should make the call on a given metric, and they ensured that when finance says "revenue" and product says "revenue," everyone understands whether those are the same number.
With an agent in the path instead of a person, there is no more "let me check with David, he'll know."
Current approaches to providing agents with context will let humans see that context too. But what you'll see is a dry snapshot: here's the schema, here's the column description, here's the lineage. That tells you what the metric is. It doesn't tell you why it's computed this way. Take something as common as net revenue: does it include taxes? One controller says yes, another says no, and the definition that ended up in the YAML file reflects whichever team last updated it. The schema captures the formula. It doesn't capture the debate behind it, the decision, or whether that decision is still current.
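To make the gap concrete, here is a minimal sketch in Python. The metric definitions are hypothetical, modeled loosely on the kind of YAML entry a dbt-style project might hold; the field names and formulas are illustrative, not any real tool's schema. The point is that the file stores a formula and nothing else: it can tell you two teams disagree, but not who decided, when, or why.

```python
# Hypothetical metric definitions, as they might be stored in a YAML file.
# All names and formulas are illustrative assumptions, not a real schema.
finance_def = {
    "name": "net_revenue",
    "sql": "SUM(amount) - SUM(tax)",  # finance's view: exclude taxes
}
product_def = {
    "name": "net_revenue",
    "sql": "SUM(amount)",             # product's view: include taxes
}

def formulas_conflict(a: dict, b: dict) -> bool:
    """Same metric name, different formula: the file can surface the
    conflict, but records nothing about the debate or the decision."""
    return a["name"] == b["name"] and a["sql"] != b["sql"]

print(formulas_conflict(finance_def, product_def))  # → True
```

Whichever team last updated the file "wins", and nothing in the data structure says whether that was a deliberate call or an accident.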
That context, the why behind a number, is what turns "the agent said so" into something you'd put in front of your VP. It's the reason you instinctively ask a tenured data team member rather than a new joiner, even one who has actually read all the documentation.
Most of these decisions still happen ad-hoc: in a meeting, a Slack thread, an analyst's head. They may or may not get written down. They almost certainly don't get maintained. When an agent writes a query, it will use some version of these decisions, whether you recorded them or not. The question is whether you can see which version it used, whether that version is still current, and whether it should be reconsidered.
Definitions, alignment choices, business rules: these need to become a first-class type of data. Visible, versioned, governable. And if they do, something else follows: people can actually navigate them. Agents answer questions, but they don't help you figure out which questions to ask. One data team we met went from 400 well-documented tables to 1,400 in a few years of growth. You can't browse an agentic reasoning chain to orient yourself in that. A structured layer of business meaning gives the people in RevOps, finance, and strategy something to explore before they ever formulate a query.
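What "visible, versioned, governable" might look like as a data structure can be sketched in a few lines of Python. This is an assumption-laden illustration, not a real system: the record shape, field names, and the churn example (echoing the partner-referred exclusion mentioned earlier) are all hypothetical. The idea is that each definition carries its owner, rationale, and status, so both an agent and a human can ask which version is current and why.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch: a metric definition as a versioned, governable
# record rather than a bare formula. Field names are illustrative.
@dataclass(frozen=True)
class MetricDecision:
    metric: str
    version: int
    formula: str
    decided_by: str   # who owns the call on this metric
    decided_on: date
    rationale: str    # the "why" behind the number
    status: str       # "current" or "superseded"

history = [
    MetricDecision("churn_rate", 1, "churned / active",
                   "data team", date(2023, 1, 10),
                   "initial definition", "superseded"),
    MetricDecision("churn_rate", 2,
                   "churned / active (excl. partner-referred accounts)",
                   "RevOps", date(2024, 4, 2),
                   "partner accounts churn through a separate process",
                   "current"),
]

def current_definition(metric: str, history: list) -> MetricDecision:
    """Return the decision an agent (or a human) should be using now."""
    return max((d for d in history if d.metric == metric
                and d.status == "current"),
               key=lambda d: d.version)

print(current_definition("churn_rate", history).version)  # → 2
```

The same record answers both audiences: the agent reads the formula, and the person reviewing the agent's output reads the rationale and the status.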
The semantic layer isn't a context dump for agents. It's the control layer for what your company considers true.
Who owns the meaning?
Most data teams already have fragments of a semantic layer: dbt YAML files, LookML models, catalog descriptions, Confluence pages. The gap isn't that nobody does this work. The gap is that the work is scattered across tools, maintained manually, and decays the moment someone stops tending it. That was tolerable when a human could compensate. It won't be when an agent is the one reading the definition.
Who decides what your data means as agents become the default interface to it? Today, that's still your team. Whether it stays that way depends on whether those decisions have a home, or whether they stay in Slack threads and people's heads, where the next agent won't think to look.
Matthieu Blandineau