Your data stack is about to get a lot more contributors
Yes, AI is transforming the way business users query and consume data. But it's also fueling a less-discussed shift: business users are starting to contribute directly to dbt models, metric definitions, and Looker views. AI tools make it possible, and business users aren't willing to wait for tickets anymore.
That's not a bad thing. These are the people who know how a campaign is actually attributed, which customers count as "active," what finance means by "net revenue" versus what product means. Getting that knowledge into the data stack directly is something data teams have wanted for years. The problem is that it's happening before anyone has figured out how to govern it. A head of analytics at a 100-person company recently told us she now runs workshops titled "GitHub: what is it?" for the marketing and operations people submitting pull requests.
AI creates contributors, not just consumers
Most of the conversation around AI and data focuses on consumption: analytics agents, natural language queries, better answers faster. There's broad agreement that context is the key enabler, that agents need the right business definitions and metric logic to query accurately. Teams and vendors are working on consolidating that context and feeding it to AI agents.
The contribution side gets less attention. AI doesn't just help people ask questions. It helps people who aren't on the data team propose changes to dbt models, metric definitions, Looker views, and semantic-layer configurations. And not every contribution carries the same risk.
A marketing analyst can't find an attribution metric that matches their use case, so they build a new one. That's a duplication risk: now two definitions exist for roughly the same concept, and nobody downstream knows which one to trust. A RevOps lead modifies an existing pipeline metric to match a new comp structure. That's a breaking-change risk: dashboards, reports, and agents that depend on the old definition silently shift. A product manager asks Claude to scaffold a new explore from a description of what they need. That's low individual risk, but fifty of those with no cleanup path is a maintenance problem.
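The breaking-change case is worth making concrete. Here is a minimal Python sketch; the deal stages, amounts, and both metric definitions are invented for illustration, not taken from any real stack:

```python
# Hypothetical illustration of breaking-change risk: a RevOps lead
# redefines "pipeline" to match a new comp structure, and every
# downstream consumer of the metric silently shifts.
# All names and numbers here are invented for the example.

deals = [
    {"stage": "qualified", "amount": 10_000},
    {"stage": "proposal", "amount": 25_000},
    {"stage": "negotiation", "amount": 40_000},
]

def pipeline_v1(deals):
    # Original definition: every open deal counts toward pipeline.
    return sum(d["amount"] for d in deals)

def pipeline_v2(deals):
    # New definition: only deals past "qualified" count.
    return sum(d["amount"] for d in deals if d["stage"] != "qualified")

# The same dashboard query, run before and after the change,
# returns different numbers. Nothing errors; nothing warns.
print(pipeline_v1(deals))  # 75000
print(pipeline_v2(deals))  # 65000
```

The point is not the arithmetic but the silence: both definitions are valid code, and only the people who depend on the old number can say whether the change is a fix or a break.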
This isn't yet universal. Most data teams aren't drowning in PRs from product managers. But at companies where AI tooling is moving fast, the shift is already visible. A head of analytics at a gaming company described where this is going: "You won't be able to frustrate them anymore because they'll have access to practically everything. It'll be open bar."
Today, data teams have pieces of the guardrail stack: Git review for code changes, semantic definitions in dbt or LookML, ownership metadata in catalogs, content validation in BI tools. These pieces are scattered across different systems, but they work when the data team reviews every PR, when contributors know the conventions, and when volume stays low enough for human gatekeeping. The data team is the bottleneck, and that's fine, because the bottleneck is also the quality gate.
That model is built for a world where contribution comes from a small group of technical people. When volume grows and contributors don't share the conventions, the gate doesn't hold.
What open bar looks like in practice
We've spent the last four months talking to data teams at companies ranging from 40-person startups to 15,000-person enterprises: dbt-native platform teams, Looker-centric analytics groups, mixed environments running parallel BI stacks. The failure modes are organizational rather than technical.
No inventory of what already exists. Most companies don't have a current, searchable map of their metric definitions and business logic. Catalogs exist but are often stale or incomplete. So when a new contributor needs a metric, they build one, even if a version already exists somewhere in the stack. The same concept gets modeled in slightly different ways by different people, and nothing flags the overlap.
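A first-pass overlap check is not hard to sketch. The Python snippet below flags suspiciously similar metric names using stdlib fuzzy matching; the names and the threshold are invented, and a real version would parse definitions out of dbt YAML or LookML rather than a hardcoded list:

```python
# A minimal sketch of the inventory check most teams lack: scan the
# metric names defined across a project and flag near-duplicates.
# Metric names and the 0.6 threshold are illustrative assumptions.
from difflib import SequenceMatcher
from itertools import combinations

metrics = [
    "conversion_rate",
    "campaign_conversion_rate",
    "conv_rate_pipeline",
    "net_revenue",
]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def flag_overlaps(names, threshold=0.6):
    """Return pairs of metric names similar enough to deserve a human look."""
    return [
        (a, b)
        for a, b in combinations(sorted(names), 2)
        if similarity(a, b) >= threshold
    ]

for a, b in flag_overlaps(metrics):
    print(f"possible duplicate: {a} <-> {b}")
```

String similarity only catches naming overlap, not semantic overlap; two metrics with unrelated names can still model the same concept. But even this crude check surfaces duplicates that nothing in most stacks flags today.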
No shared definition of a valid contribution. When three technical people are contributing, implicit norms work. Everyone knows how models are named, where definitions live, when to create something new versus extend what's there. Those norms don't transfer to twenty contributors of varying technical backgrounds. And if the data team steps in to review every PR manually, the contribution model hasn't reduced their workload. It's relocated it.
No visibility into the blast radius of a change. Marketing proposes a new "conversion rate" metric for campaign attribution. Finance already has a "conversion rate" tied to pipeline. RevOps reuses neither and builds their own in a dashboard calculation. Six months later, an AI agent gets asked "what's our conversion rate?" and serves whichever definition it encounters first. Nobody knows there are three. Nobody knows they disagree. A wrong metric doesn't throw an error. It just quietly gives someone the wrong number. This is how board metrics drift, how incentive comp diverges from finance reporting, how operational workflows fire on the wrong customer segments.
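The three-definitions scenario is easy to reproduce. In this toy Python example every figure and formula is invented; the point is only that each function is a defensible "conversion rate" that returns a different answer:

```python
# Hypothetical illustration of three coexisting "conversion rate"
# definitions. All figures are made up for the example.

def marketing_conversion_rate(leads=200, signups=30):
    # Marketing: campaign leads that become signups.
    return signups / leads

def finance_conversion_rate(opportunities=80, closed_won=20):
    # Finance: pipeline opportunities that close.
    return closed_won / opportunities

def revops_conversion_rate(trials=40, paid=12):
    # RevOps: a dashboard-level calculation on trials.
    return paid / trials

answers = {
    "marketing": marketing_conversion_rate(),  # 0.15
    "finance": finance_conversion_rate(),      # 0.25
    "revops": revops_conversion_rate(),        # 0.30
}

# An agent that serves "whichever definition it encounters first"
# answers 0.15, 0.25, or 0.30 depending on retrieval order,
# and none of the three is an error.
```

None of these functions is wrong in isolation; the failure is that nothing records that they are three answers to what the business thinks is one question.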
No lifecycle after creation. Concepts tied to OKRs evolve every six months, but the models built around them don't. They sit in the stack, still referenced by dashboards and agents, with no owner and no usage signal. Nobody can tell whether a definition is still current, still in use, or should be deprecated. Creation is easy. Knowing when to update, merge, or retire something, and what breaks when you do, is nobody's job.
Governing meaning is harder than governing code
The natural instinct is to borrow from software engineering: code review, CI/CD, test suites, ownership models. But data governance is harder than software governance for a specific reason: data teams govern logic they don't fully own. "Active customer" isn't a function with a spec. It's a negotiated agreement between product, finance, and growth, and it shifts when the business does.
Semantic layers, data catalogs, and metrics stores each tackle pieces of the structure. They help standardize how definitions are written, version-controlled, and served to downstream tools. But structure for individual contributions is not the same as an operating model for reviewing, reconciling, and retiring business logic across contributors who each see a different slice of the picture.
Today, everyone is improvising. What's next?
The companies we've talked to are all improvising: workshops, naming conventions, ad-hoc rotation schedules where one analyst per week does nothing but field questions and review contributions. Every week brings a new stopgap, and none of it scales.
The infrastructure that's missing isn't another catalog or another semantic layer. It's the governing layer between them: something that knows what definitions exist across systems, who owns them, who's proposing changes, and what depends on what. The next generation of data tooling will need to govern contribution, not just consumption.
Matthieu Blandineau
We're building something new. Sign up to hear from us first.