April 28, 2025

Part 3: Building Data Products: Turning raw data into governed, reusable assets

🔹 This article is part of the ongoing series: How Pub/Sub for Tables Fixes What Data Pipelines Broke.

The Promise of Data Products

Data products are the building blocks of a modern data platform. They are designed to be reusable, discoverable, trustworthy, and aligned with business needs. A good data product is not a curated copy of a system table. It is a governed asset complete with structure, semantics, and quality guarantees, ready for downstream teams to use without guesswork.

The industry has long recognized the need for this shift. Data Mesh positioned data products as the core unit of ownership. Gartner highlighted them as critical to scaling trusted AI. McKinsey emphasized them as the foundation for delivering business value at speed.

Yet data products are notoriously hard to build and sustain. Most data architectures do not allow data products to emerge naturally. Instead, organizations assemble them after the fact through heavy processing of raw data. In doing so, ownership shifts away from domain teams and into the hands of analysts or data engineers, people far removed from the day-to-day realities of the domains the products are meant to represent.

No wonder these products rarely deliver on their promise. The result is added complexity, inflated maintenance costs, and a model where true reuse becomes difficult to sustain over time.

What a True Data Product Is

A true data product is not just cleaned-up data. It is a governed, purposeful, and reusable asset. It is intentionally designed to serve consumers, carrying structure, meaning, and quality guarantees as part of its core definition. These qualities are embedded from the start, not layered afterward through external tooling or documentation.

A real data product has five essential traits:

  • Discoverable: It can be easily found and understood by consumers across the organization.
  • Trustworthy: Structure, semantics, and quality expectations are guaranteed and not left to interpretation.
  • Self-service: Consumers can access and use it without needing custom integration or manual coordination.
  • Versioned and governed: Changes are deliberate and traceable, preserving stability for consumers.
  • Owned: A clear team is responsible for its lifecycle, quality, and evolution.

But the most important trait is purpose. A true data product is anchored in how the business sees itself. It represents business concepts and realities that consumers can trust and build on without needing to reconstruct or reinterpret the meaning from raw data.
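
To make these traits concrete, here is a minimal sketch of what such a definition could look like if expressed in code. Everything in it is illustrative; the class, field names, and the customer_360 example are hypothetical, not any product's actual API. The point is that discoverability, purpose, ownership, versioning, and quality expectations are declared up front rather than documented after the fact.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DataProductContract:
    """Illustrative: every trait is part of the product's definition, not an afterthought."""
    name: str       # discoverable: registered under a stable, searchable name
    purpose: str    # anchored in a business concept consumers can build on
    owner: str      # owned: one team accountable for lifecycle and quality
    version: str    # versioned: changes are deliberate and traceable
    schema: dict    # trustworthy: structure is guaranteed, not inferred
    quality_checks: list = field(default_factory=list)  # validated before release

# Hypothetical example: a "customer_360" product owned by a CRM domain team.
customer_360 = DataProductContract(
    name="customer_360",
    purpose="Unified view of active customers for marketing and support",
    owner="crm-domain-team",
    version="2.1.0",
    schema={"customer_id": "string", "lifetime_value": "decimal", "segment": "string"},
    quality_checks=["customer_id is unique", "lifetime_value >= 0"],
)
```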

Why Traditional Approaches Struggle

Traditional approaches make building data products harder than it needs to be. Most architectures start with raw data ingestion. They collect facts from operational systems and store them in large data lakes or warehouses. The assumption is that with enough processing, transformation, and curation, these raw inputs can eventually be turned into useful products.

In practice, this model creates a long and fragile chain. Ownership shifts from domain teams to centralized engineering or analytics groups. Semantics are lost or approximated as data moves through ingestion pipelines. Quality becomes something that is layered on top, not something that is enforced at the source.

The result is that building a data product becomes a manual and expensive effort. Teams must reconstruct meaning after the fact, applying business rules and interpretations that should have been embedded at the start. Every change upstream ripples through brittle transformations, increasing maintenance costs and slowing down delivery.

Data products were supposed to simplify and accelerate the use of data. Instead, under traditional architectures, they often become high-maintenance artifacts that struggle to stay aligned with the business and rarely deliver the promised agility.

How Pub/Sub for Tables Makes Data Products the Default

Pub/Sub for Tables changes the foundation so that data products are not an afterthought. They emerge naturally as part of the normal flow of publishing and consuming data. With Pub/Sub for Tables, domain teams publish purpose-built datasets that capture high-quality, semantically sound representations of their systems. These tables are structured, versioned, and governed at the point of creation. Quality expectations are validated before the data is made available to any consumers.
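
A minimal Python sketch of that publishing contract, assuming a hypothetical PublishedTable class rather than any real API: quality checks run before a new version is committed, so consumers can never observe data that failed validation.

```python
import pandas as pd

class PublishedTable:
    """Illustrative sketch: a table whose versions are validated before release."""

    def __init__(self, name: str, checks: list):
        self.name = name
        self.checks = checks           # quality expectations, enforced at the source
        self.versions: list = []       # only committed versions live here

    def publish(self, df: pd.DataFrame) -> int:
        # Validate BEFORE the data becomes visible to any consumer.
        for check in self.checks:
            if not check(df):
                raise ValueError(f"{self.name}: quality check failed, publish rejected")
        self.versions.append(df)
        return len(self.versions) - 1  # the id of the newly committed version

    def read(self, version: int = -1) -> pd.DataFrame:
        return self.versions[version]  # consumers always see a complete version

# The domain team declares expectations alongside the table itself.
orders = PublishedTable(
    name="orders",
    checks=[lambda df: df["order_id"].is_unique,
            lambda df: (df["amount"] >= 0).all()],
)
orders.publish(pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 25.5]}))
```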

Data and business teams can then join these published datasets to produce semantically consistent, high-quality assets that align directly with business concepts. This is how a data product naturally takes shape: not through retrospective curation, but as a straightforward extension of the publish-subscribe model. With every refresh of a published table, any downstream data products update automatically, staying current and valid without manual intervention or reprocessing. Changes propagate through the system with built-in guarantees, maintaining consistency without adding operational overhead.
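
The derivation step can be sketched the same way. In the hypothetical example below (table and column names are assumptions), a data product is just a pure transformation over published tables; whenever either input publishes a new version, re-running the same function yields the refreshed product.

```python
import pandas as pd

def customer_revenue(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    """A derived data product: joins two published tables into a business concept."""
    revenue = orders.groupby("customer_id", as_index=False)["amount"].sum()
    return customers.merge(revenue, on="customer_id", how="left").fillna({"amount": 0})

# Each time either input table publishes a new version, the platform re-runs the
# transformation, so the derived product stays current without manual reprocessing.
customers_v3 = pd.DataFrame({"customer_id": [1, 2], "segment": ["smb", "enterprise"]})
orders_v7 = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 99.0]})
print(customer_revenue(customers_v3, orders_v7))
```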

As the number of published tables and derived data products grows, Tabsdata’s implementation of Pub/Sub for Tables ensures data correctness at scale. It applies the transactional equivalent of “read committed” semantics, guaranteeing that even in complex dependency chains, consumers see complete and consistent datasets. Because every published table and every derived data product is versioned with each update, tracing and debugging complex data issues becomes dramatically simpler. Teams can pinpoint the exact source and version of any anomaly, reducing investigation time from days to minutes.
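
One way to picture those guarantees is as lineage records pinned to committed versions. The sketch below is illustrative; the names and structures are assumptions, not Tabsdata’s internals. Every product version records exactly which committed input versions produced it, which is what makes tracing an anomaly fast.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductVersion:
    """Lineage record: which committed input versions produced this output."""
    product: str
    version: int
    inputs: dict  # table name -> committed version it was built from

# With read-committed semantics a refresh sees only fully committed input
# versions, never a half-written update, so every record is reproducible.
lineage = [
    ProductVersion("customer_revenue", 12, {"customers": 3, "orders": 7}),
    ProductVersion("customer_revenue", 13, {"customers": 3, "orders": 8}),
]

def trace(product: str, version: int) -> dict:
    """Debugging: pinpoint the exact input versions behind an anomaly."""
    for record in lineage:
        if record.product == product and record.version == version:
            return record.inputs
    raise KeyError(f"no lineage for {product} v{version}")

print(trace("customer_revenue", 13))  # -> {'customers': 3, 'orders': 8}
```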

This model delivers the benefits of the medallion architecture, but without the waste. In traditional systems, the silver layer tries to reconstruct the reality of source systems after raw ingestion, and the gold layer tries to produce curated, business-aligned datasets on top. Pub/Sub for Tables eliminates the need to ingest vast quantities of raw data and reconstruct meaning by hand, along with the IO, compute, and engineering effort that work consumes.

Published tables already represent the clean, structured view of domain systems, much like a silver layer. Data products emerge by joining and composing these tables, producing a gold layer that reflects real business meaning, all without the inefficiencies and guesswork of traditional ingestion models.

Aligning Domains to What Matters

Pub/Sub for Tables shifts the focus of domain teams from data engineering to business clarity. Domain teams are not responsible for building and maintaining data products. They are responsible for running their business functions and capturing the data that reflects their systems of record. By publishing this data as structured tables, they provide the rest of the organization with a clear, reliable view of their business activities.

The act of publishing forces domain teams to ask important questions. What data is relevant to the broader organization? What signals are important to preserve? What details can be left behind? This natural pruning sharpens focus, reduces noise, and ensures that operational changes inside a domain do not ripple unpredictably across the enterprise. Teams regain agility without sacrificing trust.
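
That pruning can be as simple as a projection at publish time. In the illustrative snippet below (table and column names are hypothetical), internal operational detail stays inside the domain, so refactoring it never breaks consumers; only the organizationally relevant columns are published.

```python
import pandas as pd

# Internal operational table: full of implementation detail.
internal_orders = pd.DataFrame({
    "order_id": [101, 102],
    "customer_id": [1, 2],
    "amount": [10.0, 25.5],
    "shard_key": ["a3", "f1"],   # internal routing detail, irrelevant outside
    "retry_count": [0, 2],       # operational noise, not a business signal
})

# At publish time the domain team decides what the organization needs to see.
published_orders = internal_orders[["order_id", "customer_id", "amount"]]
```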

Published datasets form the vocabulary of each domain. They clarify meaning, eliminate overlaps, and provide a stable foundation for collaboration across teams. Instead of fragmented interpretations, domains work from shared, well-understood definitions that align with how the business actually operates.

Data products built on top of these published tables offer an even broader perspective. They combine the outputs of multiple domains to create transparent, business-aligned metrics. They make it easier to measure effectiveness, spot gaps, and drive collective improvements across the organization.

This is how true alignment emerges. Not through top-down mandates or endless coordination meetings, but through a simple, scalable model where publishing high-quality data becomes a natural extension of running the business itself.

The Strategic Advantage of a Data Product Foundation

A platform built on true data products unlocks scale, speed, and trust across the entire organization. When data products emerge naturally from published tables, they are not isolated artifacts. They become reusable building blocks that feed analytics, operations, AI models, and business dashboards without constant rework. Teams can move faster because they are working with assets that are governed, meaningful, and dependable by design.

Instead of spending time stitching together data from fragmented pipelines, teams can focus on delivering insights and driving decisions. Instead of struggling to understand what the data represents, they can trust that the meaning was preserved at the source. This shift reduces operational overhead and creates a compounding advantage over time.

A foundation built this way also scales cleanly. As more domains publish their data, and more data products are created, the system does not buckle under the weight of complexity. It becomes stronger. Trust grows, collaboration improves, and innovation accelerates because teams are building on shared, reliable assets.

This is the real power of a data product foundation. It does not just make current workflows better. It transforms what is possible for the organization. Agility increases. Trust deepens. The ability to adapt and compete sharpens.

And it all starts by getting the foundation right.

Closing: Building a Foundation that Scales

Building data products should not be a process separated from the origin of the data by layers of processing and intermediaries. It should be the natural outcome of how data is shared across the organization.

Pub/Sub for Tables makes this possible. It gives domain teams a simple, reliable way to publish structured, governed data. It creates a foundation where data products emerge naturally, aligned with business needs, and ready to scale across analytics, AI, and operational use cases.

The result is a platform on which teams move faster, collaborate better, and adapt more easily to change.

If you are interested in seeing how this model can help your organization build a stronger, more scalable data foundation, I would love to connect.