April 2, 2025

How Pub/Sub for Tables Fixes What Data Pipelines Broke

Data pipelines solved data movement – but broke collaboration, trust, and agility

For over a decade, the standard playbook for enterprise data integration has been clear:

Build more pipelines → Integrate more data → Deliver more value.

I’ve seen this up close. Years ago, I co-founded StreamSets, one of the first platforms designed to make data pipelines easier to build and operate.

And for a while, it worked. We connected systems. We moved data faster. We enabled new analytics use cases. But pipelines quietly eroded something far more important: organizational collaboration, trust in data, and the ability to move with agility.

Pipelines created fragile, point-to-point dependencies. They fragmented ownership and accountability. They made data teams reactive, trapped in an endless cycle of firefighting and maintenance.

We compensated by adding more tooling. Observability stacks, data quality tools, and governance frameworks were all introduced to restore visibility and trust, but only after the system had already broken. New architectures promised relief but couldn’t fix the foundation. Data Mesh, Data Fabric, centralized governance platforms – all well-intentioned, but ineffective when built on top of brittle, opaque pipelines.

It’s time to rethink how data flows in an enterprise. That’s where Pub/Sub for Tables comes in: a model that restores agility, trust, and efficiency – not by adding more layers, but by replacing the brittle foundation itself.

The Complexity Spiral: How Pipelines Created Dysfunction

Pipelines solved data movement but created a spiral of growing complexity. They were introduced to move data between systems, and they did that job well. But with every new pipeline, the system became more fragile, opaque, and harder to manage.

The more pipelines you built, the more hidden dependencies you created. Each integration increased operational overhead and tight coupling between teams. This complexity grew quietly, like a pot slowly coming to a boil. It didn’t feel like a problem at first — but over time, it consumed time, budgets, and organizational clarity.

Pipelines delivered data, but not what that data represented. They reliably moved records from one system to another, but stripped away the meaning, structure, and context that made the data useful.

The three dimensions of data that make it fit for purpose were missing:

  • Data Value: The raw facts and records.
  • Metadata: The structural and operational context.
  • Semantic Interpretation: The meaning, purpose, and intended use of the data.

Pipelines transported only the data value and left the other two dimensions behind.

Metadata was approximated after the fact, often incomplete, inconsistent, or stitched together from logs, configs, and tribal knowledge.

Semantic interpretation was completely missing. Pipelines offered no mechanism to carry the meaning, intent, or contractual understanding between data producers and data consumers.

This gap forced teams to build additional systems to reconstruct context. Observability stacks, data catalogs, and coordination meetings became mandatory.

Every new pipeline added short-term progress but long-term fragility. The result was an ecosystem of silos, misalignment, and spiraling complexity — all driven by a model that couldn’t carry the full context of data.

Why the Promise of Data Architectures Remains Unfulfilled

The industry responded to pipeline complexity by introducing new data architectures, but the promise of these architectures has been hard to realize.

Data Mesh, Data Fabric, centralized catalogs, and governance frameworks all emerged with the right intent: to restore clarity, ownership, and trust in how data flows across the enterprise. But they were built on top of a brittle, opaque foundation. They were layered over fragile pipelines that couldn’t carry meaning, trust, or alignment in the first place. These architectures attempted to compensate for foundational gaps — but couldn’t fix them.

Layered-on metadata solutions tried to reclaim semantics — but always too late. Most metadata platforms were reactive. They attempted to capture meaning after the data had already moved — when the context was already lost. The result was lagging semantics and incomplete metadata. No matter how sophisticated the catalog or governance layer, it was always one step behind the actual data.

Top-down mandates further undermined agility. Many governance initiatives imposed rigid classification and control from above — often clashing with the autonomy and speed that domain teams needed to deliver business value. This tension between central governance and domain agility made adoption hard — and outcomes inconsistent.

The underlying issue was never addressed. You can’t govern your way out of brittle, opaque data movement. When the plumbing itself is fragile, adding more architectural layers only increases complexity and slows everything down. The promise of these architectures will remain unfulfilled until the foundation itself — how data is shared and governed at the source — is reimagined.

That’s what Pub/Sub for Tables makes possible.

A New Model: Pub/Sub for Tables

The way out of this spiral isn’t adding another layer of tooling on top – it’s a different model altogether.

Pub/Sub for Tables applies a proven principle — Publish/Subscribe communication — to datasets, not events or individual records.

Here’s how it works:

  • Data producers publish well-defined, governed datasets (tables).
  • Data consumers subscribe to these datasets declaratively, based on contracts that guarantee structure, semantics, and quality.
  • No fragile pipelines, no handoffs, no breakage chains.

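To make the model concrete, here is a minimal sketch in Python of what the producer and consumer interaction could look like. The TableCatalog class, the publish and subscribe calls, and the sales.orders table are hypothetical names invented for this illustration; the post describes a model, not a specific API.

    # A minimal, hypothetical sketch of the Pub/Sub for Tables interaction.
    # TableCatalog, publish(), and subscribe() are illustrative names only;
    # they do not refer to any specific product or API.
    from dataclasses import dataclass

    @dataclass
    class TableContract:
        name: str        # e.g. "sales.orders"
        version: str     # contract version the producer commits to
        schema: dict     # column name -> type

    class TableCatalog:
        """Toy in-memory stand-in for a shared catalog of published tables."""

        def __init__(self):
            self._published = {}     # table name -> TableContract
            self._subscribers = {}   # table name -> list of consumer names

        def publish(self, contract: TableContract) -> None:
            # The producer publishes a governed table under an explicit contract.
            self._published[contract.name] = contract

        def subscribe(self, consumer: str, table: str, required_version: str) -> TableContract:
            # The consumer declares what it needs; the catalog enforces the contract
            # instead of the consumer wiring up a point-to-point pipeline.
            contract = self._published.get(table)
            if contract is None:
                raise LookupError(f"no published table named {table!r}")
            if contract.version != required_version:
                raise ValueError(
                    f"{consumer} expects {table} v{required_version}, "
                    f"but v{contract.version} is published"
                )
            self._subscribers.setdefault(table, []).append(consumer)
            return contract

    # Producer side: publish once, under an explicit contract.
    catalog = TableCatalog()
    catalog.publish(TableContract(
        name="sales.orders",
        version="1.0.0",
        schema={"order_id": "string", "amount": "decimal(12,2)", "ordered_at": "timestamp"},
    ))

    # Consumer side: subscribe declaratively; no pipeline to build or maintain.
    orders = catalog.subscribe(consumer="finance_reporting", table="sales.orders", required_version="1.0.0")

Note that the contract check happens at subscribe time: if the producer publishes a new, incompatible version, the mismatch surfaces as an explicit contract violation rather than a silently broken pipeline downstream.
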
The relationship between producers and consumers becomes clear, contract-driven, and self-service. This model flips the equation:

  • From reactive integration to proactive publishing
  • From fragmented ownership to clear accountability
  • From implicit handoffs to explicit contracts
  • From after-the-fact governance to governance by design

Pub/Sub for Tables carries all three dimensions of data by design:

  • Data value: Published in standardized, governed tables.
  • Metadata: Explicit, structured, and versioned.
  • Semantic interpretation: Defined contractually, ensuring shared understanding.

No need for afterthought governance. The context and meaning travel with the data itself.
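
As an illustration of what carrying all three dimensions together might look like, here is a hypothetical publication descriptor, sketched as a plain Python dictionary. The field names (schema, metadata, semantics) and the example values are assumptions made for this sketch, not an established standard.

    # Hypothetical publication descriptor for a single table, bundling all
    # three dimensions. Field names and values are illustrative assumptions.
    orders_publication = {
        # 1. Data value: the governed table and its shape.
        "table": "sales.orders",
        "schema": {
            "order_id": "string",
            "amount": "decimal(12,2)",
            "ordered_at": "timestamp",
        },
        # 2. Metadata: explicit, structured, versioned operational context.
        "metadata": {
            "owner": "sales-data-team",
            "contract_version": "1.2.0",
            "freshness_sla": "updated hourly",
            "retention": "7 years",
        },
        # 3. Semantic interpretation: meaning, purpose, and intended use,
        #    agreed between producer and consumers as part of the contract.
        "semantics": {
            "amount": "Gross order value in EUR, including tax",
            "ordered_at": "Time the order was placed, in UTC",
            "intended_use": "Revenue reporting and demand forecasting",
        },
    }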

What Pub/Sub for Tables Unlocks

This shift isn’t just technical; it transforms how your organization operates around data. Here’s what Pub/Sub for Tables enables:

Simplified, automated data engineering workflows: Integration becomes declarative and self-service. Data engineers move from pipeline firefighting to higher-value work.

Enforceable, transparent data contracts: Contracts between producers and consumers become explicit, versioned, and trustworthy.

Reusable, governed data products: Publishing contracts naturally evolve into discoverable, reusable datasets with clear ownership and semantics.

Governance and oversight without friction: Policies are embedded at the point of publication, making compliance transparent and lightweight.

A scalable, trusted data foundation for AI and advanced analytics: Clean, governed, contract-driven datasets become the backbone of enterprise AI and decision-making.

What’s Next – The Mini-Series

This is just the starting point. Over the coming weeks, I’ll go deeper into how Pub/Sub for Tables transforms key areas of enterprise data strategy.

If your data teams feel stuck in a cycle of complexity, misalignment, and constant maintenance, this series is for you.

It’s time to stop duct-taping data ecosystems together — and start building an enterprise data foundation that works.