Databricks Solves the 'Data Pipeline Problem' Slowing Down Enterprise AI Agents
Databricks has unveiled Lakehouse//RT, a new real-time compute engine that allows AI agents to query massive enterprise data lakes in milliseconds without the need for complex, slow data pipelines.
By Factlen Editorial Team
- Enterprise Data Engineers: Focused on simplifying architecture and eliminating brittle pipelines.
- AI Application Developers: Prioritize millisecond latency and fresh data for autonomous agents.
- Cloud & Infrastructure Partners: Focused on integrating these real-time capabilities into broader enterprise ecosystems and event-driven architectures.
What's not represented
- Traditional Data Warehouse Vendors
- Enterprise Cybersecurity Auditors
Why this matters
AI agents are only as smart as the data they can access. By eliminating the delays caused by traditional data pipelines, this breakthrough allows autonomous systems to make instant, accurate decisions based on live enterprise data, unlocking a new tier of real-time automation.
Key points
- At the Data + AI Summit, Databricks introduced Lakehouse//RT, a new engine designed to eliminate traditional data pipelines.
- The Reyden compute engine allows AI agents to query massive data lakes with sub-100 millisecond latency.
- By querying data directly on open formats like Delta Lake and Iceberg, enterprises avoid copying data into separate serving databases.
- Developers building agentic systems consider real-time data access an existential requirement for autonomous AI agents to make accurate operational decisions.
For decades, enterprise data architecture has been defined by a fundamental and frustrating compromise: organizations could have their data fast, or they could have it massive, but they could almost never have both at the same time. Operational databases were built to handle real-time transactions with lightning speed, ensuring that e-commerce checkouts and financial trades executed instantly. Meanwhile, analytical data lakes and warehouses were designed to store petabytes of historical information, allowing data scientists to uncover long-term trends. Bridging these two distinct worlds required complex, brittle infrastructure that has long been the bane of data engineering teams.
Moving information between the fast operational side and the massive analytical side required specialized "ETL" (Extract, Transform, Load) pipelines. These pipelines physically copied and reformatted the data, a process that inherently introduced latency. Because of this architectural bottleneck, analytical insights were almost always looking at the past. For human analysts generating weekly business intelligence reports or executives reviewing quarterly sales dashboards, a few hours of delay was generally considered an acceptable cost of doing business. The system worked well enough for human-speed decision making.
But the rapid rise of autonomous AI agents has entirely broken that legacy paradigm. A modern AI system designed to reason continuously and execute actions on live enterprise data simply cannot tolerate a pipeline sitting between itself and the information it needs to function. If an AI agent is tasked with dynamically rerouting global supply chains or instantly detecting sophisticated financial fraud, it must see the business exactly as it is in that exact millisecond. An agent acting on data that is even ten minutes old is effectively flying blind, leading to poor operational choices and degraded trust in automated systems.
At the Data + AI Summit in San Francisco on Tuesday, Databricks announced a major architectural shift aimed at permanently collapsing that outdated infrastructure. The company unveiled Lakehouse//RT, a new analytics engine designed to bring real-time query performance directly to massive lakehouse environments. By fundamentally redesigning how data is accessed, the platform promises to eliminate the need for intermediate serving layers, allowing both human users and AI agents to query vast oceans of data with sub-second responsiveness.[1][2]

The announcement directly targets what Databricks executives describe as the "decades-old data pipeline problem." By querying data directly where it lives in the primary storage layer, the platform aims to give AI agents access to fresh, complete, and fully trusted data without the need to copy or move it into specialized, expensive serving databases. This unified approach represents a significant evolution of the lakehouse concept, merging the scale of cloud storage with the speed of an operational database.[1][3]
"Agents need the best data," Databricks executives explained during the summit's keynote presentations, highlighting the critical dependency between data quality and AI performance. "If they're getting stale or wrong data, they act poorly." They argued that traditional enterprise architectures—featuring separate transactional systems, analytical systems, and complex serving layers—are simply not built to support a future where millions of autonomous agents are operating simultaneously across a corporate network.[2]
The technical foundation making this real-time performance possible is a newly developed execution engine called Reyden. Built from the ground up for the extreme concurrency and strict latency demands of modern agentic workflows, Reyden allows the Databricks platform to query governed Delta Lake and Apache Iceberg tables directly. This means the engine can scan open-format data files stored in cheap cloud object storage and return answers almost instantly, bypassing traditional compute bottlenecks.[1][3]
The initial performance metrics released by the company suggest a significant leap in enterprise data capability. On standard analytical benchmarks used to evaluate database speed, Lakehouse//RT reportedly delivers sub-100 millisecond latency while handling an impressive 12,000 queries per second. This level of throughput is typically reserved for highly tuned, specialized in-memory databases, not massive analytical data lakes containing petabytes of historical records. Achieving this on open formats is a major engineering feat.[1]
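To put those two figures together, a rough back-of-envelope estimate is possible with Little's law, which says the average number of requests in a system equals throughput multiplied by average time in the system. The sketch below is illustrative arithmetic only, not a Databricks benchmark method: it assumes steady-state load and treats the "sub-100 millisecond" figure as a ceiling on average latency, and the function name is made up for this example.

```python
# Back-of-envelope concurrency estimate using Little's law: L = throughput * latency.
# Assumptions (not from Databricks): steady-state load, and 100 ms treated as an
# upper bound on average query latency.

def in_flight_queries(queries_per_second: float, avg_latency_ms: float) -> float:
    """Average number of queries being processed at once, per Little's law."""
    return queries_per_second * avg_latency_ms / 1000.0

reported_qps = 12_000      # throughput figure cited in the article
latency_ceiling_ms = 100   # "sub-100 millisecond" treated as a ceiling

upper_bound = in_flight_queries(reported_qps, latency_ceiling_ms)
print(f"Roughly {upper_bound:.0f} queries in flight at most under these assumptions")
```

Under those assumptions the engine would be holding on the order of a thousand queries in flight at any moment, which gives a sense of the concurrency the reported numbers imply.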

For smaller, highly targeted workloads, the response times can drop even further, reaching as low as 10 milliseconds. Databricks claims that early beta customers testing the new engine have seen up to 16 times better performance compared to their existing, specialized real-time serving architectures. These gains are achieved without the overhead of maintaining parallel systems, offering a compelling value proposition for IT departments looking to streamline their infrastructure.[1][2]
By completely eliminating the need for a separate serving layer, enterprises can theoretically bypass the exorbitant costs, constant synchronization headaches, and severe governance gaps that plague traditional data pipelines. Every time data is copied from a secure lake into a downstream database, it creates a new attack surface and a potential compliance violation. Lakehouse//RT keeps the data in one place, under a single unified governance model, while also reducing the proprietary vendor lock-in associated with specialized real-time databases.[1][3]
The practical implications for enterprise AI development are profound and immediate. Consider an AI agent deployed in a consumer financial services application, tasked with analyzing complex transaction patterns to detect sophisticated fraud rings. If that agent has to wait for data to sync across a traditional ETL pipeline, the fraudulent transaction may already have cleared the network by the time the agent sees it. With real-time lakehouse access, the agent can evaluate the transaction against years of historical data in milliseconds, blocking the fraud before the money moves.
Similarly, in the manufacturing and logistics sectors, predictive maintenance agents must continuously monitor live equipment performance sensors and anticipate mechanical failures in real time. A delay in data processing caused by a slow pipeline could mean the difference between a smoothly scheduled preventative repair and a catastrophic, multi-million-dollar assembly line shutdown. Real-time access ensures these agents can trigger alerts the moment an anomaly is detected.

The broader enterprise cloud ecosystem is already moving aggressively to support this real-time, zero-copy architecture. Microsoft, a major strategic partner, highlighted during the summit how joint customers are utilizing Azure Databricks to modernize their sprawling data estates. By leveraging these new capabilities, organizations can scale their AI applications and deploy autonomous agents natively on Azure without the friction of constantly copying data between different Microsoft services.[4]
Event-driven architecture providers are also building deep integrations with the new Reyden engine to maximize its potential. Solace, a prominent enterprise event streaming platform, showcased how it can stream fresh, reliable data from thousands of disparate corporate systems directly into the Databricks Data Intelligence Platform. This seamless connection bridges the gap between real-time operational events and next-generation analytics, ensuring the lakehouse is always perfectly up to date.[5]
Beyond pure speed, this architectural shift also addresses the critical, often-overlooked issue of semantic consistency in enterprise data. When data is physically copied across multiple business intelligence tools, downstream dashboards, and serving databases, the underlying business logic inevitably fragments. A core metric like "monthly active users" or "net revenue" defined in one system might mean something slightly different in another, creating a fractured foundation that severely confuses AI models trying to learn the business.[6]
By keeping all enterprise data securely anchored in a single, governed lakehouse and querying it in real time, organizations can finally maintain a truly unified semantic model. This architectural discipline ensures that when an autonomous AI agent pulls a performance metric to make a routing decision, it is using the exact same mathematical definition as the human financial analyst looking at a quarterly dashboard. Trust in AI requires this level of absolute consistency.[6]
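The idea of a single shared definition can be sketched in a few lines. This is a minimal illustration, not Databricks' semantic layer or any real API: the `Order` type, the `METRICS` registry, and the "gross minus refunds" definition of net revenue are all hypothetical, chosen only to show two consumers resolving the same metric through one definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Order:
    # Hypothetical record shape, used only for this illustration.
    gross: float
    refunds: float


# One registry of metric definitions shared by every consumer.
# The net revenue formula here is an assumed example, not a standard.
METRICS: Dict[str, Callable[[List[Order]], float]] = {
    "net_revenue": lambda orders: sum(o.gross - o.refunds for o in orders),
}


def compute_metric(name: str, orders: List[Order]) -> float:
    """Look up a metric by name and evaluate it with the shared definition."""
    return METRICS[name](orders)


orders = [Order(gross=100.0, refunds=10.0), Order(gross=50.0, refunds=0.0)]

# The agent and the dashboard both go through the same registry entry,
# so they cannot drift apart the way separately copied definitions can.
agent_view = compute_metric("net_revenue", orders)
dashboard_view = compute_metric("net_revenue", orders)
print(agent_view == dashboard_view)
```

The design point is that the formula lives in exactly one place. If each tool carried its own copy of the calculation, a change to one copy would silently diverge from the others.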

The launch of Lakehouse//RT represents a major escalation in the ongoing, high-stakes battle for dominance over the enterprise AI data layer. As Fortune 500 companies race to deploy generative AI and autonomous agents to remain competitive, the underlying data infrastructure has been exposed as the critical bottleneck. The platform that can serve that data most quickly, safely, and cheaply will likely define the next decade of enterprise software.
Databricks is heavily positioning its open-format approach—relying entirely on open-source standards like Delta Lake and Apache Iceberg—as the ultimate antidote to vendor lock-in. The company is actively contrasting its strategy with competitors who require enterprises to move their data into proprietary, closed-ecosystem formats just to achieve fast query performance. This open philosophy appeals strongly to chief information officers wary of being trapped by a single vendor.[3]
While the initial benchmark numbers are undeniably impressive, the true test for Lakehouse//RT will be its performance in the messy, complex, and highly customized reality of legacy enterprise environments. Delivering millisecond latency on perfectly clean benchmark data is one thing; achieving those same speeds across decades of accumulated, unstructured, and poorly formatted corporate data is an entirely different engineering challenge.
Nevertheless, the bold declaration that the era of the traditional data pipeline is coming to an end marks a significant milestone in the evolution of enterprise technology. For autonomous AI agents to actually fulfill their massive promise of transforming corporate operations, they need to see the business exactly as it is right now—not as it was an hour ago. By solving the latency problem at the storage layer, the industry is taking a massive step toward making real-time AI a practical reality.
How we got here
Pre-2020s
Enterprises maintained strict separation between fast operational databases and slow analytical data warehouses.
Early 2020s
The 'Lakehouse' architecture emerged, unifying data storage but still struggling with the millisecond latency required for real-time applications.
2023-2025
The rise of autonomous AI agents exposed the latency flaw in existing data pipelines, as agents required instant access to massive datasets.
June 16, 2026
Databricks announces Lakehouse//RT at the Data + AI Summit, introducing the Reyden engine to solve the real-time analytics bottleneck.
Viewpoints in depth
Enterprise Data Engineers
Focused on simplifying architecture and eliminating brittle pipelines.
For data engineering teams, the appeal of Lakehouse//RT lies in architectural simplification. Maintaining complex ETL (Extract, Transform, Load) pipelines to move data from operational databases into analytical warehouses has historically been one of the most expensive and error-prone aspects of enterprise IT. By querying open table formats directly with millisecond latency, engineers can eliminate the 'serving layer' entirely, reducing compute costs and closing governance gaps where data is copied and potentially exposed.
AI Application Developers
Prioritize millisecond latency and fresh data for autonomous agents.
Developers building agentic AI systems view real-time data access as an existential requirement. An AI agent cannot effectively automate customer service, financial trading, or supply chain routing if it is acting on stale information. This camp argues that traditional data warehouses, which introduce minutes or hours of latency, are fundamentally incompatible with the future of autonomous software. They require infrastructure that allows models to reason over live, production-grade data instantly.
Open-Source Advocates
Value the use of open table formats to prevent vendor lock-in.
A significant faction of the data community champions the use of open standards like Apache Iceberg and Delta Lake. They argue that enterprises should never be forced to move their data into proprietary, closed-ecosystem databases just to achieve fast query performance. From this perspective, the ability to run real-time analytics natively on open formats ensures that companies retain ultimate control over their data, allowing them to swap out compute engines in the future without undertaking massive data migration projects.
What we don't know
- Whether the Reyden engine can maintain its sub-100ms benchmark performance when deployed against messy, unstructured legacy enterprise data.
- How competing data platforms, such as Snowflake, will adjust their architectures to match these new real-time querying capabilities.
- The exact pricing model for Lakehouse//RT once it moves out of its beta testing phase and into general availability.
Key terms
- Data Pipeline (ETL): The process of extracting, transforming, and loading data from one system to another, which traditionally introduces delays.
- Lakehouse: A data architecture that combines the massive storage capacity of a data lake with the structured querying capabilities of a data warehouse.
- AI Agent: An autonomous software entity powered by AI that can reason over data, make decisions, and execute actions without human intervention.
- Delta Lake / Apache Iceberg: Open-source table formats that add reliability, such as transactional guarantees, and performance features to massive data lakes, reducing vendor lock-in.
Frequently asked
Why do AI agents need real-time data?
Agents make autonomous decisions based on the data they see. If they act on stale or delayed data, they can make incorrect or harmful operational choices.
How does Lakehouse//RT eliminate pipelines?
It uses a new compute engine called Reyden to query massive analytical data tables directly in milliseconds, removing the need to copy that data into a separate, faster database.
Does this require moving to a new data format?
No. Lakehouse//RT works natively on open table formats like Delta Lake and Apache Iceberg, meaning companies don't have to migrate their existing data.
Sources
[1] VentureBeat (AI Application Developers): Databricks says it solved the decades-old data pipeline problem that's been slowing AI agents
[2] SiliconANGLE (Enterprise Data Engineers): Databricks declares the end of pipelines with a unified platform for operational and analytical data
[3] Databricks (Enterprise Data Engineers): Real-Time Lakehouse - Databricks
[4] Microsoft (Cloud & Infrastructure Partners): Azure Databricks at Data + AI Summit 2026 featuring Industry Leaders and Partners
[5] Solace (Cloud & Infrastructure Partners): Databricks Data + AI Summit - Solace
[6] MetaKarta (Cloud & Infrastructure Partners): Databricks Data & AI Summit 2026 - MetaKarta
[7] TheCUBE (AI Application Developers): Databricks Data + AI Summit 2026