Databricks sells “one copy” in LTAP. Engineers say physics still makes two shapes of data.

The company’s unified OLTP and OLAP story hinges on what counts as “copy,” and rivals are calling it marketing math.

By Yousef Al-Zahrani, Technology Correspondent, The Executives Brief

about 5 hours ago·5 min read

Executive summary

Databricks is rolling out LTAP (lake transactional/analytical processing) using Reyden and Lakebase to unify transactional and analytical workloads atop open object storage. For decision-makers, the debate over “zero copy” affects how confidently you can expect performance, cost, and architectural clarity as AI workloads surge.

Databricks is marketing LTAP as “One data, zero compromises, zero copies.” In plain English: it wants you to believe that OLTP and OLAP can live in the same lakehouse without duplicating data. Underneath, the story gets a lot more complicated, and not in a cute theoretical way. Engineers and a Databricks rival argue that LTAP technically involves two physical representations that have to stay consistent, even if there is only one “authoritative” version.

So what exactly is Databricks claiming, and what do the details show? LTAP is built on Reyden, a new compute engine, and Lakebase, Databricks’ serverless PostgreSQL that runs on open object storage. Databricks says it unifies transactions and analytics by unifying data at the storage layer, so “one copy of storage in the data lakehouse” can support transactional, analytical, streaming, and operational workloads. But depending on how you define “copy,” the implementation can look like more than one physical copy for a given query path.

To understand why this argument matters now, you have to zoom out. Databases have always had a split personality: OLTP does small, row-oriented reads and frequent writes, while OLAP does large, column-oriented reads and batch writes. Getting them to coexist in one system is hard at the physical level, because the storage layout and access patterns pull the architecture in opposite directions. That tension is getting louder as the database market chases new workloads tied to AI agents, in both software development and business applications. When those agents read, write, and analyze in tight loops, the appeal of “one place” for data becomes obvious.
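The layout tension can be made concrete with a toy example. The sketch below is illustrative only: the table, the column names, and the helper functions are invented for this article and do not come from Databricks or any vendor. It stores the same three orders once as rows and once as columns, then shows which layout each access pattern favors.

```python
# Illustrative only: the same three orders in two physical layouts.
# All names here are hypothetical and not taken from any vendor's code.
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "alice", 7.5),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (a columnar layout)."""
    ids, customers, amounts = zip(*rows)
    return {"id": list(ids), "customer": list(customers), "amount": list(amounts)}


columns = to_columns(rows)


def lookup_order(rows, order_id):
    """OLTP-style access: fetch one whole record by its id."""
    for row in rows:
        if row[0] == order_id:
            return row
    return None


def total_amount(columns):
    """OLAP-style access: scan a single column and ignore the others."""
    return sum(columns["amount"])
```

Fetching one order is natural in the row layout because the whole record sits together, while the sum only needs the `amount` list in the columnar layout. Real engines add indexes, compression, and on-disk formats, but the pull in opposite directions is the same.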

Databricks’ LTAP is trying to solve the layout problem by combining PostgreSQL transactional behavior with analytical reading over lake formats. The transactional side leans on Lakebase, which is based on technology from Neon. Databricks bought Neon last year to support copy-on-write branching and autoscaling serverless compute. In the Neon-style setup discussed by engineers and conference materials, PostgreSQL can keep its pages in a pageserver format as local storage, then propagate to object storage in Parquet for long-term durability and columnar querying. If the system needs “cold” data, PostgreSQL/Lakebase can retrieve from the object store and reconvert the Parquet data back to the pageserver format. That is the core engineering move behind Databricks’ “unification”: OLTP-style storage and OLAP-style storage are connected through a common pipeline.
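The hot and cold flow described above can be sketched as a two-tier store. This is a simplified model under stated assumptions, not Neon's or Databricks' implementation: the class name `TieredStore`, its methods, and the use of plain dictionaries are all hypothetical, and the conversion between page format and Parquet is elided entirely.

```python
class TieredStore:
    """Toy two-tier store: a local page cache in front of an object store.

    Hypothetical names throughout; this is not Neon's or Databricks' code.
    """

    def __init__(self):
        self.object_store = {}  # stands in for durable storage in the lake
        self.page_cache = {}    # stands in for hot pages held near the database

    def write(self, key, value):
        # Writes land in the hot tier first.
        self.page_cache[key] = value

    def flush(self):
        # Propagate hot data to the durable tier (format conversion is elided).
        self.object_store.update(self.page_cache)

    def evict(self, key):
        # Drop a page from the hot tier; refuse if it was never flushed.
        if key not in self.object_store:
            raise KeyError(f"{key!r} has not been flushed yet")
        self.page_cache.pop(key, None)

    def read(self, key):
        # Hot hit: serve from the cache. Cold miss: fetch and re-materialize.
        if key not in self.page_cache:
            self.page_cache[key] = self.object_store[key]
        return self.page_cache[key]
```

In this model a flushed page that is still cached exists in two places at once, which is the crux of the "copy" argument: whether the cached form counts as a second copy or as part of the storage hierarchy is a matter of definition.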

Here is where the “one copy” marketing collides with how engineers describe implementations. A commenter from a Databricks rival quipped “Two copies of data, not one,” and a Databricks engineer reportedly clarified in a private messaging community that it is technically two, because pageservers act as a cache or materialization layer in the Neon architecture. The analytics engine reads PostgreSQL pages from object storage, and pageservers can also be involved, so the data path can span both the object-store Parquet representation and the pageserver representation. Databricks’ own slides, shared at a PostgreSQL conference in May, reportedly spelled out the split: under “Analytics directly on OLTP data,” Databricks engineers Hristo Stoyanov and Jonathan Katz described how the Spark analytics executor pulls layer files containing full page images from image layers in object storage, while pageservers provide storage for PostgreSQL.

Databricks, for its part, draws a line around the meaning of “copy.” In a statement to The Register, a spokesperson said: “In LTAP, the user only operates on one authoritative copy of the data. [It has] one source of truth data in Iceberg (an open source table format which contains Parquet files). Yes, any database system, even a single individual database, always has many intermediate internal copies of data, ranging from memory L1/L2/L3 cache, to DRAM memory, to non-volatile storage, to blob storage etc. This is referred to as 'the database storage hierarchy.'” The company also qualifies its claim in presentations: there is only one “authoritative” copy, or one copy of the data “in storage” or “in the lake.” The argument is not that internal caching does not exist, but that the system avoids two authoritative, user-facing copies that must be kept in sync.

Rivals are focused on a different distinction: whether the system is truly unified in practice, not just in the definition. SingleStore, which has tried to blend row-store and column-store capabilities with tiered storage, reacted quickly to the “HTAP failed” narrative in Databricks’ earlier marketing. SingleStore CTO Nadeem Asghar said in a blog post that you cannot call HTAP a failure and then explain why what HTAP promised is still needed. He argued that renaming it to LTAP changes marketing, not physics. In his view, Databricks’ “one copy” framing is about storage, not about the engine, and he pointed to the existence of different physical shapes, plus multiple engines with their own caches and failure modes. Asghar’s point lands with boards because it highlights a governance and correctness risk: if writes land in a row representation for Postgres and analytics reads a columnar representation, then something must keep them aligned.
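The alignment risk can be shown with a small model. The sketch below is a generic illustration of the concern, not a description of how LTAP works (Databricks' slides describe analytics reading page images directly). The class `DualLayout`, its methods, and the use of a simple sequence number are all assumptions made for this example.

```python
class DualLayout:
    """Toy model: writes hit a row log; a columnar view catches up on sync().

    Hypothetical names; this does not describe any vendor's implementation.
    """

    def __init__(self):
        self.row_log = []         # (lsn, amount) entries, in write order
        self.column_amounts = []  # columnar view of the amounts
        self.applied_lsn = 0      # highest sequence number applied to the view

    def write(self, amount):
        # Assign the next sequence number and append to the row side.
        lsn = len(self.row_log) + 1
        self.row_log.append((lsn, amount))
        return lsn

    def sync(self):
        # Apply only the entries the columnar view has not seen yet.
        for lsn, amount in self.row_log[self.applied_lsn:]:
            self.column_amounts.append(amount)
            self.applied_lsn = lsn

    def lag(self):
        # Number of committed writes not yet visible to analytics.
        return len(self.row_log) - self.applied_lsn

    def analytic_total(self):
        # Analytics reads only the columnar side.
        return sum(self.column_amounts)
```

Between a `write` and the next `sync`, `lag()` is nonzero and `analytic_total()` omits the newest writes. How large that window can grow, and what happens when the sync step fails, are the operational questions behind the "two shapes" objection.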

This is not an academic debate in a vacuum. Database unification attempts have a long trail: in 2014, SingleStore started working on an in-memory row store and an on-disk column store with tiered storage, and later launched a cloud service in 2020 with three tiers (memory, local cache, and storage). Other companies explored similar goals: MongoDB adds column-store indexes for analytical queries in apps; Oracle’s HeatWave for MySQL runs on Oracle Cloud Infrastructure to support analytics without exporting to separate systems like Teradata, Snowflake, or Amazon Redshift; SAP has talked about real-time analytics since 2011, centered on its in-memory database HANA. Databricks is trying to position lakehouse execution as the latest, more scalable answer.

Finally, the technical mechanism itself lends the effort some credibility. Andy Pavlo, associate professor of databaseology at Carnegie Mellon University, told The Register that while “they are copying data out eventually,” it is not trivial for Databricks to have the Neon/PostgreSQL front end handle writes normally and then let Reyden read those writes. Pavlo said Reyden needs to interpret PostgreSQL page contents, and those pages are not entirely self-contained because permissions and other metadata can live elsewhere, so Reyden needs mechanisms to consult the Neon/PostgreSQL catalogs. He also highlighted the hard part: working out what is allowed to be seen or read from a page when the system intermixes data and metadata.

For executives, the strategic stakes are simple: when AI agents push databases to do transactional and analytical work in one place, the cost of architectural confusion shows up as delayed roadmaps, messy migrations, and unpleasant surprises under load. The “one copy” debate is therefore less about semantics and more about what your teams should expect from performance, consistency, and operational risk. Databricks’ LTAP may still be impressive engineering, but your board and finance committee will want to know what is truly being duplicated, what is merely cached, and how failure or staleness behaves across the paths your workloads actually take.
