A historical recollection of the Postgres project and its impact on the DBMS industry provides an insight into the key features of the object-relational database as conceived by Mike Stonebraker.
Looking Back at Postgres, freely available as an arXiv.org pdf, is an essay that also forms part of the just released book "Making Databases Work The Pragmatic Wisdom of Michael Stonebraker". It comes from Joseph M. Hellerstein, a prominent research member of the UC Berkeley Postgres project, which was led by the Stonebraker think tank from the mid-1980's to the mid-1990's, and takes us on a magnificent tour through the evolution of the Postgres project. During that journey, Hellerstein pauses on its milestones to elaborate on its forward thinking that planted the roots of the technologies that shape the database industry today, decades after the project's conception.
At the very beginning of the paper, there's reference to yet another Berkeley project, which I cannot leave unnoticed. It is a reference to Ingres, the cradle of the relational database systems, and one which I still use today.
Not only Postgres was heavily influenced by the 1970s Ingres project, but actually started life as a fork of it. The Genealogy of Relational Databases chart which visualizes the history of the relational databases, shows that Ingres was the very first RDBMS and that circa 1980 Postgres sprung to life as a mere fork of it. It's all in the name after all; Postgres for Post-Ingres, that is, going beyond Ingres's capabilities.
Commercially, things for Ingres didn't turn out as well as they did for Postgres. Today Postgres occupies the number 4 spot on the StackOverflow ranking of the most popular RDMBs and is number 1 among those that are open source. In contrast, Ingres, despite having contributed to the its successor's great success, is noticeable by its absence.
Mike Stonebraker who started out with Ingres, conceived a vision;that of the Object-Relational database which he fulfilled later on when he started working on Postgres. Let's look into the innovative ideas that shaped this vision.
Supporting ADTs in a Database System At the core of the Object-Relational database notion was the support of ADTs or Abstract Data Types that went beyond the traditional ones handled by the database.These were complex objects or data which had to be stored as nested bundles in stark contrast to the relational model's classical data flattening for removing duplication.This began as an attempt to cater for the needs of CAD applications which use data types such as polygons, rectangles or even fully blown objects such as circuit layout engines.The shape that it takes today is JSON, JSONB, or XML.Storing that kind of data was just one part of the story;the other was to also allow for running declarative queries over it.At the time, no one had caught up to the idea but until much later down the line.
That scheme was further expanded by allowing programmers to declare their own User Defined Types and User Defined Functions for working with that kind of types.
Meanwhile, "the Big Data stacks of the 2000s—including the MapReduce phenomenon that gave Stonebraker and DeWitt such heartburn —are a re-realization of the Postgres idea of user-defined code hosted in a query framework"
Extensible access methods for new data types Yet another innovation were the B-Tree indexes that everyone is familiar with today, as well as the R-Tree indexes which allowed for running two dimensional range queries on data. These breakthroughs set the foundation of Postgres's expandability, partially reflected today in Gist indexes and interfaces that power up the famous PostGIS geographic information system.
Active Databases and Rule Systems Rules or triggers pioneered under Ingres were yet another construct popularized by Postgres that found its way into all the major database engines. More on that can can be found in my Connecting To The Outside World with Perl and Database Events article which extensively goes through Rules, Database Events and Stored Procedures under modern Ingres.
Log-centric Storage and Recovery Not fond of the write ahead logging schemes, Stonebraker "unified the primary storage and historical logging into a single, simple disk-based representation" which not only simplified recovery but also enabled time traveling, that is run queries “as of some wall-clock time to access the versions of the data that were committed at that time". While versioning and time traveling queries never got much traction inside the project, finally getting replaced by the write-ahead logging scheme just as the rest of the vendors did, it nevertheless had set the foundations for MVCC as experienced under Oracle's snapshot isolation level.
Support for Multiprocessors: XPRS Sharing memory and processors to support parallel query optimization was yet again another Postgres induced novelty. Although Postgres's probably one and only weak point is that it can't scale out to a parallel shared nothing architecture of clusters by itself, Stonebreaker's paper on "Case for Shared Nothing" fueled the technologies behind Gamma, Terradata and the Big Data era as a whole.But due to Postgres extensibility, products such as Citus database cluster, PostgresForest or Postgres-xc sprung out, that base on vanilla Postgres and render it able to understand and execute parallel queries as a shared-nothing distributed database.
Support for a Variety of Language Models The impedance mismatch between Object Oriented programming languages and the declarative relational model was and still is one of the hottest problems of the computer industry.Instead of getting sucked into this never ending debate, Stonebreaker introduced the Object-Relational Database , totally sidestepping the Object Oriented Databases.Not only do all vendors support this scheme today but yet another student of this prolific project, Margo Seltzer, introduced the notion of the key value stores attributing to the NoSQL movement.
Open Source Postgres became open source therefore open to contributions once it escaped the confinements of the Berkley laboratory.It was this property that in the end let it evolve into a melting pot of the newest and greatest ideas; and this move paid dividends very soon.After two students introduced a variant of SQL to the engine in place of the propriety Postquel query language (QUEL was Ingres's) the team got distracted with other activities.At this point the open source contributors stepped in by not just contributing peripherally but also improving the very core.Since then the community has been growing.
Commercial Adaptations "PostgreSQL has long been an attractive starting point for building commercial database systems, given its permissive open source license, its robust codebase, its flexibility, and breadth of functionality".
As the result, many products are based on, or are even forks of it.
Greenplum was the first effort to offer a shared-nothing parallel, scale-out version of PostgreSQL.
EnterpriseDB was founded in 2004 as an open-source-based business, selling PostgreSQL in both a vanilla and enhanced edition with related services for enterprise customers.
ParAccel enhanced the Postgres optimizer with new heuristics for queries with many joins. In 2011, Amazon invested in ParAccel, and in 2012 announced AWS Redshift, a hosted data warehouse as a service in the public cloud based on ParAccel technology.
CitusDB was founded in 2010 to offer a shared-nothing parallel implementation of PostgreSQL. While it started as a fork of PostgreSQL, as of 2016 CitusDB is implemented via public PostgreSQL extension APIs and can be installed into a vanilla PostgreSQL installation.
Lessons learned The lessons taught by Postgres's course and history is that in order for a product to assume success, it should first of all be extensible :
"With extensibility as an architectural core, it is possible to be creative and stop worrying so much about discipline: you can try many extensions and let the strong succeed. Done well, the “second system” is not doomed; it benefits from the confidence, pet projects, and ambitions developed during the first system."
Secondly, to understand that despite "one size" does not fit all problems, a general purpose database architecture as any other general purpose technology, can however fit most.
And lastly, the success that comes with open sourcing everything in “do something important and set it free"..
The latest version of Git, the distributed version control system, has been released with around 500 changes, including two new sub-commands that are designed to provide an experimental alternative to [ ... ]