As much as the NoSQL pundits would like to make us think otherwise, learning about relational database technologies is still, and hopefully will still be, a staple of computer science education in the years to come. There are quite a few authors considered as references in the field, but without any doubt, the subject of this month’s Library article is by far the most prolific of them all.
Of course, each of the most successful RDBMSs in the industry has had its share of massive books written about it. One that stands out in my personal library is a copy of the 1400-page (!) “Professional SQL Server 2000 Programming” by Robert Vieira, a book that was the cornerstone of two activities of mine 20 years ago: writing .NET applications using SQL Server 2000 as a backend data store, and teaching SQL Server to unsuspecting victims. I plead guilty, your honor.
When it comes to books about the grand theory of relational database systems, other titles stand out. These come to mind: “Fundamentals of Database Systems” by Ramez Elmasri and Shamkant Navathe (7th edition in 2015); “Database System Concepts”, by Abraham Silberschatz (co-author of the “dinosaur book” on operating systems), Henry F. Korth and S. Sudarshan (7th edition in 2020); “Database Systems: The Complete Book” by Hector Garcia-Molina, Jeff Ullman (co-author of the “Dragon Book” on compiler design), and Jennifer Widom (2nd edition in 2008); “Database Systems: A Practical Approach to Design, Implementation, and Management” by Thomas Connolly and Carolyn Begg (6th edition in 2014); and finally “An Introduction to Database Systems”, by C. J. Date, whose 1st edition was published in 1975 and whose 8th and, so far, last edition appeared in 2003.
Christopher J. Date, born in England in 1941 and whose name is usually spelled C. J. Date on the cover of his books, is by far the most prolific author in relational database theory. His work spans 50 years, almost as long as the field of RDBMSs itself, and has produced an incredible number of books widely considered to be de facto standards in the field. I write this sentence in the present tense for a reason: he is still active and still publishing new work as this article hits the web.
C. J. Date and Edgar F. Codd worked together, first at IBM (in the case of Date, from 1967 to 1983), and later in other ventures, expanding and commercializing the knowledge pioneered by the latter in his 1970 paper, arguably one of the most influential papers in the history of computing. For example, they both took part in the now legendary ACM SIGFIDET 1974 conference, where Codd and Charles Bachman debated the relative merits of the relational versus the network model of databases.
Although he has made a career in the field of databases and SQL, C. J. Date is not very fond of the status quo of RDBMSs and query languages. He made that point rather clear in an interview in 2014:
There’s more, a lot more. SQL isn’t just user hostile, it involves some very serious departures from relational theory. I don’t think this is the place to get into specifics-I’ve written about those problems at great length elsewhere (as indeed other people have too, including in particular my friend and colleague Hugh Darwen).
Let us not forget that until IBM released DB2 with support for SQL in the mid-1980s, there was no clear winner in the query language wars; the weight of IBM tilted the balance once and for all in favor of what C. J. Date, still today, considers the lesser alternative:
Suffice it to say that those departures are so serious that I honestly believe SQL has no real right to be called relational at all. As a consequence, SQL DBMSs have no real right to be called relational at all, either. The truth is, there never has been a mainstream DBMS product that’s truly relational.
Nor is he fond of XQuery, the W3C query language for XML:
To sum up: I’m obviously no fan of SQL, but no, I don’t think XQuery is any better. In fact, I think it suffers from some of the same problems that SQL does. At least SQL, with all of its faults, can be used-with a lot of discipline, like avoiding duplicates and nulls-almost as if it were relational; but the same clearly can’t be said of XQuery.
C. J. Date’s magnum opus, “An Introduction to Database Systems,” has been considered a landmark and a default choice for computer science curricula for over a quarter of a century. The story of the first edition of the book (written in 1972 and only published in 1975) is best told in his own words in his “Oral History of C. J. Date” published by the Computer History Museum in 2007.
I wrote it very quickly, but I was in IBM. And for someone in IBM in those days to publish something, it had to go through the clearance procedures in IBM. And so I submitted it to the clearance procedure, and of course the problem with the book from an IBM point of view was that it was not coming out saying that IMS was the greatest product ever invented. So everybody who reviewed it in the clearance procedure did two things. First, they found something I had to change. And second, they found somebody else who had to review it. This process was clearly going to go on for a long time!
If sales numbers are to be believed, this book has been a bestseller for decades. By 2003, when the 8th edition was published, almost a million copies had been sold. Few books in computer science have reached similar figures.
Well, the responses were very good, and the book was fairly quickly adopted by colleges and universities and became…well, I won’t say a standard, but a very widely used book in universities. The book appeared in February of 1975. In May of 1975, there was a National Computer Conference in Anaheim. And I gave a tutorial there on this new relational stuff, and associated with the conference there was a sort of trade show. Addison-Wesley, the publisher of the IBM series of books, had a stand there. And I went by afterwards to see what was happening. And the guy there said, “I’m sick of hearing your name.” He said, <laughing> “I just sold 1,800 of your books in the last hour and a half.” And then, of course, I got reviews, and the reviews were, though I say it myself, they were good. They said it was clear and it was balanced, believe it or not, and understandable.
Even if Edgar F. Codd was the clear instigator of the relational model, C. J. Date was the early and relentless DevRel spokesperson responsible for its spread worldwide. The number of written works he has published over the past half century is hard to fathom. His Goodreads author page contains an astonishing list of 30 results, all related to database theory in one way or another: relational theory, the SQL standard, database dictionaries, and so much more. That list misses, however, his latest opus, “On Cantor and the Transfinite”, published earlier this year, as well as his “SQL and Relational Theory Master Class” published by O’Reilly in 2010.
For those interested in the subject of databases, here are a few other sources for your reading pleasure. First, the “Readings in Database Systems, 5th Edition”, edited by Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker. Then, “Database Debunkings”, a controversial blog maintained by Fabian Pascal and featuring frequent contributions by C. J. Date. Also, “Mining of Massive Datasets”, by Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. Finally, a more than complete bibliography on the subject of databases, and an invaluable source of information during the preparation of this issue of De Programmatica Ipsum: the DBLP, maintained by a team of researchers at Schloss Dagstuhl.
Cover image from eBay.