When it comes to relational theory, there is arguably no greater authority than C.J. Date. His newest book, SQL and Relational Theory presents a relational approach to SQL. As he says in the first chapter, "Of course, to treat such a topic adequately, I need to cover relational issues as well as issues of SQL per se." And so the book begins with a foundation that has comparatively little to do with SQL. With these passages, Date sets the scene for a thorough understanding of how SQL is meant to work, and anyone with a background in SQL will benefit from his insights. Even if you don't need the refresher course, it's always a pleasure to read C.J. Date's words.
The Relational Model Is Much Misunderstood
Professionals in any discipline need to know the foundations of their field. So if you're a database professional, you need to know the relational model, because the relational model is the foundation (or a huge part of the foundation, at any rate) of the database field in particular. Now, every course in database management, be it academic or commercial, does at least pay lip service to the idea of teaching the relational model—but most of that teaching seems to be done very badly, if results are anything to go by; certainly the model isn't well understood in the database community at large. Here are some possible reasons for this state of affairs:
- The model is taught in a vacuum. That is, for beginners at least, it's hard to see the relevance of the material, or it's hard to understand the problems it's meant to solve, or both.
- The instructors themselves don't fully understand or appreciate the significance of the material.
- Perhaps most likely in practice, the model as such isn't taught at all—the SQL language or some specific dialect of that language, such as the Oracle dialect, is taught instead.
So this book is aimed at database practitioners in general, and SQL practitioners in particular, who have had some exposure to the relational model but don't know as much about it as they ought to or would like to. It's definitely not meant for beginners; however, it isn't just a refresher course, either. To be more specific, I'm sure you know something about SQL; but—and I apologize for the possibly offensive tone here—if your knowledge of the relational model derives only from your knowledge of SQL, then I'm afraid you won't know the relational model as well as you should, and you'll probably know "some things that ain't so." I can't say it too strongly: SQL and the relational model aren't the same thing. Here by way of illustration are some relational issues that SQL isn't too clear on (to put it mildly):
- What databases, relations, and tuples really are
- The difference between relation values and relation variables
- The relevance of predicates and propositions
- The importance of attribute names
- The crucial role of integrity constraints
and so on (this isn't an exhaustive list). All of these issues, and many others, are addressed in this book.
I say again: If your knowledge of the relational model derives only from your knowledge of SQL, then you might know "some things that ain't so." One consequence of this state of affairs is that you might find, in reading this book, that you have to do some unlearning—and unlearning, unfortunately, is very hard to do.
Some Remarks on Terminology
You probably noticed right away, in that list of relational issues in the previous section, that I used the formal terms relation, tuple (usually pronounced to rhyme with couple), and attribute. SQL doesn't use these terms, however—it uses the more "user friendly" terms table, row, and column instead. And I'm generally sympathetic to the idea of using more user friendly terms, if they can help make the ideas more palatable. In the case at hand, however, it seems to me that, regrettably, they don't make the ideas more palatable; instead, they distort them, and in fact do the cause of genuine understanding a grave disservice. The truth is, a relation is not a table, a tuple is not a row, and an attribute is not a column. And while it might be acceptable to pretend otherwise in informal contexts—indeed, I often do exactly that myself—I would argue that it's acceptable only if we all understand that the more user friendly terms are just an approximation to the truth and fail overall to capture the essence of what's really going on. To put it another way: If you do understand the true state of affairs, then judicious use of the user friendly terms can be a good idea; but in order to learn and appreciate that true state of affairs in the first place, you really do need to come to grips with the more formal terms. In this book, therefore, I'll tend to use those more formal terms—at least when I'm talking about the relational model as opposed to SQL—and I'll give precise definitions for them at the relevant juncture. In SQL contexts, by contrast, I'll use SQL's own terms.
And another point on terminology: Having said that SQL tries to simplify one set of terms, I must add that it also does its best to complicate another. I refer to its use of the terms operator, function, procedure, routine, and method, all of which denote essentially the same thing (with, perhaps, very minor differences). In this book I'll use the term operator throughout.
Talking of SQL, incidentally, let me remind you that I use that term to mean the standard version of the language exclusively (International Organization for Standardization (ISO): Database Language SQL, Document ISO/IEC 9075:2003 (2003)), except in a few places where the context demands otherwise. However:
- Sometimes I use terminology that differs from that of the standard. For example, I use the term table expression in place of the standard term query expression, because (a) the value such expressions denote is a table, not a query, and (b) queries aren't the only context in which such expressions are used anyway. (As a matter of fact the standard does use the term table expression, but with a much more limited meaning; to be specific, it uses it to refer to what comes after the SELECT clause in a SELECT expression.)
- Following on from the previous point, I should add that not all table expressions are legal in SQL in all contexts where they might be expected to be. In particular, an explicit JOIN invocation, although it certainly does denote a table, can't appear as a "stand alone" table expression (i.e., at the outermost level of nesting), nor can it appear as the table expression in parentheses that constitutes a subquery (see Chapter 12). Please note that these remarks apply to many of the individual discussions in the body of the book; however, it would be very tedious to keep on repeating them, and I won't. (They're reflected in the BNF grammar in Chapter 12, however.)
- I ignore aspects of the standard that might be regarded as a trifle esoteric—especially if they aren't part of what the standard calls Core SQL or don't have much to do with relational processing as such. Examples here include the so called analytic or window (OLAP) functions; dynamic SQL; recursive queries; temporary tables; and details of user defined types.
- Partly for typographical reasons, I use a style for comments that differs from that of the standard. To be specific, I show comments as text strings in italics, bracketed by "/*" and "*/" delimiters.
Be aware, however, that all SQL products include features that aren't part of the standard per se. Row IDs provide a common example. My general advice regarding such features is: By all means use them if you want to—but not if they violate relational principles (after all, this book is supposed to be describing a relational approach to SQL). For example, row IDs are likely to violate what's called The Principle of Interchangeability (see Chapter 9); and if they do, then I certainly wouldn't use them. But, here and everywhere, the overriding rule is: You can do what you like, so long as you know what you're doing.
Principles, Not Products
It's worth taking a few moments to examine the question of why, as I claimed earlier, you as a database professional need to know the relational model. The reason is that the relational model isn't product specific; instead, it's concerned with principles. What do I mean by principles? Well, here's a dictionary definition (from Chambers Twentieth Century Dictionary):
principle: a source, root, origin: that which is fundamental: essential nature: theoretical basis: a fundamental truth on which others are founded or from which they spring
The point about principles is: They endure. By contrast, products and technologies (and the SQL language, come to that) change all the time—but principles don't. For example, suppose you know Oracle; in fact, suppose you're an expert on Oracle. But if Oracle is all you know, then your knowledge is not necessarily transferable to, say, a DB2 or SQL Server environment (it might even make it harder to make progress in that new environment). But if you know the underlying principles—in other words, if you know the relational model—then you have knowledge and skills that will be transferable: knowledge and skills that you'll be able to apply in every environment and will never be obsolete.
In this book, therefore, we'll be concerned with principles, not products, and foundations, not fads. But I realize you do have to make compromises and tradeoffs sometimes, in the real world. For one example, sometimes you might have good pragmatic reasons for not designing the database in the theoretically optimal way. For another, consider SQL once again. Although it's certainly possible to use SQL relationally (for the most part, at any rate), sometimes you'll find—because existing implementations are so far from perfect—that there are severe performance penalties for doing so...in which case you might be more or less forced into doing something not "truly relational" (like writing a query in some unnatural way to force the implementation to use an index). However, I believe very firmly that you should always make such compromises and tradeoffs from a position of conceptual strength. That is:
- You should understand what you're doing when you do decide to make such a compromise.
- You should know what the theoretically correct situation is, and you should have good reasons for departing from it.
- You should document those reasons, too, so that if they go away at some future time (for example, because a new release of the product you're using does a better job in some respect), then it might be possible to back off from the original compromise.
The following quote—which is due to Leonardo da Vinci (1452-1519) and is thus some 500 years old—sums up the situation admirably:
Those who are enamored of practice without theory are like a pilot who goes into a ship without rudder or compass and never has any certainty where he is going. Practice should always be based on a sound knowledge of theory.
(OK, I added the italics.)
Note: this is only a short excerpt from the first chapter of C.J. Date's book. He moves on to A Review of the Original Model, Model vs. Implementation, Properties of Relations, Base vs. Derived Relations, Relations vs. Relvars, Values vs. Variables, and winds up—as he does with each chapter—with concluding remarks and exercises.