Appendix B. Relational Database Technology: A Crash Course
Table of Contents
You can use .NET to access data from any data source: relational databases, object databases, flat files, and text files. The vast majority of web applications, however, will access data from a relational database such as SQL Server, Oracle, or MySQL. Though we could certainly write an entire book on relational databases and another on SQL, the essentials of these technologies are not hard to understand.
All of the examples in this appendix assume you are working with either SQL Server or SQL Server Express. Users of other relational databases will find that the lessons learned here transfer well to their environment, but be careful with applications such as Oracle, MySQL, and Access that use a different variation of SQL.
A database is a repository of data. A relational database organizes your data into tables that are connected to one another by the data they contain. For example, one table might contain a customer's information and a second table might contain information about orders, and each order is owned by an individual customer.
Similarly, you might have one table of cars and a second table of car parts. Each part can be in one or more cars, and each car is made up of many parts. Or you might have a table for bugs and a table for developers. Each bug is owned by one developer, and each developer has a list of bugs he owns.
The principal division of a database is into tables. Every table in a relational
database is organized into rows, where each row
represents a single record. The rows are organized into
columns. All the rows in a table have the same
column structure. For example, a
Customer table might have columns for the
customer ID (the unique ID of the
customer placing the order), the name of the customer, the email
address of the customer, and so forth.
It is common to make an analogy between tables and classes
and between rows and objects. The
Customer table, for example, tells you a great
number of things about a customer, just as a
Customer class does. Each row in the
Customer table describes a particular customer,
much as an instance of the
This analogy is compelling, but limited. Tables, like classes, typically describe one logical entity and all of what you know about that entity. The mapping isn't reflexive, though—classes can contain properties that actually map across several different tables. This is similar to the notion of relationships between tables to ensure that consistent data in a database is not reflected back as relationships between classes.
One of the challenges facing an object-oriented programmer is overcoming these and other design differences between the object model on the one hand and the database model on the other. Typically, the interface between the backend relational database and the objects in the application is managed by creating a database interface layer of objects that negotiate between the creation of objects and the storage of information in the database tables, all created by hand. The past few years, however, have seen a number of products known as object-relational mappers, which can scan the structure of a database and, with a little help from the developer, produce a set of classes which represent the entities and relationships and make that transition between your code and your database that much easier. Language Integrated Query (LINQ), covered in Chapter 10, Presenting LINQ, actually includes two object-relational mappers:
LINQ to SQL looks after the simpler case where your classes generally have a one-to-one relationship with tables in the database.
LINQ to Entities (available with VS2008 Service Pack 1) deals with the more enterprise-level case where your classes generally map to information in more than one table in your database.
This distinction is not a hard and fast rule by any means, but it does make the general distinction between the two that Microsoft is using.
Several other mappers are also available, including:
To understand the issues in table design, consider a database for recording orders (such as the AdventureWorksLT database used in this book). You need to know who placed each order, and it would be useful to know the email address, phone number, and other identifying information about each person as well.
You can imagine a form in which you display details about a given order, and in that detail page you offer the email address and phone number of the customer so the salesperson working on the order can contact that customer in case of a delay or stock issue.
You could store the identifying information with each order in
Orders table (named
SalesOrderHeader in the database, for reasons
we'll mention in a minute), but that would be inefficient. If John
Doe placed 50 orders, you'd rather not repeat John Doe's email
address and phone number in 50 records. It's also a data
maintenance nightmare. If John Doe changes his email address and
phone number, you'd have to make the change in 50 places.
Instead, the customer details are kept in a second table, called
Customer, in which each row represents
a single customer. In the
table, there will be a column for the
CustomerID. Each customer will have a unique ID,
and that field will be marked as the primary key for the
Customer table. A primary key is the column or
combination of columns that uniquely identifies a record in a given
Orders table will use the
PersonID column as a foreign key. A foreign key is a column
(or combination of columns) that is a primary (or otherwise unique)
key from a different table. The
table uses the
CustomerID, which is
the primary key in
identify which person placed the order. If you need to determine
the email address for that person, you can use the
CustomerID to look up the
Customer record in the
Customer table and that will give you all the
detailed information about that person.
By "factoring out" the details of the person's address into a
Customer table, you reduce the
redundant information in each
record. This process of taking out redundant information from your
tables is called normalization.
The same process explains why an order is actually stored in two
one. An order can be split into individual requests for products
and so the
stores the details of the order as a whole—when it is due,
when it was shipped, how much it cost in total, and so
on—whereas rows in the
SalesOrderDetail table contain details for each
separate product in the order.
Normalization makes your use of the database
more efficient, and it reduces the likelihood of data corruption.
If you kept the customer's email address in the
Customer table and in the
SalesOrderHeader table, you would run the risk
that a change in one table might not be reflected in the other.
Thus, if you changed the person's email address in the
Customer table, that change might not be reflected
in every row in the
table (or it would be a lot of work to make sure that it was
reflected). By keeping only the
SalesOrderHeader, you are free to change the email
address or other personal information in
Customer, and the change will automatically be
reflected for each order.
Just as Visual Basic and C# programmers want the compiler to
catch bugs at compile time rather than at runtime, database
programmers want the database to help them avoid data corruption. A
compiler helps avoid bugs by enforcing the rules of the language.
For example, in C# you can't use a variable you've not defined.
SQL Server and other modern relational databases help you avoid bugs by
enforcing constraints that you create. For example, the
Customer table marks the
CustomerID field as a primary key. This creates a
primary key constraint in the database, which ensures that each
CustomerID is unique. If you were to
enter a person named Dan Maharry with a
PersonID of 123, and then you were to try to add
Jesse Liberty with a
PersonID of 123,
the database would reject the second record because of the primary
key constraint. You would need to give one of these people a
different, and unique,
Relational databases use Declarative
Referential Integrity (DRI) to establish constraints on
the relationships among the various tables. For example, you might
declare a constraint on the
SalesOrderHeader table that dictates that no order
may have a
CustomerID unless that
CustomerID represents a valid record
Customer table. This helps you
avoid two types of mistakes. First, you cannot enter a record with
CustomerID. Second, you
cannot delete a
Customer record if
CustomerID is used in any order.
The integrity of your data and the relationships among records is
The language of choice for querying and manipulating databases is Structured Query Language, often referred to as SQL. SQL is often pronounced "sequel." SQL is a declarative language, as opposed to a procedural language, and it can take awhile to get used to working with a declarative language if you are used to languages such as Visual Basic and C#.
Most programmers tend to think in terms of a sequence of steps:
"Find me all the orders, then get the customer's ID, then use that
ID to look up that customer's records in
Customer, then get me the email address." In a
declarative language, such as SQL, you declare the entire query and
the query engine returns a set of results. You are not thinking
about a set of steps; rather, you are thinking about designing
and "shaping" a set of data. Your goal is to make a single
declaration that will return the right records. You do that by
creating temporary "wide" tables that include all the fields you
need and then filtering for only those records you want: "Widen the
SalesOrderHeader table with the
Customer table, joining the two on the
CustomerID, then filter for only those
that meet my criteria."
Select FirstName, LastName, CompanyName from Customer where SalesPerson = 'Janet'
For example, suppose you'd like to know about all the orders by all the customers served by Janet after January 1, 1997. You might create this query:
Select c.FirstName, c.LastName, c.CompanyName, o.OrderDate, o.TotalDue from Customers c inner join SalesOrderHeader o on o.customerID = c.customerID where c.SalesPerson = 'Janet' and o.OrderDate > '1/1/97'
At first glance, you appear to be selecting
orderDate from the
Customer table, but that is not possible because
Customer table does not have an
orderDate. You must take care to
identify the table that each field comes from by prefixing each
field with an "alias" defined in the query. The key phrase is:
It is as though the
creates a temporary table that is the width of the
Customer table and the
SalesOrderHeader table joined together. The
on keyword dictates how the tables are
joined. In this case, the tables are joined on the
CustomerID column in
Customer and in
Each record in
(represented by the alias
c) is joined
to the appropriate record in
(represented by the alias
o) when the
CustomerID fields match in both
When you join two tables you can say "get every record that exists in one table but not necessarily the other" (this is called an outer join), or you can say, as we've done here, "get only those records that exist in both tables" (called an inner join).
The inner join shown in the preceding code snippet says: get
only the records in
Orders that match
the records in
Customer by having the
same value in the
on o.customerID = c.customerID).
where clause further constrains
the search to those records where the
SalesPerson field in
Customer is an exact match for the string
Janet and where the
OrderDate in the
Orders table is greater than January 1, 1997:
where c.SalesPerson = 'Janet' and o.orderDate > '1/1/97'
Orders is the only
table with an
orderDate column, there
is no ambiguity if we use just the column name:
where c.SalesPerson = 'Janet' and orderDate > '1/1/97'
It's often easier to read, though, if all columns are given a
prefix rather than you having to figure out which table each column
is part of. SQL is able to translate the string "1/1/97"
DateTime representing January
You can use SQL to search for and retrieve data and to create,
update, and delete tables and generally manage and manipulate the
content and the structure of the database. For example, you can
FirstName of a specific
Update Customer set FirstName = 'Jesse' where CustomerId = '124'
You can add a new row to the
INSERT INTO [Customer] ([FirstName], [LastName], [SalesPerson]) VALUES ('Dan', 'Maharry', 'Janet')
And you can delete a row from the table as well:
DELETE from Customer where FirstName='Dan' and LastName='Maharry'
In this book, the database server of choice is Microsoft's SQL Server 2005, but as we mentioned earlier, it is not the only kid on the block and there are a number of different free versions of the major databases to tide you over while you learn. They all have a lot of good documentation installed with them as well, making them well worth a read.
SQL Server 2005 Express Edition (http://www.microsoft.com/express/sql/)
Oracle 10g Express Edition (http://www.oracle.com/technology/software/products/database/xe/htdocs/102xewinsoft.html)
MySQL Community Server (http://dev.mysql.com/downloads/mysql/5.0.html#win32)
Likewise, there are many good books and websites to teach you a lot more about database design and SQL queries than you'll ever need (probably). For example:
Head First SQL by Lynn Beighley (O'Reilly), a cracking beginner's guide to SQL
SQL Visual Quickstart Guide by Chris Fehily (Peachpit), a slimmer alternative to Head First SQL
SQL Queries for Mere Mortals by John L. Viescas and Michael J. Hernandez (Addison-Wesley), the book to look at once you've got the basics of SQL in your head
Professional SQL Server 2005 Programming by Rob Vieira (Wrox), the book to take you outside of just writing SQL and into thinking about SQL Server as a whole programming platform