The following is an excerpt from the newly released Programming ASP.NET 3.5 by Jesse Liberty, Dan Maharry, and Dan Hurwitz. According to them, ASP.NET 3.5 is possibly the fastest, most efficient, most reliable, and best-supported way to create interactive web applications today. Combined with the development tools available from Microsoft (both free and commercial), it is incredibly easy to create websites that look great and perform well.
Appendix B. Relational Database Technology: A Crash Course
Table of Contents
You can use .NET to access data from any data source: relational databases, object databases, flat files, and text files. The vast majority of web applications, however, will access data from a relational database such as SQL Server, Oracle, or MySQL. Though we could certainly write an entire book on relational databases and another on SQL, the essentials of these technologies are not hard to understand.
Tip
All of the examples in this appendix assume you are working with either SQL Server or SQL Server Express. Users of other relational databases will find that the lessons learned here transfer well to their environment, but be careful with applications such as Oracle, MySQL, and Access that use a different variation of SQL.
A database is a repository of data. A relational database organizes your data into tables that are connected to one another by the data they contain. For example, one table might contain a customer's information and a second table might contain information about orders, and each order is owned by an individual customer.
Similarly, you might have one table of cars and a second table of car parts. Each part can be in one or more cars, and each car is made up of many parts. Or you might have a table for bugs and a table for developers. Each bug is owned by one developer, and each developer has a list of bugs he owns.
The principal division of a database is into tables. Every table in a relational
database is organized into rows, where each row
represents a single record. The rows are organized into
columns. All the rows in a table have the same
column structure. For example, a Customer table might have columns for the
customer ID (the unique ID of the
customer placing the order), the name of the customer, the email
address of the customer, and so forth.
It is common to make an analogy between tables and classes
and between rows and objects. The Customer table, for example, tells you a great
number of things about a customer, just as a Customer class does. Each row in the Customer table describes a particular customer,
much as an instance of the Customer
would do.
This analogy is compelling, but limited. Tables, like classes, typically describe one logical entity and all of what you know about that entity. The mapping isn't reflexive, though—classes can contain properties that actually map across several different tables. This is similar to the notion of relationships between tables to ensure that consistent data in a database is not reflected back as relationships between classes.
One of the challenges facing an object-oriented programmer is overcoming these and other design differences between the object model on the one hand and the database model on the other. Typically, the interface between the backend relational database and the objects in the application is managed by creating a database interface layer of objects that negotiate between the creation of objects and the storage of information in the database tables, all created by hand. The past few years, however, have seen a number of products known as object-relational mappers, which can scan the structure of a database and, with a little help from the developer, produce a set of classes which represent the entities and relationships and make that transition between your code and your database that much easier. Language Integrated Query (LINQ), covered in Chapter 10, Presenting LINQ, actually includes two object-relational mappers:
-
LINQ to SQL looks after the simpler case where your classes generally have a one-to-one relationship with tables in the database.
-
LINQ to Entities (available with VS2008 Service Pack 1) deals with the more enterprise-level case where your classes generally map to information in more than one table in your database.
This distinction is not a hard and fast rule by any means, but it does make the general distinction between the two that Microsoft is using.
Several other mappers are also available, including:
-
NHibernate (http://www.nhibernate.org)
-
SubSonic (http://www.codeplex.com/subsonic)
To understand the issues in table design, consider a database for recording orders (such as the AdventureWorksLT database used in this book). You need to know who placed each order, and it would be useful to know the email address, phone number, and other identifying information about each person as well.
You can imagine a form in which you display details about a given order, and in that detail page you offer the email address and phone number of the customer so the salesperson working on the order can contact that customer in case of a delay or stock issue.
You could store the identifying information with each order in
an Orders table (named SalesOrderHeader in the database, for reasons
we'll mention in a minute), but that would be inefficient. If John
Doe placed 50 orders, you'd rather not repeat John Doe's email
address and phone number in 50 records. It's also a data
maintenance nightmare. If John Doe changes his email address and
phone number, you'd have to make the change in 50 places.
Instead, the customer details are kept in a second table, called
Customer, in which each row represents
a single customer. In the Customer
table, there will be a column for the CustomerID. Each customer will have a unique ID,
and that field will be marked as the primary key for the Customer table. A primary key is the column or
combination of columns that uniquely identifies a record in a given
table.
The Orders table will use the
PersonID column as a foreign key. A foreign key is a column
(or combination of columns) that is a primary (or otherwise unique)
key from a different table. The Orders
table uses the CustomerID, which is
the primary key in Customer, to
identify which person placed the order. If you need to determine
the email address for that person, you can use the CustomerID to look up the Customer record in the Customer table and that will give you all the
detailed information about that person.
By "factoring out" the details of the person's address into a
Customer table, you reduce the
redundant information in each Order
record. This process of taking out redundant information from your
tables is called normalization.
The same process explains why an order is actually stored in two
tables—SalesOrderDetail and
SalesOrderHeader—rather than
one. An order can be split into individual requests for products
and so the SalesOrderHeader table
stores the details of the order as a whole—when it is due,
when it was shipped, how much it cost in total, and so
on—whereas rows in the SalesOrderDetail table contain details for each
separate product in the order.
Normalization makes your use of the database
more efficient, and it reduces the likelihood of data corruption.
If you kept the customer's email address in the Customer table and in the SalesOrderHeader table, you would run the risk
that a change in one table might not be reflected in the other.
Thus, if you changed the person's email address in the Customer table, that change might not be reflected
in every row in the SalesOrderHeader
table (or it would be a lot of work to make sure that it was
reflected). By keeping only the CustomerID in SalesOrderHeader, you are free to change the email
address or other personal information in Customer, and the change will automatically be
reflected for each order.
Just as Visual Basic and C# programmers want the compiler to
catch bugs at compile time rather than at runtime, database
programmers want the database to help them avoid data corruption. A
compiler helps avoid bugs by enforcing the rules of the language.
For example, in C# you can't use a variable you've not defined.
SQL Server and other modern relational databases help you avoid bugs by
enforcing constraints that you create. For example, the
Customer table marks the CustomerID field as a primary key. This creates a
primary key constraint in the database, which ensures that each
CustomerID is unique. If you were to
enter a person named Dan Maharry with a PersonID of 123, and then you were to try to add
Jesse Liberty with a PersonID of 123,
the database would reject the second record because of the primary
key constraint. You would need to give one of these people a
different, and unique, CustomerID.
Relational databases use Declarative
Referential Integrity (DRI) to establish constraints on
the relationships among the various tables. For example, you might
declare a constraint on the SalesOrderHeader table that dictates that no order
may have a CustomerID unless that
CustomerID represents a valid record
in the Customer table. This helps you
avoid two types of mistakes. First, you cannot enter a record with
an invalid CustomerID. Second, you
cannot delete a Customer record if
that CustomerID is used in any order.
The integrity of your data and the relationships among records is
thus protected.
The language of choice for querying and manipulating databases is Structured Query Language, often referred to as SQL. SQL is often pronounced "sequel." SQL is a declarative language, as opposed to a procedural language, and it can take awhile to get used to working with a declarative language if you are used to languages such as Visual Basic and C#.
Most programmers tend to think in terms of a sequence of steps:
"Find me all the orders, then get the customer's ID, then use that
ID to look up that customer's records in Customer, then get me the email address." In a
declarative language, such as SQL, you declare the entire query and
the query engine returns a set of results. You are not thinking
about a set of steps; rather, you are thinking about designing
and "shaping" a set of data. Your goal is to make a single
declaration that will return the right records. You do that by
creating temporary "wide" tables that include all the fields you
need and then filtering for only those records you want: "Widen the
SalesOrderHeader table with the
Customer table, joining the two on the
CustomerID, then filter for only those
that meet my criteria."
The heart of SQL is the query. A query is a statement that returns a set of records from the database. Typically, queries are in this form:
Selectfromcolumn,column,columnwheretable=columnvalue
For example, you might like to see information about the customers served by Janet. To do so you would write:
Select FirstName, LastName, CompanyName from Customer where SalesPerson = 'Janet'
SQL is capable of much more powerful queries (see the upcoming "SQL Select Statement" sidebar for the full syntax as provided by Microsoft).
For example, suppose you'd like to know about all the orders by all the customers served by Janet after January 1, 1997. You might create this query:
Select c.FirstName, c.LastName, c.CompanyName, o.OrderDate, o.TotalDue from Customers c inner join SalesOrderHeader o on o.customerID = c.customerID where c.SalesPerson = 'Janet' and o.OrderDate > '1/1/97'
At first glance, you appear to be selecting orderDate from the Customer table, but that is not possible because
the Customer table does not have an
orderDate. You must take care to
identify the table that each field comes from by prefixing each
field with an "alias" defined in the query. The key phrase is:
inner join SalesOrderHeader o on o.customerID = c.customerID
It is as though the join phrase
creates a temporary table that is the width of the Customer table and the SalesOrderHeader table joined together. The
on keyword dictates how the tables are
joined. In this case, the tables are joined on the CustomerID column in Customer and in SalesOrderHeader.
Each record in Customer
(represented by the alias c) is joined
to the appropriate record in Orders
(represented by the alias o) when the
CustomerID fields match in both
records.
When you join two tables you can say "get every record that exists in one table but not necessarily the other" (this is called an outer join), or you can say, as we've done here, "get only those records that exist in both tables" (called an inner join).
The inner join shown in the preceding code snippet says: get
only the records in Orders that match
the records in Customer by having the
same value in the CustomerID field
(on o.customerID = c.customerID).
The where clause further constrains
the search to those records where the SalesPerson field in Customer is an exact match for the string
Janet and where the OrderDate in the Orders table is greater than January 1, 1997:
where c.SalesPerson = 'Janet' and o.orderDate > '1/1/97'
But because Orders is the only
table with an orderDate column, there
is no ambiguity if we use just the column name:
where c.SalesPerson = 'Janet' and orderDate > '1/1/97'
It's often easier to read, though, if all columns are given a
prefix rather than you having to figure out which table each column
is part of. SQL is able to translate the string "1/1/97"
into a DateTime representing January
1, 1997.
You can use SQL to search for and retrieve data and to create,
update, and delete tables and generally manage and manipulate the
content and the structure of the database. For example, you can
update the FirstName of a specific
company:
Update Customer set FirstName = 'Jesse' where CustomerId = '124'
You can add a new row to the Customer table:
INSERT INTO [Customer]
([FirstName], [LastName], [SalesPerson])
VALUES
('Dan', 'Maharry', 'Janet')
And you can delete a row from the table as well:
DELETE from Customer where FirstName='Dan' and LastName='Maharry'
In this book, the database server of choice is Microsoft's SQL Server 2005, but as we mentioned earlier, it is not the only kid on the block and there are a number of different free versions of the major databases to tide you over while you learn. They all have a lot of good documentation installed with them as well, making them well worth a read.
-
SQL Server 2005 Express Edition (http://www.microsoft.com/express/sql/)
-
Oracle 10g Express Edition (http://www.oracle.com/technology/software/products/database/xe/htdocs/102xewinsoft.html)
-
MySQL Community Server (http://dev.mysql.com/downloads/mysql/5.0.html#win32)
Likewise, there are many good books and websites to teach you a lot more about database design and SQL queries than you'll ever need (probably). For example:
-
Head First SQL by Lynn Beighley (O'Reilly), a cracking beginner's guide to SQL
-
SQL Visual Quickstart Guide by Chris Fehily (Peachpit), a slimmer alternative to Head First SQL
-
SQL Queries for Mere Mortals by John L. Viescas and Michael J. Hernandez (Addison-Wesley), the book to look at once you've got the basics of SQL in your head
-
Professional SQL Server 2005 Programming by Rob Vieira (Wrox), the book to take you outside of just writing SQL and into thinking about SQL Server as a whole programming platform
-
Expert Oracle Database Architecture: 9i and 10g Programming Techniques by Thomas Kyte (Apress), the equivalent to Vieira's book for Oracle users
Print
By 





This is the prime example where normalization fails. The SalesOrderHeader table should record the information that was current at the time of the order. The Orders table is supposed to hold an accurate record of the business transaction. This includes storing user data that was valid at the time of the order, not the one that is current six months later. In fact, that would mean tampering with data that you might want to use for legal disputes later on which is not a good idea. Auto updating the address of a user's past transaction is a plain silly idea (sorry :). Normalization doesn't buy you anything and only gets you in trouble.
>> Normalization doesn't buy you anything and only gets you in trouble.
On the contrary. The problem with the given example is that it is not normalised enough. A customer can have more than one address simultaneously - billing, shipping, legal - as well as the historical dimension you mention. Furthermore, more than one customer can share the same address.
So a proper solution would have Address as a separate table which is linked to Customer through an intersection table (to capture history). The Order would then merely have to link to the appropriate record in the CustomerAddress intersection table.
In my experience whats you into trouble is having to decide whether John Smith, Smith John and J Smith are three instances of the same customer.
Cheers, APC
This is another valid conclusion, although, when adding natural object behaviour to a schema you end up with a 9-way-JOIN that will not perform very well. But I am biased towards the web here.
>> you end up with a 9-way-JOIN that will not perform very
>> well. But I am biased towards the web here.
I suppose I am biased towards applications working correctly. To a certain extent this is derives from the nature of the application.
To get a different result depending on which load balancer you hit doesn't really matter with something like Google. But it altogether matters with say online banking. I would rather my internet bank takes 10 secs to tell me I have 6000UKP in my account than incorrectly tell me I have 12000UKP in 0.1secs. Your tastes may vary.
Of course, in an ideal world we would deliver systems which are both fast and correct. Unfortunately this requires a degree of co-operation between data modellers, developers and DBAs which is usually lacking.
Cheers, APC