Interspire Tutorials - http://www.interspire.com/content
Database Joins
http://www.interspire.com/content/articles/35/1/Database-Joins
By Chris Smith
Published on 03/8/2006
 
In part three of this series I showed you how to index your database. I also mentioned table joins and how to index them. In this tutorial we will take a further look at table joins and how we can use them effectively.

Introduction to Table Joins
A visitor reading the second part of this tutorial asked what table joins are and how you use them. Because of that, this months tutorial is about exactly that - so remember, if you have a comment or question, make sure you send it through at the end of the article!

Setting up multiple tables in a database is an easy way to keep related data together. We keep news separate from authors, blogs separate from news, etc. Not only is this more managable from our point of view (we can quickly see which table contains what information), but all of our data will be in better shape also.

If we didn't separate the data then we'd have to keep the author information with the news item and store the blog information separately - it's not practical and very prone to errors. If you change an authors email address in one place, then you might forget to change it somewhere else.

Keeping the data separate like this is called "database normalization" and is quite a complicated topic in itself, so we'll leave that alone for now.

Back to the topic at hand!

In our CMS example from last month we have an authors table that looks like this:

CREATE TABLE authors (
  authorid INT PRIMARY KEY,
  username VARCHAR(255),
  firstname VARCHAR(255),
  lastname VARCHAR(255)
);

... and another table for news which is slightly modified from the previous example:

CREATE TABLE newsitem (
  newsid INT PRIMARY KEY,
  newstitle VARCHAR(255),
  newscontent TEXT,
  authorid INT REFERENCES authors(authorid)
);

We need to create the authors table first because the "newsitem" table makes a reference to it and the database should check to make sure it's available so it can set up the relationship.

How do we get the authors name for a particular news item? We have to join the tables together. Note that both tables (in this example at least) have a column called 'authorid'. In the 'authors' table this is a primary key so we can uniquely identify each row.

In most cases, when you join tables together in a query you should be using a primary key - or at least a unique key - from one of the tables.

Foreign Keys
Getting off topic again, in database terminology this is called a "foreign key" and allows the database itself to do some integrity checks. What does that mean? Well, it means the database can check that you have a valid author for each news item.

I'll probably get in trouble for saying this but MySQL doesn't support foreign keys properly at all, so be very wary - you can get bad data into your database even when using foreign key constraints (no matter which MySQL "storage engine" you use). PostgreSQL supports foreign keys fully and will stop bad data from entering your system.

We need to add an author to the database first, just like this:

INSERT INTO authors(authorid, username, firstname, lastname) VALUES (1, 'Chris', 'Chris', 'Smith');

Now that we have an author, we can add a news item:

INSERT INTO newsitem(newsid, newstitle, newscontent, authorid) VALUES (1, 'Test', 'Testing News', 1);

If we try to add a newsitem for an author that doesn't exist:

INSERT INTO newsitem(newsid, newstitle, newscontent, authorid) VALUES (2, 'Test Missing Author', 'Testing News for Missing Author', 2);

... then you should get an error (this is from postgresql 8.1.1):

ERROR:  insert or update on table "newsitem" violates foreign key constraint "newsitem_authorid_fkey"
DETAIL:  Key (authorid)=(2) is not present in table "authors".

This stops us from adding bad data!

Join Syntax
Back to the topic again!

To get the author information for a particular news item we need to join the tables together. We can do this with a "INNER JOIN" (this joins the table on the left to the table on the right using the column we specify):

SELECT * FROM newsitem INNER JOIN authors ON newsitem.authorid=authors.authorid;

Depending on which database you use you will get slightly different formatting - the results will be the same though.

From MySQL:

mysql> SELECT * FROM newsitem INNER JOIN authors ON newsitem.authorid=authors.authorid;
+--------+-----------+--------------+----------+----------+----------+-----------+----------+
| newsid | newstitle | newscontent  | authorid | authorid | username | firstname | lastname |
+--------+-----------+--------------+----------+----------+----------+-----------+----------+
|      1 | Test      | Testing News |        1 |        1 | Chris    | Chris     | Smith    |
+--------+-----------+--------------+----------+----------+----------+-----------+----------+
1 row in set (0.00 sec)


From PostgreSQL:

testing=# SELECT * FROM newsitem INNER JOIN authors ON newsitem.authorid=authors.authorid;
 newsid | newstitle | newscontent  | authorid | authorid | username | firstname | lastname
--------+-----------+--------------+----------+----------+----------+-----------+----------
      1 | Test      | Testing News |        1 |        1 | Chris    | Chris     | Smith
(1 row)


So we can see the news content and the author information all in one go! If we couldn't do this, in our code (eg PHP or Perl), we'd have to get the authorid from the newsitem table and then get the author information separately - so double the work.

Using the INNER JOIN syntax we have to tell MySQL/PostgreSQL how to join the tables together. In this case, using the authorid:

ON newsitem.authorid=authors.authorid

Otherwise the database doesn't know how to put the two tables together (this happens whether the foreign key relationship is created or not).

On another side note, I prefer this syntax:

SELECT * FROM newsitem, authors WHERE newsitem.authorid=authors.authorid;

It produces exactly the same output, but in complicated queries where you join more than 2 tables together, the join order is very important. Joining tables together in the wrong order will make the query very slow (and this can be quite hard to track down). It's a personal preference but I don't have any problems using this syntax (unless I forget an index which we've ignored in this tutorial).


Table Aliases
Another thing we can do to make this easier is to use a table alias. This is a "shortcut" to make our queries easier to read.

SELECT * FROM newsitem n INNTER JOIN authors a ON n.authorid=a.authorid;

We use "n" instead of "newsitem" and "a" instead of "authors". In this example we don't see much difference but if we selectively pick out our fields:

SELECT n.newsid, n.newstitle, n.newscontent, a.username, a.firstname, a.lastname FROM newsitem n INNER JOIN authors a ON n.authorid=a.authorid;

We can quickly see which field comes from which table and this saves us typing the whole table name!

In this particular case there's no real benefit from using table aliases, but try joining 3-4 tables together and you'll see a lot of use for it.

There you have it! Table joins and table aliases all in one go!