SQL JOIN





In all SQL examples so far we've selected data from a single table. In the previous chapter we learned how the concepts of Primary Key and Foreign Key, and how database table relate to one another. Using that knowledge we can move forward and learn how to select data from more than one table in one SQL statement.

The SQL JOIN clause is used to retrieve data from 2 or more tables joined by common fields. The most common scenario is a primary key from one of the tables matches a foreign key in second table. We will use the 2 related tables Product and Manufacturer from the previous chapter, to illustrate how to use JOIN. Consider the SQL JOIN statement below:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice FROM Manufacturer JOIN Product ON Manufacturer.ManufacturerID = Product.ManufacturerID GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

The first obvious thing about this SQL statement is that it contains columns from 2 different tables in the SELECT column list. Then the FROM clause is followed by a JOIN clause. The JOIN clause has 2 parts, the first one stating the tables we are joining:

Manufacturer JOIN Product

And the second part, which specifies which columns we are joining on:

ON Manufacturer.ManufacturerID = Product.ManufacturerID

Because the Price column is a parameter for the AVG function in our SQL statement, we need to use GROUP BY clause for the rest of the columns in the SELECT list. As you might already have guessed our SQL statement selects a list of all manufacturers and the average price of their product. The result will look like this:

ManufacturerManufacturerWebsiteManufacturerEmailAvgPrice
Dellhttp://www.dell.comsupport@dell.com$682.33
Toshibahttp://www.toshiba.comsupport@toshiba.com$741.50

You can specify the JOIN condition in the WHERE clause instead of the FROM clause, without using the JOIN keyword like this:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice FROM Manufacturer, Product WHERE Manufacturer.ManufacturerID = Product.ManufacturerID GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

It's a better programming practice to specify your JOIN conditions in the FROM clause. When joining tables you'll have to make sure that there is no ambiguity in the column names. In our example both Manufacturer and Product tables have a column named ManufacturerID, that's why we prefixed this columns name with the respective table name concatenated with dot. Another thing worth mentioning in our example is the following part of the SQL statement:

AVG(Price) AS AvgPrice

Because the column produced by the AVG function doesn't have its own name we made up a name for it - AvgPrice. If we put this in SQL terms - we created an alias for the new column.

There are 2 main types of SQL JOINS - INNER JOINS and OUTER JOINS. In our example we didn't specify what type was the JOIN, and by doing that we used INNER JOIN by default. The INNER JOIN and JOIN clauses are interchangeable in general (Keep in mind that different RDBMS have different syntax for their JOIN clause).

The INNER JOIN clause will retrieve all rows from both tables as long as there is a match between the columns we are joining on. If we add a new manufacturer to our Manufacturer table, but we don't add any products for it in the Product table, and we run our JOIN statement from above, the result will be the same as it was before adding the new manufacturer. This simply happens because we don't have a match for this new manufacturer in the Product table, and because we are using INNER JOIN, which returns only the matching rows. The final result is that this manufacturer with products doesn't appear in the retrieved data.

Wait you say, what if I want to get the list of all manufacturers no matter if they have any products listed in the Product table? How can I do that? The answer is - use OUTER JOIN.

OUTER JOIN clause returns all rows from at least one of the joined tables, granted that these rows meet the search conditions specified in the WHERE and HAVING clause (if any).

In order to get all manufacturers and their average product price, without worrying that some of the manufacturers do not have any products listed yet, we will use the following OUTER JOIN SQL statement:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice FROM Manufacturer LEFT OUTER JOIN Product ON Manufacturer.ManufacturerID = Product.ManufacturerID GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

The only difference in our new statement is that we added the keywords LEFT OUTER in front of the JOIN keyword. The SQL OUTER JOIN has 2 sub-types called LEFT OUTER JOIN (or simply LEFT JOIN) and RIGHT OUTER JOIN (or simply RIGHT JOIN). When we use LEFT OUTER JOIN clause we indicate that we want to get all rows from the left table listed in our FROM clause (we will also called it the first table), even if they don't have a match in the right (second) table.

What values will be returned for the columns selected from the second table, which do not have a match, you may ask? If we relate this question to our example it will sound like this: What average product price will our SQL query return for all manufacturers, which don't have any products in the Product table? The answer is simple - NULL.

The result of our LEFT OUTER JOIN query will be the following if we added Sony to the Manufacturer table, but we didn't add any Sony products to the Product table:

ManufacturerManufacturerWebsiteManufacturerEmailAvgPrice
Dellhttp://www.dell.comsupport@dell.com$682.33
Toshibahttp://www.toshiba.comsupport@toshiba.com$741.50
Sonyhttp://www.sony.comsupport@sony.comNULL

The RIGHT OUTER JOIN or simply RIGHT JOIN does exactly the opposite the LEFT JOIN does. The RIGHT OUTER JOIN gets all rows from the right (second) table listed in our FROM clause, even if they don't have a match in the left (first) table and returns NULL values for the columns from the left table we don't have match for.

Finally a table can be joined to itself, and to accomplish that you need to give the table 2 different aliases in the FROM clause.