If you always join tables you put in a query, then you aren’t taking advantage of one of the fundamental features of SQL. Not only that, Peter writes, but you’re also missing an opportunity to see how SQL really works.
WHEN we first started using Access as our company’s end-user reporting tool, there were some unfortunate incidents. In one of those incidents, our president put two tables, each with about 100,000 records, in a query and clicked the Run button. His computer rumbled for about five minutes, sighed deeply, and locked up. So he did it again. After the fourth or fifth time, he decided that this repeated failure wasn’t accidental and called me to fix it.
What was happening, of course, was that he had failed to join any fields in the first table to any of the fields in the second table. When you put two tables in the same query without joining them, SQL’s default behavior is to give you a recordset that contains every combination of every record in both tables. This type of result set is called a Cartesian product. Because we had more than 100,000 records in each of the tables in this example, the big sigh the computer was making occurred when it discovered there wasn’t enough memory to hold the 10 billion records he was generating. In mathematical circles, this is known as the “law of large numbers:” if you multiply one large number by another large number you get a humongous number.
I explained what was happening and my boss’ immediate reaction was the one that most users have: he was the victim of a bug. I don’t think I ever really convinced him that it wasn’t a bug and that he had discovered the fundamental behavior that makes SQL work.
How SQL works
Back in the bad old days, there was no special SQL syntax for joining two tables together. Instead of using the Join keyword and coding this:
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Field1 = Table2.Field2
you simply added a Where clause to the SQL statement:
SELECT * FROM Table1 , Table2 WHERE Table1.Field1 = Table2.Field2;
It’s important to see this in light of SQL’s default behavior. Simply putting two tables in a query like this:
SELECT * FROM Table1, Table2;
generates every possible combination of the rows in the two tables. A Where clause specifies which rows, out of all of those combinations, you’re interested in. In the sample case, it’s just the ones where Table1.Field1 equals Table2.Field2.
The Join keyword was added primarily to make it easier for query optimizers to determine the best strategy for retrieving the records—and also to introduce a standardized way of specifying outer joins (which every SQL implementation seemed to do differently). While the query optimizer may use the Join clause to work out some arcane way of retrieving the data, for the result to be valid it must match the result you would get if you generated every possible combination of rows and then went through those rows winnowing out the ones that don’t match the various Where clauses.
Useful results and No Joins Can Be A Feature
Sometimes you have to write your SQL code using Where clauses to specify how the tables in the query are related. If you’re joining two fields of incompatible types, for instance, you have no choice. If Field1 is a string and Field2 is an integer you can’t just drag Field1 over to Field2 in the Query Design window (well, you can, but the resulting query won’t run). What you have to do, after adding both tables to the query, is pull Field1 down to the query grid and give it a criteria of CStr([Field2]). The resulting SQL would look like this:
SELECT Table1.Field1, * FROM Table1, Table2Num WHERE Table1.Field1=CStr([Field2]);
This harks back to the original SQL method for joining tables. There are even occasions when you don’t want to join the tables in a query because you want to exploit the default behavior. This may occur when you’re loading a table with data. For instance, Joe Fallon of Nestle and I have been trying to find a simple way to generate a round robin schedule for a hockey tournament. The first step of our solution is to load the tblGames table with all the possible games that could result from the teams in the tblTeams table playing each other. We do this by putting two unjoined copies of the Teams table in the same MakeTable query:
SELECT tblTeam.TeamName, tblTeam_1.TeamName INTO tblCombinations FROM tblTeam, tblTeam AS tblTeam_1
This query creates a table of all the possible team combinations, including some impossible ones where a team ends up playing itself.
Here’s another example: at my company we have to keep track of the road graders we’re building. We build a grader in 65 separate steps and when a step is completed, we check off that step for that grader. I built the table for recording this information by using a query that takes a table of all the graders to be added and a table of every step, and then exploits SQL’s default behavior:
INSERT INTO tblStatus(Grader,Step) SELECT * FROM tblNewGraders, tblStepList
Once the table is built, recording progress is just a matter of the guys on the shop floor checking off each row as the grader finishes each step. It’s like I told my boss, when his computer kept blowing up trying to generate 10 billion records: “It isn’t a bug, it’s a feature.” I don’t think he appreciated the usefulness of Cartesian product joins, but he had a memorable introduction to the fundamental nature of SQL.