John tackles some common problems that are faced by Access developers. Drawing on queries that keep cropping up in the Access newsgroups, John clears up some fundamental problems.
As I spend time in the Microsoft.Public.Access newsgroups (I like to specialize in the Queries section, but I dabble in the others from time to time), I find several “classic” problems popping up again and again. The most common type of application you’ll ever build will follow what I call the “order entry” database model. This sort of application has lots of nice, juicy many-to-many relationships and several sets of time-dependent data. If you think your local bowling club database isn’t “order entry,” think again. Don’t you “sell” lane assignments to your teams? Don’t teams have many members? And perhaps you’re a traveling league that bowls at many different locations. Aren’t the individual team member scores on a given night very similar to “order details?” You get the idea. In this month’s Access Answers column, I’ve pulled together the answers to three common query problems you’ll often need to solve in an “order entry” application.
I need to know which customers who’ve ordered from me in the past haven’t ordered anything in the past 30 or 60 days.
This is typical of one of the trickiest problems in this sort of database application: analyzing events that occured (or didn’t occur) over some period of time. In the bowling league example, you might similarly want to know which substitute players haven’t bowled in the last four sessions.
On first glance, you might be tempted to pop into the unmatched query wizard, then tweak the result to get the answer you want. For example, you can easily find out which customers haven’t ordered anything by using a wizard-generated query like this:
SELECT Customers.CustomerID, Customers.CompanyName FROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Orders.CustomerID Is Null;
This query asks Access to link customers and their orders on the matching key values. The LEFT JOIN specification tells Access to include all customers, regardless of whether they have any matching orders. This means you find at least one row in the query for each customer. When a customer has no orders, all of the columns that would normally be returned from a match in the orders table will contain the special NULL value. The specific test for NULL in the WHERE clause eliminates all but customers who have no orders.
If you want to find customers who haven’t ordered anything in the past 30 days, you might be tempted to try this:
SELECT Customers.CustomerID, Customers.CompanyName FROM Customers LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE (Orders.CustomerID Is Null) AND (Orders.OrderDate >= Date() 30);
The problem with this approach is that the query engine logically first performs the join, then applies the criteria in the WHERE clause. A particular result row from the join can’t be both NULL and have a date value at the same time.
One way to solve this problem is to first build a query to tell you all the customers who have placed an order in the past 30 days. This query might look something like this:
SELECT Orders.CustomerID FROM Orders WHERE Orders.OrderDate >= Date() - 30;
If you save the preceding query and call it “qryCstOrdLst30”, then you can solve the original problem like this:
SELECT Customers.CustomerID, Customers.CompanyName FROM Customers LEFT JOIN qryCstOrdLst30 ON Customers.CustomerID = qryCstOrdLst30.CustomerID WHERE qryCstOrdLst30.CustomerID IS NULL;
This looks a lot like the original query that found out which customers hadn’t ordered anything at all. This works because the saved query finds out which customers did order something in the past 30 days. The final answer uses the Unmatched Query Wizard trick to find out who didn’t order anything recently.
Another way to solve this problem is to embed the saved query as a subquery (see the April 1998 “Working SQL” column for a discussion of subqueries) and use a NOT IN clause. Although this would be a cleaner one-query solution, I don’t recommend it in an Access-only application because Microsoft Access is notorious for optimizing NOT IN very poorly. If you’re writing the query to execute against a back-end database like SQL Server, which handles subqueries well, here’s the answer:
SELECT Customers.CustomerID, Customers.CompanyName FROM Customers WHERE Customers.CustomerID NOT IN (SELECT Orders.CustomerID FROM Orders WHERE Orders.OrderDate >= Date() 30);
As you can see, creating the subquery is as simple as copying and pasting in the SQL from the saved query. In fact, you can use any query as a subquery in another query as long as the subquery returns only one column.
My problem is that I need to pull out customer and order, but I only want to see the information about the last order the customer placed.
Here’s another problem that you can solve using subqueries. You might find many occasions where you want to look at related data from two tables, but you’re only interested in the earliest or latest information from the many-side table. If all you want is the latest order date, you can do it with a Totals query like this:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, Max(Orders.OrderDate) AS LastOrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID GROUP BY Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode;
As you can see, the GROUP BY clause can become rather cumbersome if you want several columns from the Customers table. Also, because this is a Totals query, you can’t update any of the columns returned.
For the previous problem, you saw how you could use a subquery in the Where clause to limit rows without having to perform a join. You can also use a subquery to calculate an output column as long as the subquery returns only a single value. Using a subquery, another way to state the preceding query is as follows:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, (SELECT Max(OrderDate) FROM Orders WHERE Orders.CustomerID = Customers.CustomerID) AS LastOrder FROM Customers;
The clause to retrieve the highest order date is a bit trickier in this query, but the overall query is a lot simpler. Because this isn’t a Totals query, you might expect that at least the columns from the Customers table would be updateable. No such luck. Apparently, the query engine sees the aggregate (Max) function in the field list and decides to mark the entire query not updateable. I’ve had many discussions with the Microsoft Access development team about this, but to no avail. We have to remember that Access spoils us most of the time by making nearly all Select queries (even those with complex JOIN clauses) updateable.
So, I’ll up the ante just a bit by demanding information from the Order as well as from the Customer table table (Order ID or Employee ID, for instance). This gets trickier. Some of you might be tempted to solve it like this:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, Max(Orders.OrderDate) AS LastOrderDate, Last(Orders.OrderID) AS LastOrderID, Last(Orders.EmployeeID) AS LastEmployeeID FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID GROUP BY Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode;
The preceding query might actually return the correct answer if orders have been added strictly in date sequence, no rows have been deleted, and no data has been updated once entered in the database — which seems pretty unlikely. The problem is that “last” asks for the value from the last physical row in the FROM recordset. If rows have been deleted (thereby leaving space physically earlier in the file for new rows) or data has been updated to force Access to rewrite a row at a new location, the “last” physical row might not be the row that has the “last” date!
The correct way to solve this problem is to filter the Order rows to include only those orders that are the latest order for each customer. Sometimes the easiest way to solve a complex query problem is to break it down into individual steps. As you’ll see in the end, it’s easy to finally combine steps to get the answer in a single query. First, you need to know the “last” or “max” order date for each customer. You can get that with the following query:
SELECT Orders.CustomerID, Max(Orders.OrderDate) As LastDate FROM Orders GROUP BY Orders.CustomerID;
You can get all the information about the “last” order for each customer by joining the preceding query with the Orders table. Assuming you save the previous query as “qryLstOrdByCust,” you can get to the next step with the following query:
SELECT Orders.* FROM Orders INNER JOIN qryLstOrdByCust ON Orders.CustomerID = qryLstOrdByCust.CustomerID AND Orders.OrderDate = qryLstOrdByCust.LastDate;
So far, so good. Now, save that query as “qryLstOrdByCustInfo.” You can now link this with the Customers table to get information about customers and all the details about the last order they placed. The result might look like this:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, qryLstOrdByCustInfo.OrderID, qryLstOrdByCustInfo.EmployeeID, qryLstOrdByCustInfo.OrderDate FROM Customers INNER JOIN qryLstOrdByCustInfo ON Customers.CustomerID = qryLstOrdByCustInfo.CustomerID;
Care to guess whether this query is updateable? If you guessed that it’s not, you’re correct. Again, Access gets the scent of a Totals query embedded in the first step and marks the whole thing non-updateable.
You can actually solve this problem without building the initial queries that the final query depended on. Let’s start with all these same columns retrieved from a join between the Customers and Orders tables. The SQL looks like this:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, Orders.OrderID, Orders.EmployeeID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Remember, the objective is to get the information only from the last order for each customer. To do this, you can use a variation of qryLastOrderByCustomer as a subquery criteria in a WHERE clause. The idea is to limit the rows returned from the Orders table to only those rows whose OrderDate matches the highest order date for that customer. To do this, you need to link the subquery back to the CustomerID column being retrieved from the Customers table in the outer query. Here’s the SQL:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, Orders.OrderID, Orders.EmployeeID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Orders.OrderDate = (SELECT Max(OrderDate) FROM Orders WHERE Orders.CustomerID = Customers.CustomerID);
Note that instead of grouping on CustomerID in the subquery, I added a WHERE clause to restrict each selection using the CustomerID in the current row in the outer query. Personally, I think the subquery stated this way is a bit ambiguous. When you say “Orders.CustomerID = Customers.CustomerID” in the subquery, to which copy of the Orders table are you applying this criterion — the Orders table in the subquery or the Orders table in the outer query? Fortunately, the query engine applies the criterion to the most proximate copy of the table. If you want to be absolutely unambiguous about which table you mean, write the SQL like this, using table aliasing:
SELECT Customers.CustomerID, Customers.CompanyName, Customers.Address, Customers.City, Customers.Region, Customers.PostalCode, Orders.OrderID, Orders.EmployeeID, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID WHERE Orders.OrderDate = (SELECT Max(O2.OrderDate) FROM Orders As O2 WHERE O2.CustomerID = Customers.CustomerID);
The good news is that this query is also updateable! The bad news is that using a subquery this way asks the query engine to solve the inner query for each row in the outer query — also known as a correlated subquery. A really smart query engine would figure out how to run the subquery once grouped by Customer, then perform a join behind the scenes to make it run quickly. Unfortunately, the JET 3.5 engine in Microsoft Access 97 still solves it the hard way. If you have as few as 500 customers and 2,000 orders and try to solve this query over a network, Access will take several seconds to find all the rows.
However, knowing how to solve the problem in different ways gives you a choice. If you need speed but don’t care if the result is updateable — perhaps for a report — then use a join with a totals query. If you must have updateability using this answer in a form, use the subquery.
Here’s my problem: I’ve got a table of daily transactions that I need to merge into the master order table. Some of the transactions are going to update records in the table, and some are going to create whole new records. Is there an easy way to do this?
An interesting problem — how to merge a new copy of data into an existing table, updating rows that match, and inserting new rows.
You could take advantage of the database replication feature in Access 95 or Access 97, but often your incoming data comes from an external (non-Access) source. Even if you’ve implemented replication, you might still be faced with the problem of getting the changes into the “master” copy of your table before you can replicate it.
Fortunately, Microsoft Access supports a Join clause in update queries, so creating the query to update fields is relatively simple. Let’s assume you have a ProductMaster table that lists all the current products and prices. Periodically, you receive a ProductUpdate table with changes to prices, as well as any new products, added to the catalog. The query to update existing rows (using the Products table from Northwind as a template) might look like this:
UPDATE ProductMaster INNER JOIN ProductUpdate ON ProductMaster.ProductID = ProductUpdate.ProductID SET ProductMaster.ProductName = [ProductUpdate].[ProductName], ProductMaster.SupplierID = [ProductUpdate].[SupplierID], ProductMaster.CategoryID = [ProductUpdate].[CategoryID], ProductMaster.QuantityPerUnit = [ProductUpdate].[QuantityPerUnit], ProductMaster.UnitPrice = [ProductUpdate].[UnitPrice], ProductMaster.Discontinued = [ProductUpdate].[Discontinued];
Just to give you a hint of how ugly this looks using ANSI-standard syntax, here’s how you’d update only the ProductName field if you couldn’t use a Join:
UPDATE ProductMaster SET ProductMaster.ProductName = (SELECT ProductName FROM ProductUpdate WHERE ProductUpdate.ProductID = ProductMaster.ProductID);
To do all the fields, you’d need a separate subquery for each field.
The preceding query solves only half the problem. If there are rows in ProductUpdate that aren’t loaded into your ProductMaster yet, you need to identify them and insert them with an Append query. You can build this by starting with the Unmatched Query Wizard. Choose ProductUpdate as the source table and ProductMaster as the table containing the related records. Link the tables on ProductID, and make sure to include all fields in the output. Your query should look something like this:
SELECT DISTINCTROW ProductUpdate.ProductID, ProductUpdate.ProductName, ProductUpdate.SupplierID, ProductUpdate.CategoryID, ProductUpdate.QuantityPerUnit, ProductUpdate.UnitPrice, ProductUpdate.ReorderLevel, ProductUpdate.Discontinued FROM ProductUpdate LEFT JOIN ProductMaster ON ProductUpdate.ProductID = ProductMaster.ProductID WHERE ProductMaster.ProductID Is Null;
Now, turn the preceding query into an Append query, and specify ProductMaster as the target table for the insert. The resulting SQL is:
INSERT INTO ProductMaster (ProductID, ProductName, SupplierID, CategoryID, QuantityPerUnit, UnitPrice, ReorderLevel, Discontinued, ProductID ) SELECT ProductUpdate.ProductID, ProductUpdate.ProductName, ProductUpdate.SupplierID, ProductUpdate.CategoryID, ProductUpdate.QuantityPerUnit, ProductUpdate.UnitPrice, ProductUpdate.ReorderLevel, ProductUpdate.Discontinued, ProductMaster.ProductID FROM ProductUpdate LEFT JOIN ProductMaster ON ProductUpdate.ProductID = ProductMaster.ProductID WHERE ProductMaster.ProductID Is Null;
Now you can run the Update query followed by this Insert query to merge your transaction table into the master table.
This two-step process works for all tables in Access, including ones that have an Autonumber (Counter) Primary Key. If your target table has an updateable Primary Key (unlike the Autonumber data type, which isn’t updateable), then you can accomplish both the update and insert in a single query. The trick is to modify the Update query to use an Outer Join from the update table to the master table and include the Primary Key in the SET clause. When there’s a match, Access updates the existing record. When Access finds a row in the update table that doesn’t exist at all in the master table, it “updates” the entire row into the table by inserting it! The SQL looks like this:
UPDATE ProductUpdate LEFT JOIN ProductMaster ON ProductUpdate.ProductID = ProductMaster.ProductID SET ProductMaster.ProductID = [ProductUpdate].[ProductID], ProductMaster.ProductName = [ProductUpdate].[ProductName], ProductMaster.SupplierID = [ProductUpdate].[SupplierID], ProductMaster.CategoryID = [ProductUpdate].[CategoryID], ProductMaster.QuantityPerUnit = [ProductUpdate].[QuantityPerUnit], ProductMaster.UnitPrice = [ProductUpdate].[UnitPrice], ProductMaster.Discontinued = [ProductUpdate].[Discontinued];
You can use this single-query technique in a table that has an Autonumber Primary Key as long as you omit the update to the key field (ProductID, in this case). Be aware, however, that Access will automatically generate the “next” available Primary Key value for any inserted rows — which probably won’t match the Primary Key values in the incoming update table.