Here’s a comprehensive look at how you can use consolidation queries in Access. And we’re not just talking about how to use Group By, either.
Microsoft Access’s Query/SQL environment, along with its support of functions and complex expressions, makes Access a powerful tool for consolidating data. Access’s ability to work with many different database formats also gives you the ability to explore data in many different data sources. As I discovered, there’s more to consolidating data than clicking the Totals button in the Query Design window, and I’m going to share that information with you in this article.
The demonstration database
I’ve included a sample database called Totals.mdb in the accompanying Download file. Inside that database is a table called TblSalesResults and all the queries and functions that make up this article. You can try out the queries in this exercise yourself or generate them using the shareware data mining tool Graf-FX that I developed. Table 1 shows a sample of the data that the queries will use. The data consists of the sales and budgeted results for three soft drink products in different regions of Australia.
Table 1. A sample of the data in the tblSalesResults table.
Product Name | SalesDate | State | Region | Sales | Budgets |
Orange | 14/01/95 | NSW | Goulburn | 220 | 240 |
Lemonade | 14/01/95 | NSW | Sydney | 940 | 1000 |
Orange | 14/01/95 | NSW | Sydney | 1000 | 960 |
Cola | 14/01/95 | NSW | Sydney | 4000 | 3600 |
Consolidating
I’ll begin with a quick vocabulary lesson to make sure that we’re all speaking the same language for the rest of this article. If I were to set up a consolidation query on the tblSalesResults table to show the sales results by product and region, I’d have the following SQL query, whose sample results are shown in Table 2 (if you want to see how this query looks in the Query Design window, open the demonstration database and look at query qTot_Listing2):
SELECT productName, region, Group By columns Sum(sales) AS totSales Aggregated values FROM TblSalesResults Table/Query source WHERE ((state="NSW") Where clause GROUP BY productName, region; Group By columns HAVING ((totSales > 1000) Having clause ORDER BY Sum(sales); Order By clause
Table 2. Sample output for a consolidation query.
ProductName | Region | totSales |
Cola | Goulburn | 12160 |
Cola | Sydney | 82200 |
Lemonade | Goulburn | 3700 |
Lemonade | Sydney | 17858 |
I break consolidation queries up into five different components:
- Group By columns — These are the columns that you want to consolidate on. Choosing more than one Group By column will create a row in your query, every combination of the data found in these columns.
- Aggregated values — If a column isn’t being grouped on, it must be subject to an aggregate function. These aggregate functions fall into two groups: those that require a number field (Sum, Avg, StDev, Var), and those that can also work on text data (Count, Min, Max, First, Last). For detailed information on these functions, see the Foundation Concepts
- Table/Query source — This is the table or query where the data is derived from.
- Where clause — This is the criteria that restrict the rows of data that are being analyzed.
- Order By — You can sort the results of the query by any output field or fields in the query. If you don’t select a column, the query will be sorted by the Group By fields.
- Having — You can select which consolidated records to display by using a Having clause to select records based on the results of the consolidation.
Exceptions to these rules include the use of expressions to combine fields and multiple table/query sources where the sources are joined on common fields. The remainder of this article discusses ways that you can achieve more with your consolidation queries.
Exploiting the Group By columns
Determining the level of consolidation is important. Date-related information is one area where it would be unusual to analyze results by simply consolidating on a date field. After all, one of the purposes of consolidation is to reduce the level of detail, and summarizing by the date of a transaction doesn’t eliminate much detail. So, you’ll usually want to look at weekly, monthly, or quarterly results, even though the actual information is stored with one or more entries per day.
As discussed in the article by Angela J.R. Jones, “Saving Time with Date/Time Functions” in the June 1998 issue of Smart Access, there are a number of functions that extract years, months, and so on from dates. However, if you need a custom date, the Format function works best. Using the Format function, this SQL produces the output you see in Table 3:
SELECT DISTINCTROW Format([SalesDate],"yyyy-mm") _ AS SalesMonth, Sum(sales) AS TotSales FROM tblSalesResults GROUP BY Format([SalesDate],"yyyy-mm");
Table 3. The results of a query that formats date data.
SalesMonth | TotSales |
1995-01 | 31560 |
1995-02 | 33340 |
1995-03 | 31584 |
1995-04 | 33358 |
By consolidating the dates into months, you can now start analyzing the data to look for possible patterns.
This example shows two important things about working with dates. To begin with, the years are shown first, followed by the months as numbers. This guarantees that the output will be sorted sequentially. Also, when you start building these functions, always use the full four-digit year or you’ll be introducing your own, personal Year 2000 bug into your code (the year 2000 will show as “00” and sort first, for instance).
It’s possible to undertake consolidations on numeric fields (even those with decimals) by using an IIf function to group the sales totals into low, medium, and high results, as follows:
IIf([sales]<=1000,"Low", _ IIf([sales]<=3000,"Medium","High"))
Here’s the IIf statement in a SQL expression that counts the number of sales that occur in each classification (also see Table 4):
SELECT IIf([sales]<=1000,"Low", _ IIf([sales]<=3000,"Medium","High")) AS Classification, Count(salesDate) AS numSales FROM tblSalesResults GROUP BY IIf([sales]<=1000,"Low", _ IIf([sales]<=3000,"Medium","High"));
Table 4. Classifying numbers into separate categories.
Classification | numSales |
High | 32 |
Low | 181 |
Medium | 86 |
You can achieve the same result as the IIf expression by converting the expression into a function that you write yourself. Here’s a function that might replace an IIf expression, along with an example of how it would be used in a SQL statement:
Function FXClassify (VarSalesVal As Variant) As String Select Case VarSalesVal Case Is <= 1000 FXClassify = "Low" Case 1000 To 3000 FXClassify = "Medium" Case Is > 3000 FXClassify = "High" Case Else FXClassify = "Unknown" End Select Exit Function SELECT FXClassify([sales]) AS Classification, Count(salesDate) AS numSales FROM tblSalesResults GROUP BY FXClassify([sales]);
Once you start writing your own routines, you can create classification schemes of any complexity that you want.
SELECT productName, Avg(CDbl([salestext])) AS avgSales FROM TblSalesResults GROUP BY productName
Aggregated values
The normal aggregate functions used on numbers are Average, Sum, Standard Deviation, and Variance. It should be noted that these only apply to records that have an actual value in the field being aggregated. If the field is null, then it’s not included in the aggregate total. If you have a field that’s designated as a string field but only has numerical results in it, you can compute the numerical aggregate by enclosing the field in the Access function cDbl.
Be warned that cDbl won’t handle null values.
You can do more with the aggregate functions by adding expressions to the query. This next statement shows how to calculate a weighted average. In this example, the value being averaged is the sales field weighted by the value of the data in the budget field. For this expression, I’ve multiplied together the two fields in each row to form a product and then summed the results. At the same time, this query also calculates the total of all budget fields. The weighted average is returned as the division of the two sums:
SELECT productName, Sum([sales]*[budgets])/Sum([budgets]) AS WgtAvgSales FROM tblSalesResults GROUP BY productName;
Notice that the division of the two aggregate amounts is done when the record is added to the result. In other words, for each record, the (sales * budget) product is calculated and summed (as is the budget field). Only when the Group By fields change and a record is produced is the division actually performed. This is because the result of an aggregate function is only available for use when a record is added to the query’s result.
SELECT productName, Sum([sales]*[budgets]/[budgets]) AS WgtAvgSales FROM tblSalesResults GROUP BY productName;
If you actually wanted to get the sum of sales * budgets/budgets, you’d need to use this query:
You can combine aggregate results in a query in other ways. The expression in this example uses the results from two of the other aggregate fields. When you start using expressions in queries, it’s important to come up with a query column naming convention, as debugging the formula in the queries can be difficult:
SELECT DISTINCTROW productName, Sum(sales) as SalesTot, Sum(budgets) as BudgetTot, [SalesTot]-[BudgetTot] as SalesPerf FROM tblSalesResults GROUP BY productName;
Cumulative expressions
Access has no built-in function to calculate running totals in a query. There’s a technique that you can use in a query, however, that will allow you to create a column that represents the total of all the previous rows in the query. Table 5 shows the results of a query that returns totals by day for the different regions in the tblSalesResults table, plus the total sales up to that date.
Table 5. A query with a cumulative column.
Region | ProductName | salesDate | Sales | RsumSales |
Sydney | Cola | 14/01/95 | 4000 | 4000 |
Sydney | Cola | 28/01/95 | 4000 | 8000 |
Sydney | Cola | 14/02/95 | 4260 | 12260 |
Sydney | Cola | 28/02/95 | 4260 | 16520 |
Sydney | Cola | 14/03/95 | 3800 | 20320 |
Sydney | Cola | 28/03/95 | 3800 | 24120 |
The key is a function that, unfortunately, is unique to each query and must be set up on an individual basis. However, if you need cumulative output, this is an option. I consider this another way of consolidating data (and the reason this article isn’t titled “All About Totals Queries”). Besides, the function’s code uses the DSum consolidation function to provide the aggregate total.
Queries that use this technique are slow but require far less code than a comparable RecordSet solution written in Access basic. Here’s the SQL:
SELECT DISTINCTROW region, productName, salesDate, sales,RSum_FX([salesDate],[productName],[region]) AS RSumSales FROM tblSalesResults ORDER BY region, productName, salesDate;
By using the Distinct operator, the SQL statement produces one record for every combination of the non-aggregate fields (just like a Totals query). The Order By clause ensures that the records are also sorted by those columns (again, like a Totals query). However, instead of using one of the built-in aggregate operators, I wrote my own. Here it is:
Function RSum_FX (dateVal As Variant, _ ProductVal As Variant, regionVal As Variant) As Variant Dim whereStr As String If IsNull(dateVal) Then RSum_FX = Null Else whereStr = "clng(salesDate) <= " & CLng(dateVal) & _ " and productName = '" & ProductVal & _ "' and region = '" & regionVal & "' " RSum_FX = DSum("[Sales]", "[tblSalesResults]", whereStr) End If End Function
As you can see, the key part of this function is the building of the Where string for the DSum function. The Where Clause must retrieve the records for the non-aggregated columns in the query. For the date handling, I found that it was easier to translate the date back to its stored integer number rather than converting the dates to a common format understandable by Access Basic (again, see Angela J.R. Jones’ article for background on the way dates are stored).
A consolidation
In addition to being used as queries, these consolidation queries can be used as the row source for Microsoft Graph objects embedded in Access forms. Since consolidation queries are typically used to analyze data, graphs provide an excellent way of displaying the results of the query so that patterns and trends can be spotted.
Knowledge of the different ways that you can use consolidation queries is quite important to those of you who aim to undertake data analysis. Initially, you must come to terms with the way that data is already aggregated in the database. Then you can move on to experiment with using functions to aggregate your data in ways not provided by the data that’s stored in the database. This will probably take you into using Access’s built-in functions, such as Format, IIF, and even on to your own custom functions. You can also use expressions in your queries to do aggregate calculations.
An extension of these techniques is the joining of queries using the values in the Group By fields. I recently used this technique in a mining application where the data had to be reported by month but was stored in separate daily, weekly, and monthly results tables. Rather than designing a system that would use VBA and recordset manipulation, my solution used joins between consolidation queries. Not understanding consolidation queries would have resulted in a more complicated (and more expensive) solution for the client.
Your download file is called Robinson_Consolidating_Data.accdb
Sidebar: Foundation Concepts
Here’s an introduction to the SQL consolidation functions that Access supports. As part of this quick intro, I’ve included some “watch-outs” in the way that these functions deal with Nulls:
- Count — When used with an * (for example, Count(*)), Count returns the number of records found by the query. When Count is used with a field name (for instance, Count([Sales])), the function counts the number of records that don’t have Null in that field. On the other hand, when used with an asterisk, Count includes records even if all of the fields of the record are null (when used with an asterisk, Count takes advantage of Access Rushmore technology when running against Jet databases to get you faster results). Count will never return a Null: If no records are found or a field contains nothing but Nulls, Count returns 0.
- Sum, Avg, StDev, Var — These provide the total, average, standard deviation, and variance of all of the values in a field. All of these functions ignore records with nulls in the field being aggregated. If all of the fields are null, these functions return null (unlike Count, which returns 0). These functions only work with numeric values (including Dates, Boolean, and so forth).
- Min, Max — These are the smallest and largest value in the field being aggregated. These functions work with non-numeric values (for example, Text, Char, VarChar). Like the numeric functions, these functions ignore nulls and return null if all the fields are null.
- First, Last — These are the values of the aggregated field from the first and last record physically added to the database. These functions work with non-numeric values (for instance, Text, Char, VarChar). Unlike the numeric functions, these functions don’t ignore nulls and return null if the field in the appropriate record is null.