Consolidating Your Data with Queries

Here’s a comprehensive look at how you can use consolidation queries in Access. And we’re not just talking about how to use Group By, either.

Microsoft Access’s Query/SQL environment, along with its support of functions and complex expressions, makes Access a powerful tool for consolidating data. Access’s ability to work with many different database formats also gives you the ability to explore data in many different data sources. As I discovered, there’s more to consolidating data than clicking the Totals button in the Query Design window, and I’m going to share that information with you in this article.

The demonstration database

I’ve included a sample database called Totals.mdb in the accompanying Download file. Inside that database is a table called TblSalesResults and all the queries and functions that make up this article. You can try out the queries in this exercise yourself or generate them using the shareware data mining tool Graf-FX that I developed. Table 1 shows a sample of the data that the queries will use. The data consists of the sales and budgeted results for three soft drink products in different regions of Australia.

Table 1. A sample of the data in the tblSalesResults table.

Product Name SalesDate State Region Sales Budgets
Orange 14/01/95 NSW Goulburn 220 240
Lemonade 14/01/95 NSW Sydney 940 1000
Orange 14/01/95 NSW Sydney 1000 960
Cola 14/01/95 NSW Sydney 4000 3600

Consolidating

I’ll begin with a quick vocabulary lesson to make sure that we’re all speaking the same language for the rest of this article. If I were to set up a consolidation query on the tblSalesResults table to show the sales results by product and region, I’d have the following SQL query, whose sample results are shown in Table 2 (if you want to see how this query looks in the Query Design window, open the demonstration database and look at query qTot_Listing2):

SELECT productName, region,        Group By columns

Sum(sales) AS totSales             Aggregated values

FROM TblSalesResults               Table/Query source

WHERE ((state="NSW")               Where clause

GROUP BY productName, region;      Group By columns

HAVING ((totSales > 1000)          Having clause

ORDER BY Sum(sales);               Order By clause

Table 2. Sample output for a consolidation query.

ProductName Region totSales
Cola Goulburn 12160
Cola Sydney 82200
Lemonade Goulburn 3700
Lemonade Sydney 17858

I break consolidation queries up into five different components:

  • Group By columns — These are the columns that you want to consolidate on. Choosing more than one Group By column will create a row in your query, every combination of the data found in these columns.
  • Aggregated values — If a column isn’t being grouped on, it must be subject to an aggregate function. These aggregate functions fall into two groups: those that require a number field (Sum, Avg, StDev, Var), and those that can also work on text data (Count, Min, Max, First, Last). For detailed information on these functions, see the Foundation Concepts
  • Table/Query source — This is the table or query where the data is derived from.
  • Where clause — This is the criteria that restrict the rows of data that are being analyzed.
  • Order By — You can sort the results of the query by any output field or fields in the query. If you don’t select a column, the query will be sorted by the Group By fields.
  • Having — You can select which consolidated records to display by using a Having clause to select records based on the results of the consolidation.

Exceptions to these rules include the use of expressions to combine fields and multiple table/query sources where the sources are joined on common fields. The remainder of this article discusses ways that you can achieve more with your consolidation queries.

Exploiting the Group By columns

Determining the level of consolidation is important. Date-related information is one area where it would be unusual to analyze results by simply consolidating on a date field. After all, one of the purposes of consolidation is to reduce the level of detail, and summarizing by the date of a transaction doesn’t eliminate much detail. So, you’ll usually want to look at weekly, monthly, or quarterly results, even though the actual information is stored with one or more entries per day.

As discussed in the article by Angela J.R. Jones, “Saving Time with Date/Time Functions” in the June 1998 issue of Smart Access, there are a number of functions that extract years, months, and so on from dates. However, if you need a custom date, the Format function works best. Using the Format function, this SQL produces the output you see in Table 3:

SELECT DISTINCTROW Format([SalesDate],"yyyy-mm") _

    AS SalesMonth,

Sum(sales) AS TotSales

FROM tblSalesResults

GROUP BY Format([SalesDate],"yyyy-mm");

Table 3. The results of a query that formats date data.

SalesMonth TotSales
1995-01 31560
1995-02 33340
1995-03 31584
1995-04 33358

By consolidating the dates into months, you can now start analyzing the data to look for possible patterns.

This example shows two important things about working with dates. To begin with, the years are shown first, followed by the months as numbers. This guarantees that the output will be sorted sequentially. Also, when you start building these functions, always use the full four-digit year or you’ll be introducing your own, personal Year 2000 bug into your code (the year 2000 will show as “00” and sort first, for instance).

It’s possible to undertake consolidations on numeric fields (even those with decimals) by using an IIf function to group the sales totals into low, medium, and high results, as follows:

IIf([sales]<=1000,"Low", _

  IIf([sales]<=3000,"Medium","High"))

Here’s the IIf statement in a SQL expression that counts the number of sales that occur in each classification (also see Table 4):

SELECT IIf([sales]<=1000,"Low", _

 IIf([sales]<=3000,"Medium","High")) AS Classification,

Count(salesDate) AS numSales

FROM   tblSalesResults

GROUP BY IIf([sales]<=1000,"Low", _

 IIf([sales]<=3000,"Medium","High"));

Table 4. Classifying numbers into separate categories.

Classification numSales
High 32
Low 181
Medium 86

You can achieve the same result as the IIf expression by converting the expression into a function that you write yourself. Here’s a function that might replace an IIf expression, along with an example of how it would be used in a SQL statement:

Function FXClassify (VarSalesVal As Variant) As String

  Select Case VarSalesVal

    Case Is <= 1000

      FXClassify = "Low"

    Case 1000 To 3000

      FXClassify = "Medium"

    Case Is > 3000

      FXClassify = "High"

    Case Else

      FXClassify = "Unknown"

End Select

Exit Function



SELECT FXClassify([sales]) AS Classification,

Count(salesDate) AS numSales

FROM    tblSalesResults

GROUP BY FXClassify([sales]);

Once you start writing your own routines, you can create classification schemes of any complexity that you want.

SELECT productName, Avg(CDbl([salestext])) AS avgSales

FROM TblSalesResults

GROUP BY productName

Aggregated values

The normal aggregate functions used on numbers are Average, Sum, Standard Deviation, and Variance. It should be noted that these only apply to records that have an actual value in the field being aggregated. If the field is null, then it’s not included in the aggregate total. If you have a field that’s designated as a string field but only has numerical results in it, you can compute the numerical aggregate by enclosing the field in the Access function cDbl.

Be warned that cDbl won’t handle null values.

You can do more with the aggregate functions by adding expressions to the query. This next statement shows how to calculate a weighted average. In this example, the value being averaged is the sales field weighted by the value of the data in the budget field. For this expression, I’ve multiplied together the two fields in each row to form a product and then summed the results. At the same time, this query also calculates the total of all budget fields. The weighted average is returned as the division of the two sums:

SELECT productName,

 Sum([sales]*[budgets])/Sum([budgets]) AS WgtAvgSales

FROM tblSalesResults

GROUP BY productName;

Notice that the division of the two aggregate amounts is done when the record is added to the result. In other words, for each record, the (sales * budget) product is calculated and summed (as is the budget field). Only when the Group By fields change and a record is produced is the division actually performed. This is because the result of an aggregate function is only available for use when a record is added to the query’s result.

SELECT productName,

 Sum([sales]*[budgets]/[budgets]) AS WgtAvgSales

FROM tblSalesResults

GROUP BY productName;

If you actually wanted to get the sum of sales * budgets/budgets, you’d need to use this query:

You can combine aggregate results in a query in other ways. The expression in this example uses the results from two of the other aggregate fields. When you start using expressions in queries, it’s important to come up with a query column naming convention, as debugging the formula in the queries can be difficult:

SELECT DISTINCTROW productName,

Sum(sales) as SalesTot, Sum(budgets) as BudgetTot,

 [SalesTot]-[BudgetTot]  as  SalesPerf

FROM tblSalesResults

GROUP BY productName;

Cumulative expressions

Access has no built-in function to calculate running totals in a query. There’s a technique that you can use in a query, however, that will allow you to create a column that represents the total of all the previous rows in the query. Table 5 shows the results of a query that returns totals by day for the different regions in the tblSalesResults table, plus the total sales up to that date.

Table 5. A query with a cumulative column.

Region ProductName salesDate Sales RsumSales
Sydney Cola 14/01/95 4000 4000
Sydney Cola 28/01/95 4000 8000
Sydney Cola 14/02/95 4260 12260
Sydney Cola 28/02/95 4260 16520
Sydney Cola 14/03/95 3800 20320
Sydney Cola 28/03/95 3800 24120

The key is a function that, unfortunately, is unique to each query and must be set up on an individual basis. However, if you need cumulative output, this is an option. I consider this another way of consolidating data (and the reason this article isn’t titled “All About Totals Queries”). Besides, the function’s code uses the DSum consolidation function to provide the aggregate total.

Queries that use this technique are slow but require far less code than a comparable RecordSet solution written in Access basic. Here’s the SQL:

SELECT DISTINCTROW region, productName, salesDate,

sales,RSum_FX([salesDate],[productName],[region])

  AS RSumSales

FROM tblSalesResults

ORDER BY region, productName, salesDate;

By using the Distinct operator, the SQL statement produces one record for every combination of the non-aggregate fields (just like a Totals query). The Order By clause ensures that the records are also sorted by those columns (again, like a Totals query). However, instead of using one of the built-in aggregate operators, I wrote my own. Here it is:

Function RSum_FX (dateVal As Variant, _

 ProductVal As Variant, regionVal As Variant)  As Variant


Dim whereStr As String

If IsNull(dateVal) Then

  RSum_FX = Null

Else

  whereStr = "clng(salesDate) <= " & CLng(dateVal) & _

    " and productName = '" & ProductVal & _

    "'  and region = '" & regionVal & "' "

  RSum_FX = DSum("[Sales]", "[tblSalesResults]", whereStr)

End If



End Function

As you can see, the key part of this function is the building of the Where string for the DSum function. The Where Clause must retrieve the records for the non-aggregated columns in the query. For the date handling, I found that it was easier to translate the date back to its stored integer number rather than converting the dates to a common format understandable by Access Basic (again, see Angela J.R. Jones’ article for background on the way dates are stored).

A consolidation

In addition to being used as queries, these consolidation queries can be used as the row source for Microsoft Graph objects embedded in Access forms. Since consolidation queries are typically used to analyze data, graphs provide an excellent way of displaying the results of the query so that patterns and trends can be spotted.

Knowledge of the different ways that you can use consolidation queries is quite important to those of you who aim to undertake data analysis. Initially, you must come to terms with the way that data is already aggregated in the database. Then you can move on to experiment with using functions to aggregate your data in ways not provided by the data that’s stored in the database. This will probably take you into using Access’s built-in functions, such as Format, IIF, and even on to your own custom functions. You can also use expressions in your queries to do aggregate calculations.

An extension of these techniques is the joining of queries using the values in the Group By fields. I recently used this technique in a mining application where the data had to be reported by month but was stored in separate daily, weekly, and monthly results tables. Rather than designing a system that would use VBA and recordset manipulation, my solution used joins between consolidation queries. Not understanding consolidation queries would have resulted in a more complicated (and more expensive) solution for the client.

Your download file is called Robinson_Consolidating_Data.accdb 


Sidebar: Foundation Concepts

Here’s an introduction to the SQL consolidation functions that Access supports. As part of this quick intro, I’ve included some “watch-outs” in the way that these functions deal with Nulls:

  • Count — When used with an * (for example, Count(*)), Count returns the number of records found by the query. When Count is used with a field name (for instance, Count([Sales])), the function counts the number of records that don’t have Null in that field. On the other hand, when used with an asterisk, Count includes records even if all of the fields of the record are null (when used with an asterisk, Count takes advantage of Access Rushmore technology when running against Jet databases to get you faster results). Count will never return a Null: If no records are found or a field contains nothing but Nulls, Count returns 0.
  • Sum, Avg, StDev, Var — These provide the total, average, standard deviation, and variance of all of the values in a field. All of these functions ignore records with nulls in the field being aggregated. If all of the fields are null, these functions return null (unlike Count, which returns 0). These functions only work with numeric values (including Dates, Boolean, and so forth).
  • Min, Max — These are the smallest and largest value in the field being aggregated. These functions work with non-numeric values (for example, Text, Char, VarChar). Like the numeric functions, these functions ignore nulls and return null if all the fields are null.
  • First, Last — These are the values of the aggregated field from the first and last record physically added to the database. These functions work with non-numeric values (for instance, Text, Char, VarChar). Unlike the numeric functions, these functions don’t ignore nulls and return null if the field in the appropriate record is null.

About Garry Robinson

He is the guy who originally put this Smart Access web site together. Head to www.gr-fx.com.au for more about him and his company. His main claim to fame is winning the Microsoft MVP award for services to MS Access from 2006 to 2018.
This entry was posted in Queries. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.