Consolidating Your Data with Queries

Here’s a comprehensive look at how you can use consolidation queries in Access. And we’re not just talking about how to use Group By, either.

Microsoft Access’s Query/SQL environment, along with its support of functions and complex expressions, makes Access a powerful tool for consolidating data. Access’s ability to work with many different database formats also gives you the ability to explore data in many different data sources. As I discovered, there’s more to consolidating data than clicking the Totals button in the Query Design window, and I’m going to share that information with you in this article.

The demonstration database

I’ve included a sample database called Totals.mdb in the accompanying Download file. Inside that database is a table called TblSalesResults and all the queries and functions that make up this article. You can try out the queries in this exercise yourself or generate them using the shareware data mining tool Graf-FX that I developed. Table 1 shows a sample of the data that the queries will use. The data consists of the sales and budgeted results for three soft drink products in different regions of Australia.

Table 1. A sample of the data in the tblSalesResults table.

Product Name	SalesDate	State	Region	Sales	Budgets
Orange	14/01/95	NSW	Goulburn	220	240
Lemonade	14/01/95	NSW	Sydney	940	1000
Orange	14/01/95	NSW	Sydney	1000	960
Cola	14/01/95	NSW	Sydney	4000	3600

Consolidating

I’ll begin with a quick vocabulary lesson to make sure that we’re all speaking the same language for the rest of this article. If I were to set up a consolidation query on the tblSalesResults table to show the sales results by product and region, I’d have the following SQL query, whose sample results are shown in Table 2 (if you want to see how this query looks in the Query Design window, open the demonstration database and look at query qTot_Listing2):

SELECT productName, region,        Group By columns

Sum(sales) AS totSales             Aggregated values

FROM TblSalesResults               Table/Query source

WHERE ((state="NSW")               Where clause

GROUP BY productName, region;      Group By columns

HAVING ((totSales > 1000)          Having clause

ORDER BY Sum(sales);               Order By clause

Table 2. Sample output for a consolidation query.

ProductName	Region	totSales
Cola	Goulburn	12160
Cola	Sydney	82200
Lemonade	Goulburn	3700
Lemonade	Sydney	17858

I break consolidation queries up into five different components:

Group By columns — These are the columns that you want to consolidate on. Choosing more than one Group By column will create a row in your query, every combination of the data found in these columns.
Aggregated values — If a column isn’t being grouped on, it must be subject to an aggregate function. These aggregate functions fall into two groups: those that require a number field (Sum, Avg, StDev, Var), and those that can also work on text data (Count, Min, Max, First, Last). For detailed information on these functions, see the Foundation Concepts
Table/Query source — This is the table or query where the data is derived from.
Where clause — This is the criteria that restrict the rows of data that are being analyzed.
Order By — You can sort the results of the query by any output field or fields in the query. If you don’t select a column, the query will be sorted by the Group By fields.
Having — You can select which consolidated records to display by using a Having clause to select records based on the results of the consolidation.

Exceptions to these rules include the use of expressions to combine fields and multiple table/query sources where the sources are joined on common fields. The remainder of this article discusses ways that you can achieve more with your consolidation queries.

Exploiting the Group By columns

Determining the level of consolidation is important. Date-related information is one area where it would be unusual to analyze results by simply consolidating on a date field. After all, one of the purposes of consolidation is to reduce the level of detail, and summarizing by the date of a transaction doesn’t eliminate much detail. So, you’ll usually want to look at weekly, monthly, or quarterly results, even though the actual information is stored with one or more entries per day.

As discussed in the article by Angela J.R. Jones, “Saving Time with Date/Time Functions” in the June 1998 issue of Smart Access, there are a number of functions that extract years, months, and so on from dates. However, if you need a custom date, the Format function works best. Using the Format function, this SQL produces the output you see in Table 3:

SELECT DISTINCTROW Format([SalesDate],"yyyy-mm") _

    AS SalesMonth,

Sum(sales) AS TotSales

FROM tblSalesResults

GROUP BY Format([SalesDate],"yyyy-mm");

Table 3. The results of a query that formats date data.

SalesMonth	TotSales
1995-01	31560
1995-02	33340
1995-03	31584
1995-04	33358

By consolidating the dates into months, you can now start analyzing the data to look for possible patterns.

This example shows two important things about working with dates. To begin with, the years are shown first, followed by the months as numbers. This guarantees that the output will be sorted sequentially. Also, when you start building these functions, always use the full four-digit year or you’ll be introducing your own, personal Year 2000 bug into your code (the year 2000 will show as “00” and sort first, for instance).

It’s possible to undertake consolidations on numeric fields (even those with decimals) by using an IIf function to group the sales totals into low, medium, and high results, as follows:

IIf([sales]<=1000,"Low", _

  IIf([sales]<=3000,"Medium","High"))

Here’s the IIf statement in a SQL expression that counts the number of sales that occur in each classification (also see Table 4):

SELECT IIf([sales]<=1000,"Low", _

 IIf([sales]<=3000,"Medium","High")) AS Classification,

Count(salesDate) AS numSales

FROM   tblSalesResults

GROUP BY IIf([sales]<=1000,"Low", _

 IIf([sales]<=3000,"Medium","High"));

Table 4. Classifying numbers into separate categories.

Classification	numSales
High	32
Low	181
Medium	86

You can achieve the same result as the IIf expression by converting the expression into a function that you write yourself. Here’s a function that might replace an IIf expression, along with an example of how it would be used in a SQL statement:

Function FXClassify (VarSalesVal As Variant) As String

  Select Case VarSalesVal

    Case Is <= 1000

      FXClassify = "Low"

    Case 1000 To 3000

      FXClassify = "Medium"

    Case Is > 3000

      FXClassify = "High"

    Case Else

      FXClassify = "Unknown"

End Select

Exit Function



SELECT FXClassify([sales]) AS Classification,

Count(salesDate) AS numSales

FROM    tblSalesResults

GROUP BY FXClassify([sales]);

Once you start writing your own routines, you can create classification schemes of any complexity that you want.

SELECT productName, Avg(CDbl([salestext])) AS avgSales

FROM TblSalesResults

GROUP BY productName

Aggregated values

The normal aggregate functions used on numbers are Average, Sum, Standard Deviation, and Variance. It should be noted that these only apply to records that have an actual value in the field being aggregated. If the field is null, then it’s not included in the aggregate total. If you have a field that’s designated as a string field but only has numerical results in it, you can compute the numerical aggregate by enclosing the field in the Access function cDbl.

Be warned that cDbl won’t handle null values.

You can do more with the aggregate functions by adding expressions to the query. This next statement shows how to calculate a weighted average. In this example, the value being averaged is the sales field weighted by the value of the data in the budget field. For this expression, I’ve multiplied together the two fields in each row to form a product and then summed the results. At the same time, this query also calculates the total of all budget fields. The weighted average is returned as the division of the two sums:

SELECT productName,

 Sum([sales]*[budgets])/Sum([budgets]) AS WgtAvgSales

FROM tblSalesResults

GROUP BY productName;

Notice that the division of the two aggregate amounts is done when the record is added to the result. In other words, for each record, the (sales * budget) product is calculated and summed (as is the budget field). Only when the Group By fields change and a record is produced is the division actually performed. This is because the result of an aggregate function is only available for use when a record is added to the query’s result.

SELECT productName,

 Sum([sales]*[budgets]/[budgets]) AS WgtAvgSales

FROM tblSalesResults

GROUP BY productName;

If you actually wanted to get the sum of sales * budgets/budgets, you’d need to use this query:

You can combine aggregate results in a query in other ways. The expression in this example uses the results from two of the other aggregate fields. When you start using expressions in queries, it’s important to come up with a query column naming convention, as debugging the formula in the queries can be difficult:

SELECT DISTINCTROW productName,

Sum(sales) as SalesTot, Sum(budgets) as BudgetTot,

 [SalesTot]-[BudgetTot]  as  SalesPerf

FROM tblSalesResults

GROUP BY productName;

Cumulative expressions

Access has no built-in function to calculate running totals in a query. There’s a technique that you can use in a query, however, that will allow you to create a column that represents the total of all the previous rows in the query. Table 5 shows the results of a query that returns totals by day for the different regions in the tblSalesResults table, plus the total sales up to that date.

Table 5. A query with a cumulative column.

Region	ProductName	salesDate	Sales	RsumSales
Sydney	Cola	14/01/95	4000	4000
Sydney	Cola	28/01/95	4000	8000
Sydney	Cola	14/02/95	4260	12260
Sydney	Cola	28/02/95	4260	16520
Sydney	Cola	14/03/95	3800	20320
Sydney	Cola	28/03/95	3800	24120

The key is a function that, unfortunately, is unique to each query and must be set up on an individual basis. However, if you need cumulative output, this is an option. I consider this another way of consolidating data (and the reason this article isn’t titled “All About Totals Queries”). Besides, the function’s code uses the DSum consolidation function to provide the aggregate total.

Queries that use this technique are slow but require far less code than a comparable RecordSet solution written in Access basic. Here’s the SQL:

SELECT DISTINCTROW region, productName, salesDate,

sales,RSum_FX([salesDate],[productName],[region])

  AS RSumSales

FROM tblSalesResults

ORDER BY region, productName, salesDate;

By using the Distinct operator, the SQL statement produces one record for every combination of the non-aggregate fields (just like a Totals query). The Order By clause ensures that the records are also sorted by those columns (again, like a Totals query). However, instead of using one of the built-in aggregate operators, I wrote my own. Here it is:

Function RSum_FX (dateVal As Variant, _

 ProductVal As Variant, regionVal As Variant)  As Variant


Dim whereStr As String

If IsNull(dateVal) Then

  RSum_FX = Null

Else

  whereStr = "clng(salesDate) <= " & CLng(dateVal) & _

    " and productName = '" & ProductVal & _

    "'  and region = '" & regionVal & "' "

  RSum_FX = DSum("[Sales]", "[tblSalesResults]", whereStr)

End If



End Function

As you can see, the key part of this function is the building of the Where string for the DSum function. The Where Clause must retrieve the records for the non-aggregated columns in the query. For the date handling, I found that it was easier to translate the date back to its stored integer number rather than converting the dates to a common format understandable by Access Basic (again, see Angela J.R. Jones’ article for background on the way dates are stored).

A consolidation

In addition to being used as queries, these consolidation queries can be used as the row source for Microsoft Graph objects embedded in Access forms. Since consolidation queries are typically used to analyze data, graphs provide an excellent way of displaying the results of the query so that patterns and trends can be spotted.

Knowledge of the different ways that you can use consolidation queries is quite important to those of you who aim to undertake data analysis. Initially, you must come to terms with the way that data is already aggregated in the database. Then you can move on to experiment with using functions to aggregate your data in ways not provided by the data that’s stored in the database. This will probably take you into using Access’s built-in functions, such as Format, IIF, and even on to your own custom functions. You can also use expressions in your queries to do aggregate calculations.

An extension of these techniques is the joining of queries using the values in the Group By fields. I recently used this technique in a mining application where the data had to be reported by month but was stored in separate daily, weekly, and monthly results tables. Rather than designing a system that would use VBA and recordset manipulation, my solution used joins between consolidation queries. Not understanding consolidation queries would have resulted in a more complicated (and more expensive) solution for the client.

Your download file is called Robinson_Consolidating_Data.accdb

Sidebar: Foundation Concepts

Here’s an introduction to the SQL consolidation functions that Access supports. As part of this quick intro, I’ve included some “watch-outs” in the way that these functions deal with Nulls:

Count — When used with an * (for example, Count(*)), Count returns the number of records found by the query. When Count is used with a field name (for instance, Count([Sales])), the function counts the number of records that don’t have Null in that field. On the other hand, when used with an asterisk, Count includes records even if all of the fields of the record are null (when used with an asterisk, Count takes advantage of Access Rushmore technology when running against Jet databases to get you faster results). Count will never return a Null: If no records are found or a field contains nothing but Nulls, Count returns 0.
Sum, Avg, StDev, Var — These provide the total, average, standard deviation, and variance of all of the values in a field. All of these functions ignore records with nulls in the field being aggregated. If all of the fields are null, these functions return null (unlike Count, which returns 0). These functions only work with numeric values (including Dates, Boolean, and so forth).
Min, Max — These are the smallest and largest value in the field being aggregated. These functions work with non-numeric values (for example, Text, Char, VarChar). Like the numeric functions, these functions ignore nulls and return null if all the fields are null.
First, Last — These are the values of the aggregated field from the first and last record physically added to the database. These functions work with non-numeric values (for instance, Text, Char, VarChar). Unlike the numeric functions, these functions don’t ignore nulls and return null if the field in the appropriate record is null.

vb123.com

Consolidating Your Data with Queries

The demonstration database

Consolidating

Exploiting the Group By columns

Aggregated values

Cumulative expressions

A consolidation

Your download file is called Robinson_Consolidating_Data.accdb

Sidebar: Foundation Concepts

About Garry Robinson

Leave a Reply Cancel reply

AccessUserGroups.org

Recently added

About IT Impact

Follow Us

Like Us

vb123.com

Consolidating Your Data with Queries

The demonstration database

Consolidating

Exploiting the Group By columns

Aggregated values

Cumulative expressions

A consolidation

Your download file is called Robinson_Consolidating_Data.accdb

Sidebar: Foundation Concepts

Spread the love

About Garry Robinson

Leave a Reply Cancel reply

AccessUserGroups.org

Recently added

About IT Impact

Follow Us

Like Us