An Access Project, Part 2: Data Modeling

In this series of articles, David walks through the process of creating a complete Access application. While the application itself is a valuable tool in a developer’s toolkit, David will concentrate in each article on using some feature of Microsoft Access.

Start The Series Here: An Access Project, Part 1: Requirements

When I was a kid, I loved to build models (actually, I still do). I always found it great to take a bunch of plastic pieces and end up with a finished product that looked (hopefully) like an item from real life. Data modeling for my Access applications provides me with the same satisfaction. In a nutshell, data modeling is the design of the informational components of the database — that is, the tables, the relationships, and so forth. Like my more physical modeling, data modeling is the act of taking the pieces (the data items) and putting them together in a way that reflects reality.

The first step in data modeling is to define all the pieces. Unlike the plastic models of my bygone days, when I start to create an application, no one prepares a nice collection of pieces and a set of instructions for me. This, I must do myself.

The key here is to determine what information is required for your application. I do this by thinking in terms of the end product. While it’s tempting to try to “begin at the beginning” with the inputs of the system, I concentrate on “What do I need to have when the project is completed? What information is my user going to want?” You’d be surprised at how many inputs to existing systems turn out to be completely unnecessary.

Determining the necessary outputs of your system can be difficult, but the more complete your data model is, the fewer hassles maintenance of the application is going to cause. As recommended by Markus Bosshard in the October 1998 issue of Smart Access (see “Data Modeling for Access Developers”), it’s important that you start here, at the functional layer. Don’t think about the logical model — data relations, the linking of tables, and so on — that comes later. The key here is to develop as complete a list of data items required as possible.

The sample application for this series of articles, which is designed to track employee time, provides some examples of how I would proceed. Any report on employee time spent on projects is going to require employee and project information. So, obviously, my customers are going to need an Employee table and a Project table. Table 1 shows some possible data items for the Employee table.

Table 1. The Employee table.

Item Name Description
1. Last Name Employee’s last name
2. First Name Employee’s first name
3. Address Employee’s street address
4. City Employee’s city
5. Postal Code Employee’s postal/ZIP code
6. Phone Number Employee’s phone number
7. Start Date Employee’s start date
8. Sex Employee’s gender
9. Bill Rate The rate at which the employee is billed to projects/customers

Obviously, there’s a lot more information that could be added to the Employee table, but you need to look at your information requirements within the scope of your project. What information do you need for your application? If you’re just going to track employees onto projects, then the employee’s address information, phone number, and gender are irrelevant to us. In certain cases, where a system is being developed modularly, it’s a good idea to give consideration to information that your module might not need but that other modules might require. Adding this information to your table now could save you time and headaches in the future. For our purposes, we’ll only look at information that’s relevant to our project. As a result, our data table ends up looking like Table 2.

Table 2. The revised Employee table.

Item Name Description
1. Last Name Employee’s last name
2. First Name Employee’s first name
3. Start Date Employee’s start date
4. Bill Rate The rate at which the employee is billed to projects/customers

Once the data elements have been determined, then their characteristics can be defined (you don’t have to at this point, but it’s as good a time as any). Give consideration to what’s required by your users. Ensure that field lengths are long enough to handle the nature of the data. Some fields, such as dates, are a given, but alpha and numeric fields require some thought. How many characters do I need? How many decimals? In many cases, you can take the defaults that Access provides, but in designing screens down the road, you might not want to deal with a 50-character Employee First Name. Variable naming can also be taken into account at this point. Personally, I like the Redick VBA Naming conventions. Table 3 shows how the data layout looks after some basic attributes have been applied.

Table 3. The completed Employee table.

Item Variable name Type Size Comments
First Name strFirstName String 25
Last Name strLastName String 30
Start Date datStartDate Date 6 Format YYMMDD
Bill Rate lngBillRate Long 5,2 Format ZZZZ9.99

Normalize, normalize, normalize

It’s always a good idea to go back to your requirements document to review what you’re trying to do (the requirements analysis for this project can be found in “An Access Project: Requirements” in the January 1999 issue). The requirement that I want to consider now is the one to track hours spent on projects by employees. To do that, I need to track the project name, who worked on it, how many hours they spent, and what they were working on. Table 4 is a sample of how the data might look.

Table 4. The Project table.

Project name Project duration Project leader Employee Task Hours
Year 2000 6 months Irvine Irvine Analysis 4
Year 2000 6 months Irvine Irvine Programming 15
Year 2000 6 months Irvine Smith Analysis 5

Notice that in this layout, several data items are repeated. The goal of normalization is to weed out static information so it appears in as few places as possible. By using a separate table for Project Information and Project Assignments, some of this duplication is prevented (see Table 5 and Table 6).

Table 5. Revised project table.

Project name Project leader Project duration
Year 2000 Irvine 6 months
Internet Development Smith 3 months
HR Database Jones 4 months

Table 6. The Project Assignment table.

Project name Employee Task Hours
Year 2000 Irvine Analysis 4
Year 2000 Irvine Programming 15
Year 2000 Smith Analysis 5

I still haven’t eliminated all duplication, since the employee name is still repeated several times in the Project Task table. I’ll take a further look at this later on.

The key’s the thing

As you can see from the data, it’s pretty vanilla. One task that I haven’t undertaken yet is assigning primary keys. Basically, a key is a field (or fields) on which you wish to index the data, to assist in organizing and accessing the records. There are two types of keys: primary and secondary. In any Access table, you must define a primary key, which is unique for all records in the table. Secondary keys are permitted to include records with duplicate values (for example, Last Name), and you should define one index for every field that you might search or sort on frequently. If we define a unique key for our Employee table, say Employee Number, then the Employee table looks something like Table 7.

Table 7. The Employee table with primary key.

Item Name Description
1. Employee Number Unique Employee Number (Primary Key)
2. Last Name Employee’s last name
3. First Name Employee’s first name
4. Start Date Employee’s start date
5. Bill Rate The rate at which the employee is billed to projects/customers

By using a unique key, you now have a fast way to retrieve or sort data when you’re using the Employee Number. When the Employee table is linked to another table using the Employee Number, this index will speed up the join.

The impossible join

One of the glories of a database system is the ability to relate data from one table to data from another. The relationships that can exist are:

  • One-to-one — One data item in a table relates to one and only one data item in the related table.
  • One-to-many — One data item in a table relates to multiple data items in the related table.
  • Many-to-many — Multiple data items in one table are related to multiple data items in the related table.

One-to-one relationships are relatively unusual — after all, why not just combine the records into one table? Generally, one-to-one relationships are used when tables from different systems are being joined or when data is broken out into a separate table for security purposes. One-to-many relationships are the typical relationship between tables.

In fact, one of the basic rules of the relational database theory is that a many-to-many relationship can’t be implemented. Reality insists that in the world, many to many relationships do exist. In our sample system, many people work on many projects, for example.

Resolving many-to-many relationships consists of considering the problem in different terms. Each person works on multiple projects, which is a one-to-many join. Each project has many people who work on it, which is also a one-to-many relationship. In order to create the many-to-many relationship, all that’s necessary is to introduce an intervening table between employees and projects. In my example, this is the Project Assignment table, as shown in Table 6.

The Project Assignment table is the junction or intersection table between the Employee table and the Project table. Each employee has many Project Assignment records (one for every project that he or she is on), but every Project Assignment record has only one Employee record. Each Project record has many Project Assignment records (one for every employee assigned), but each Project Assignment record has only one Project record. By combining these two one-to-many relationships, I’ve effectively created a many-to-many relationship between Project and Employee records.

In general, a many-to-many relationship is created by adding a new table whose primary key consists of the primary keys from the two tables being joined. The Project Assignment table’s primary key, for instance, should consist of the Employee Number and the Project Name. The usual situation is that there’s some additional data that can only be stored in the junction table. The Project Assignment table, for instance, holds the hours spent on each project by each employee, which couldn’t be stored in either the Employee table or the Project table.

One key item to consider when looking at data relationships is that when data is interrelated, modifications to one record might affect data in another table. If an Employee record is deleted (the employee has left the company), what happens to that employee’s records in the Project Assignment table? Are they required for audit purposes? Can they be deleted? This concept is referred to as a “cascading” transaction, where modification of one data item is “cascaded” to the related records.

So if you delete the Employee record, should all the corresponding Project Assignment records for the employee be deleted? If deleting these records would prevent meeting an audit requirement, you should just turn on the relationship’s Enforce Referential Integrity option. This setting will prevent an Employee record from being deleted unless there are no Project Assignment records for the employee. It will also prevent users from adding a Project Assignment record for an employee number that isn’t in the Employee table. By using relationships and Enforce Referential Integrity, you get Jet to ensure that you never have a Project Assignment record for an employee who doesn’t exist.

On the other hand, if you don’t need to provide an audit trail, you might want to have Project Assignment records deleted when an Employee record is deleted. To make this happen, you only need to set the Cascading Delete option of the relationship to on. You still won’t be able to add a Project Assignment record for an invalid Employee record, but deleting an Employee record will cascade down and delete the related Project Assignment records.

Another nice feature of the relationship window is the Cascade Update option. One of the problems with using primary keys to link tables is that sometimes the data in the linking fields change. It’s not impossible, for instance, that we might want to change the name of one of our projects. If I do change a Project Name, I’d also have to be sure to change the corresponding records in the Project Assignment table. The Cascade Update option takes care of that, so that when I change a Project Name in the Project table, the related records in the Project Assignment table are also changed.

Data modeling is an important first step in development of any database. Many developers do their modeling online while in Access, and this is fine, provided that some prior thought is given to what data is required, how the data is stored, and what the relationship between the data is. In my next article, I’m going to step back and look at creating simple reports and how to use them to ensure that your proposed data model is the right one. Until then, happy modeling.

Now, where did I put that airplane kit and my crazy glue?

About David Irvine

Dave is an IT trainer and consultant who continues to teach Access and Visual Basic. He is principal consultant for IDS Solutions, and is the lead instructor and program facilitator for the Information Technology Professional program at Lambton College in Canada.
This entry was posted in Design and Tables and tagged . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


*


This site uses Akismet to reduce spam. Learn how your comment data is processed.