Microsoft Access Office VB VBA Help and Examples

How to Normalize your data - Concept

So what's all this about Normalizing my data, and not using Flat file format that I keep hearing about. This section will give you a no nonsense explanation of what this all means for the absolute beginner.

What's a flat file? Think of an excel spreadsheet. You may have a sheet containing details of the number of hours someone has worked on a particular project each week or month. In excel it may be laid out like this.

Person Department Project 1 Jan Project 2 Jan Project 3 Jan Project 1 Feb Project 2  Feb Project 3  Feb
John Design 64 3 16 7 14 23
Fred Sales 4 2 5 12 0 3
Sarah Admin 54 6 13 19 8 10
James Design 23 7 13 8 3 13

This is a flat format, and at one time every body has probably used it until the point when they got to December and thought this sheet is getting a bit wide, I know I'll start a second sheet in Excel and call it 2001. All very inefficient. What happens if someone says what percentage of hours was worked by the Design team during Feb on Project 2 you would calculate it by hand. Not clever or impressive.

You could also put the data into Access in exactly the same format in a table. It would not be any better. It would be just as useless and time consuming. Lets normalize it.

What's Normalization? Think of it as a way of splitting your data into components. Each component links to other related components. This is what gives us our relational database design. 

Why would we want to split stuff up? What happens five years down the line when you have 500 staff  and 2000 projects. Your spreadsheet is very unwieldy now. 

How do we Normalize our Data? Look at the sheet above what components do we have? There are three immediate ones. Staff, Projects, and Hours worked. So lets split them into three tables.

tbl_Staff  Contains our staff personnel info. 

FirstName LastName Address StaffID Level Department
John Hawkins Somewhere Close 1 President Design
Fred Bloggs Anywhere Place 2 Vice President Sales
Sarah Hawkins Thingy Street 3 Assistant Vice President Admin
James Khan Wotsit Road 4 DBA Design

tbl_Projects Contains all the details about the projects

ProjectName Company ProjectID CompleteDate Cost Comments
WebSite Fabalou Web 1 23/12/2000 $1000 In progress
FireFox2 BAA 2 14/3/2000 $32,000 In bug test
WhiteSnake Epic 3 5/6/2000 $21 On back burner

tbl_HoursWorked This is the linking table between staff and projects and contains one record for each person per month per project

StaffID ProjectID Month Hours
1 1 Jan 64
2 1 Jan 4
3 1 Jan 54
4 1 Jan 23
1 2 Jan 3
2 2 Jan 2
3 2 Jan 6
4 2 Jan 7
1 3 Jan 16
2 3 Jan 5
3 3 Jan 13
4 3 Jan 13
1 1 Feb 7
2 1 Feb 12
3 1 Feb 19
4 1 Feb 8
1 2 Feb 14
2 2 Feb 0
3 2 Feb 8
4 2 Feb 3
1 3 Feb 23
2 3 Feb 3
3 3 Feb 10
4 3 Feb 13

Looks like a lot of hard work for nothing doesn't it. But think now if you want to add a new project. You just add a single project record to the project table. In excel you would have to insert a new column for each month. Lets go back to the problem of  percentage of time spent on Project 2 by Design team in Feb. You would create a new query and pull in all three tables. Linking the StaffID and ProjectID. Set criteria to show only Feb stuff and only design team. Then group by the data and sum it. You could get the query to ask for the group and month as it opens so next time someone asks for those details you can pull them up in a matter of seconds.

This is my first attempt at explaining normalization if you didn't understand it then please drop me an email so we can discuss it and make more sense of it. It's not the easiest thing to understand why it's done, but bear with it and a couple on months of it and you'll wonder how you ever lived without it.

So now you know the concept. How do you actually do it for thousands of records???
Normalization - How To is a follow on from this document and shows how to actual bulk transfer the data

The code and application content of this site is copyright of Smiley I.T. and as such reproduction in any form which is for commercial use requires the permission of the Webmaster. Any use of this code for non-commercial use only requires a link or comment back to the original page you took the code from.