In the final article of his three-part series on handling spatial data, David provides an application that makes use of the toolkit built up in our previous two issues.
In the previous two articles in this series (see “Accessing Spatial Data, Part 1” in the December 1997 issue, and “Accessing Spatial Data, Part 2” in the January 1998 issue), I described several methods for storing geographical or spatial data in an Access database, together with a number of functions for processing that data. In this final article, I’ll draw together some of the issues previously raised, using a forestry application as an example. And I’ll add some more functions to your toolbox of spatial spanners and wrenches.
The example I’m using in this article is a real system for the maintenance and management of a commercial forestry. The forest is divided into Compartments and Stands, where a Stand is an individual block of trees and a Compartment contains a number of Stands. Running through the Compartments (and, potentially, the Stands as well) are Tracks, Firebreaks, and Rivers. On the map are various Tags identifying each Compartment, Stand, and Track.
My application must store this information in an Access database and produce reports giving the gross area of each Compartment or Stand. The application also calculates the net area by subtracting the area of each Track, FireBreak, or River that crosses a treed area.
Cooking the raw data
The raw CAD system output is delivered as sets of point and linear data (see the sidebar “Digital Mapping” for the problems with inputting mapped data). Here’s a typical example:
TAG, Stand-X1, 5673.5, 224687.9 TRACK, 5538.6, 227650.3, 5691.0, 227524.8, 55776.5, 227443.1 STAND, 5448.3, 227786.6, 5409.4, 227835.0, 5368.3, 227810.7
The first thing you’ll notice is that the tag (which identifies a Stand) is in no way connected to the polyline that defines the stand. This makes determining which tag goes with each item an “interesting” problem. In the CAD system itself, the user can see which item is beside each tag. This was a luxury I didn’t have when given the raw data.
I put that problem aside to concentrate on pulling off the raw data and storing it as a point, line, or area (remembering that an area is just a line that closes back on itself). I then had to undertake an initial cleaning operation. It’s never a good idea to forget that the raw data has been manually entered, and people sometimes make mistakes. So, after extracting the data, I ran through it and corrected the following common mistakes:
- Duplicated points. Frequently the same point within a polyline entity (either a line or an area) is repeated. The usual cause is that the operator forgets where the or she has gotten to and digitizes the same point twice. My first check is to test all polylines and remove a point where two adjacent points are identical. In practice, due to the problems inherent in storing floating point numbers, the two points might not be mathematically identical. As a result, I had to test whether any two points lay within a short distance of each other.
- Duplicated entities. Sometimes a complete entity will be duplicated. This might be because the operator ran over the same area or line twice, or because the map was reloaded back on top of itself (this has the interesting effect of repeating everything). Duplicated entities are awkward to remove because, like the duplicated points, they might not be mathematically identical, just very similar. Additionally, if the duplication is because the operator digitized an entity twice, you might not even have the same number of points! A simple method of dealing with this is to take the first point in each polyline and compare it to the first point in all of the other polylines to see if they’re nearly the same. If I found a close match, I would then calculate the length of the polyline (see the PolyLineLength function in the “Four Last Functions” section) and, if the two were within 2 percent of each other, I’d display it on the screen and ask the user to decide if they were duplicate entries. You can try to automate the process further, but, remembering that “identical” polylines might have been digitized clockwise the first time and counterclockwise the second time (and that they might also contain different numbers of points), user intervention is often the most pragmatic solution.
- Closed areas. An area entity must be closed. In other words, the start point and the end point must be the same. Sometimes a CAD operator might forget to close off an area, in which case you’ll have to add an additional point at the end of the area polyline that duplicates the start coordinate.
- Too much data. Where an entity has been recorded using scanning techniques rather than by a human using a digitizer, you’ll often get polylines that are far too detailed. These items end up occupying an unnecessary amount of storage space and processing time. If you have data like this, it’s often worth pruning it down by cycling through each polyline and testing the distance between adjacent points (see the PointToPointDistance function in the “Four Last Functions” section). If they’re less than a certain distance, then delete the second point and continue testing the polyline. Don’t delete the very last point, though.
Tag, you’re it
Now that I had relatively clean data, I had to return to my first problem: assigning tags to entities. I tackled the track data because it seemed the hardest. My solution was to take a track tag and compare its coordinates with each track type polyline in the data. When I found a track tag that was closer than a defined distance from a track, I assigned that polyline to the tag. To calculate that distance, I used the PointToLineDistance function.
There are two potential problems with using this routine. First, what happens if two tracks lie on top of one another? This occurs when you have two tracks merging for a short space and a tag placed adjacent to the merged portion. The only way to deal with this is to test each track tag against all of the tracks and raise an “Error: human intervention required” if more than one track is close to the tag. The second potential problem occurs when two track tags are associated with the same track — again, human intervention is required.
The next step was to identify the areas. To do this, I took each Compartment tag and tested it against each Compartment type polyline using the Inside function described in Part 1 of this series. When the function returned “True,” indicating that the tag was inside the polyline, I had found my match. The same method worked with Stand tags and Stand-type polylines. For both of these, the same potential problems with overlying areas and doubled up tags applied, requiring user intervention.
Gross and net
Now that I had the data tidied up and assigned the correct identifiers, the rest was fairly straightforward. The main requirement of the system was to derive the gross and net areas for each of the stands and compartments.
The gross area really was easy. All I needed to do was to feed each area polyline into the Area function that I described in Part 2 of this series.
The net area calculation was slightly trickier. What I had to do was take each track and calculate how much of it lay within the area polyline whose gross area I’d just calculated. To do this, I took each track polyline and compared each of its sections with each vector of the area polyline. If I found that the two lines intersected (using the Intersect function from Part 1 of this series), I knew that the track entered the area. I would then chop the track polyline at the intersection point and trace the track until it ran out or I hit another intersect. In the latter case, I chopped the track polyline at the intersection point and threw away the remainder of its polyline. Using the PolyLineLength function, I calculated the length of what was left of the track, multiplied it by the width of the line (either an arbitrary number or indicated in the track tag), and subtracted the result from the previous gross area.
There are three special cases you might want to handle:
1. The track polyline starts inside the area. You can use the Inside function with the first point of the track polyline to find these cases.
2. A single track crosses an area multiple times. To handle this, you must continue to process the track polyline after the first intersection has been found.
3. A track forms the boundary of a compartment or stand. With properly drawn drawings, this situation shouldn’t occur, because the Compartment/Stand boundary should have been drawn along the outside line of the track. With badly drawn maps, you might need some rules to cover this situation — perhaps considering that it lies inside the area but subtracting only 50 percent of the track’s area from the gross area, for example.
Four last functions
Let’s look at the functions I introduced in this article.
For PolyLineLength, all we have to do is walk through the polyline and accumulate the length of each of its vectors into a variable. The distance along each vector is calculated using the formula from your very first geometry lesson: “The square of the hypotenuse is equal to the sum of the squares of the other two sides”:
Public Function PolyLineLength(pPoly() As Point) As Double Dim dblLengthAccum As Double Dim dblXDiff As Double Dim dblYDiff As Double Dim varI As Variant For varI = LBound(pPoly) To UBound(pPoly) - 1 ' Get the lengths of the other two sides dblXDiff = pPoly(varI + 1).dblX - pPoly(varI).dblX dblYDiff = pPoly(varI + 1).dblY - pPoly(varI).dblY dblLengthAccum = dblLengthAccum + Sqr((dblXDiff * dblXDiff) + (dblYDiff * dblYDiff)) Next varI PolyLineLength = dblLengthAccum End Function
The PointToPointDistance function takes the differences between the X and Y coordinates of two points and also applies the “square root of the sum of the squares” calculation. Note that when using this type of math, there’s no need to use the ABS () function, because, as we’re multiplying the numbers by themselves, the result will always be positive:
Public Function PointToPointDistance (pPointA As Point, pPointB As Point) As Double ' Calculates the distance of pPointA from pPointB Dim dblXDiff As Double Dim dblYDiff As Double dblXDiff = pPointB.dblX - pPointA.dblX dblYDiff = pPointB.dblY - pPointA.dblY PointToPointDistance = Sqr((dblXDiff * dblXDiff) + (dblYDiff * dblYDiff)) End Function
The PointToLineDistance function, however, is much trickier! Rather than describe it in words, I suggest that those who are interested just study the code. I’ve commented it in places to help you along:
Public Function PointToLineDistance (pP As Point, pL As LineXY) As Double ' Calculates the distance of pP from pL Dim dblL As Double 'Length of Line Dim dblR As Double Dim dblS As Double Dim dblXI As Double 'X co-ord of intersect of point perpendicular to line Dim dblYI As Double 'Y co-ord of intersect of point perpendicular to line Dim dblXDiff As Double 'Length along the X axis Dim dblYDiff As Double 'Length along the Y axis dblL = Sqr((pL.dblEnd_X - pL.dblStart_X) * _ (pL.dblEnd_X - pL.dblStart_X) + _ (pL.dblEnd_Y - pL.dblStart_Y) * _ (pL.dblEnd_Y - pL.dblStart_Y)) dblR = ((pL.dblStart_Y - pP.dblY) * _ (pL.dblStart_Y - pL.dblEnd_Y) - _ (pL.dblStart_X - pP.dblX) * _ (pL.dblEnd_X - pL.dblStart_X))/(dblL * dblL) dblS = ((pL.dblStart_Y - pP.dblY) * _ (pL.dblEnd_X - pL.dblStart_X) - _ (pL.dblStart_X - pP.dblX) * _ (pL.dblEnd_Y - pL.dblStart_Y))/(dblL * dblL) dblXI = pL.dblStart_X + dblR * _ (pL.dblEnd_X - pL.dblStart_X) dblYI = pL.dblStart_Y + dblR * _ (pL.dblEnd_Y - pL.dblStart_Y) ' Now we have the intersect point. ' Find out if the intersect point lies between ' the start and the end of the line. If dblR < 0 Then ' The intersect point lies before the start of line dblXDiff = pP.dblX - pL.dblStart_X dblYDiff = pP.dblY - pL.dblStart_Y ElseIf dblR > 1 Then ' The intersect point lies after end of line dblXDiff = pP.dblX - pL.dblEnd_X dblYDiff = pP.dblY - pL.dblEnd_Y Else ' Intersect point is between start and end of line dblXDiff = pP.dblX - dblXI dblYDiff = pP.dblY - dblYI End If ' Now calculating the distance PointToLineDistance = Sqr((dblXDiff * dblXDiff) _ + (dblYDiff * dblYDiff)) End Function
The LengthInside function is the final function of the series — but it’s a corker! This function calculates the amount of a polyline that lies within an area. The line might consist of more than one vector, so we must calculate the amount of each vector that lies within the area and then accumulate them together. Although the math isn’t especially difficult, the logic flow is fairly complex, since there are a number of different situations that have to be catered to. For example, any individual vector of the polyline can start either inside or outside the area and then can end either inside or outside the area. You can see it in Listing 1.
Listing 1. The LengthInside function.
Public Function LengthInside _ (pLine() As Point, pArea() As Point) As Double ' Calculates the length of pLine inside pArea Dim varL As Variant ' Loop counter Dim varA As Variant ' Loop counter Dim LVector As LineXY ' Current vector of pLine Dim AVector As LineXY ' Current vector of pArea Dim dblXIntersect As Double ' Intersection point Dim dblYIntersect As Double ' Ditto Dim intNoCrossings As Integer ' Self evident Dim dblStartX As Double ' Start point of interior line Dim dblStartY As Double ' ditto Dim dblEndX As Double ' End point of interior line Dim dblEndY As Double ' ditto Dim dblLIAccum As Double ' Acc. length inside area For varL = LBound(pLine) To UBound(pLine) - 1 'Cycle through vectors comprising length polyline LVector.dblStart_X = pLine(varL).dblX LVector.dblStart_Y = pLine(varL).dblY LVector.dblEnd_X = pLine(varL + 1).dblX LVector.dblEnd_Y = pLine(varL + 1).dblY If Inside(pLine(varL), intNoCrossings, _ pArea()) Then 'start of line is inside of the area. 'Calculate where/if it leaves the area intNoCrossings = 0 For varA = LBound(pArea) To _ UBound(pArea) - 1 AVector.dblStart_X = pArea(varA).dblX AVector.dblStart_Y = pArea(varA).dblY AVector.dblEnd_X = _ pArea(varA + 1).dblX AVector.dblEnd_Y = _ pArea(varA + 1).dblY If Intersect(LVector, AVector, _ dblXIntersect, dblYIntersect) Then dblEndX = dblXIntersect dblEndY = dblYIntersect intNoCrossings = intNoCrossings+1 End If Next varA If intNoCrossings = 0 Then 'Other end is also inside dblLIAccum = dblLIAccum + _ Sqr((LVector.dblEnd_X - _ LVector.dblStart_X) * _ (LVector.dblEnd_X - _ LVector.dblStart_X) + _ (LVector.dblEnd_Y - _ LVector.dblStart_Y) * _ (LVector.dblEnd_Y - _ LVector.dblStart_Y)) Else 'Other end is outside dblLIAccum = dblLIAccum + _ Sqr((dblEndX - _ LVector.dblStart_X) * _ (dblEndX - LVector.dblStart_X) + _ (dblEndY - LVector.dblStart_Y) * _ (dblEndY - LVector.dblStart_Y)) End If Else 'The start point is outside the area intNoCrossings = 0 For varA = LBound(pArea) To _ UBound(pArea) - 1 AVector.dblStart_X = pArea(varA).dblX AVector.dblStart_Y = pArea(varA).dblY AVector.dblEnd_X = _ pArea(varA + 1).dblX AVector.dblEnd_Y = _ pArea(varA + 1).dblY If Intersect(LVector, AVector, _ dblXIntersect, dblYIntersect) Then intNoCrossings = intNoCrossings+1 If intNoCrossings = 1 Then 'First intersecton dblStartX = dblXIntersect dblStartY = dblYIntersect Else dblEndX = dblXIntersect dblEndY = dblYIntersect End If End If Next varA If intNoCrossings = 0 Then 'LVector does not cross the area ElseIf intNoCrossings = 1 Then 'End point inside the area, ' start point outside. dblLIAccum = dblLIAccum + _ Sqr((LVector.dblEnd_X - dblStartX) * _ (LVector.dblEnd_X - dblStartX) + _ (LVector.dblEnd_Y - dblStartY) * _ (LVector.dblEnd_Y - dblStartY)) Else 'Both start, end points lie outside dblLIAccum = dblLIAccum + _ Sqr((dblEndX - dblStartX) * _ (dblEndX - dblStartX) + _ (dblEndY - dblStartY) * _ (dblEndY - dblStartY)) End If End If Next varL LengthInside = dblLIAccum End Function
There are a number of changes that can be made to the algorithm to improve performance (such as breaking out of loops when a match has been found). However, I haven’t implemented these because they make the code harder to follow. The logic also includes an assumption that any single vector doesn’t cross the area more than once, which is correct as far as the forestry example we’re using goes.
End of space
Although I’ve primarily been concentrating on geographical applications, many of the principles can be applied to other areas, such as the design of a custom user interface.
One final warning: If you’re dealing with large-scale objects, such as hydrographic survey data, you can’t just use straight X,Y coordinates. In those situations, you must apply spheroidal projections that take the curvature of the earth into account (and if you don’t like math, steer well clear!).
As I said in Part 1 of this series, I’m a professional software/Access developer, not a mathematician. There might be alternative — and possibly better — ways to process some of these functions. Last week, for instance, I came across a new method for solving the “line intersect” problem using parametric equations. The article claimed that this was a better method than the standard slope-intercept technique, since there was no need to take account of special cases for horizontal and vertical lines. I haven’t had to verify this, but I must admit to looking forward to doing so.
This is the end my brief series of articles on the use of geographical data with Access. Download file SPACE3.ZIP has all of the functions discussed in this article. Armed with these functions, you’re ready to tackle any space that you might have
David Saville is a partner in Aldex Software (www.aldex.co.uk), a UK-based software house specializing in developing solutions using Microsoft Access.
Sidebar: Digital Mapping
Most GIS systems rely on a base map, which is subsequently overlaid with application-specific information. This base map can either be created by scanning or by digitizing.
Scanning sounds like the easier option: Simply place your paper map on a scanner and, a few minutes later, voila, you have a base map. However, the result will be a raster image (a series of independent dots), which must be converted into a vector representation by some raster-to-vector-conversion software. While this process works for clean line drawings on white paper, with a map, the result is often unsatisfactory. A typical conversion problem occurs when a dashed line is converted into a series of short, unconnected lines rather than a single entity.
Consequently, most base maps are produced by digitizing, a process which produces vector drawing naturally. The operator uses some pointing device to indicate a spot on the map and then records that spot in the CAD database. The operator must repeat this process for every relevant point on the map — an error-prone process (the accompanying article discusses the kinds of problems that this creates and how to handle them). Some of the more sophisticated CAD packages can overlay both raster and vector data on the screen simultaneously, which gets around the scanning problem.
The application-specific information is almost always applied by a human operator. The problem at this point lies in associating the higher-level data type (for example, a telephone line) with the basic data type (in this case, a line). You can use the CAD layer and line type attributes to do this automatically (if it’s a dashed line on layer 22, then it must be a telephone line) but this can become rather rigid and unwieldy if you have large numbers of application-specific data types.
Another, more flexible method is to use tags. A tag can be just a simple text string with a unique tag identifier followed by the type of data it represents (for example @TL where @ indicates a tag and TL a telephone line). The tag is placed on the drawing adjacent to the item it relates to. The only drawback with this method is the more complex processing that’s required to link a tag with a particular entity (see the code in the accompanying article, for instance).