In Part 1 of this series (see ” Accessing Spatial Data, Part 1” in the December 1997 issue), David provided the routines for determining whether two lines intersect. In Part 2, he completes the spatial data toolset with the functions for area.
In last month’s article, I showed you how to use Access with geographical data. I then went on to describe a Line Intersection function as one of the primary tools for working with spatial data. In this month’s article, I’ll spend a little time looking at some more data represent-ations before going on to two further functions: one to calculate areas, and one to determine whether or not a point lies within a defined area.
More data representations
Previously, I described methods for storing point and linear data within an Access table. I also described the VBA representation for a simple line (one that consists of just one vector or element). Now I’ll look at VBA code representations for single points and polylines.
To process geographical data, you could use the OpenRecordset method and process the data in the recordset directly. Depending on your application, this might be a good way to proceed. However, it can lead to clumsy coding that’s harder to maintain, so you might prefer to use a user-defined type. The representation for a single point or location requires an X and a Y coordinate, so we can define it as:
Type Point ' User Defined Point dblX As Double dblY As Double End Type
Doubles are used because we’re dealing in real-world data. If you were creating a GUI-type application, you’d probably use integers.
Next, a representation for a polyline is required. In my previous article, I defined a polyline as an entity consisting of more than one linked vector or element (that is, a set of connected straight lines where the end of one line is the start of the next line). Most geographical features will either be represented by points or by polylines. Curves aren’t so easily dealt with, and, for most practical applications, a curved or irregular feature (such as a coastline) will generally be represented by a polyline. Note that there might well be a large numbers of vectors in such a representation. As the end of one line is the same as the start of the next line, we can represent a polyline by an array of points, like this:
Dim Polyline(100) As Point
That Dim statement will create an array of 100 points that can be used for a polyline with 99 vectors (one less than the number of elements in the array because we need a starting point). Just before we leave polylines, it’s worth noting that a closed area — for example, a lake (ignoring any pedants who mention rivers flowing into and out of the lake) — can simply be represented by a polyline whose start and end points are the same.
Now, for those readers whose eyes glaze over at the mere thought of a mathematical equation, I promise that this function will be much easier than last month’s. No equations in this function — I’ve already covered them, so it’s safe to keep reading!
A common requirement when processing geographical or spatial data is to find out if a given point is inside or outside of a defined area. This is easy for humans to determine visually, but it’s much harder for a computer. Typical questions might be: “Which lampposts lie inside our area of responsibility?” or “Is that mouse pointer over the star-shaped button?”
You have two objects to deal with here. The first is a closed polyline used to define the area (don’t forget to add error-checking to ensure that it really is closed), and the second is a point used to define the location of the item to be tested. If you’re using a rectangular area, the solution is easy — simply look at the minimum and maximum X and Y values for the area and see if the point lies between them.
Initially, you might think it best to use something similar for a more general function, and, indeed, the rectangle check is a good filtering technique to throw out any points that clearly lie outside of the maximum extent of the area. However, it isn’t so good for more complex areas (oh dear, now I’ve gone and done it — no doubt somebody will come up with a beautiful and elegant function just to disprove me!). The method that I use is to draw an imaginary straight line starting well outside of the area and ending at the point being tested. To get a point “well outside,” I either use a minmax function (like the one described previously that checks for a point being outside a rectangle), or I just use a coordinate that I know is beyond the limits of my data. I then use the Line Intersect function, described last month, to check this imaginary line against each vector of the polyline. During this check process, I count the number of times that I get an intersection. Initially, there seem to be only three possibilities:
- No intersections: The point must be outside of the area.
- One intersection: The point must be inside the area.
- Two intersections: The point must lie beyond the area (because we’ve crossed into, and then out of, the area).
This simple interpretation works fine for most areas. However, for highly convoluted areas where the boundary twists back on itself, we might get more than two intersections. This sounds complicated, but if you think it through, not much changes. If the number of intersections is even, then the point lies outside of the area; if the number of intersections is odd, then the point lies inside. Rather than use an If/Then/Else construct, which could never cover all possibilities, it’s much easier and quicker to use the Mod function, which returns the remainder of one number divided by another (I can see a few glazed eyes — hang in there, I’m nearly done). If we find that the modulus of the number of intersections divided by 2 is 0, then the number of intersections is even (and odd when the modulus is 1).
However, there’s one final problem with this algorithm. What if the imaginary line crosses the polyline exactly on a node? That is, the line crosses on top of one of the points used to define the polyline. In this situation, the Line Intersect function will always give an even number of crossings (either 0 or 2, depending on whether a line crossing exactly over an end point gives an intersect or not). One solution would be to use several lines from different imaginary start points that all converge on the test point, and then test the number of intersects for each line. If even one of them is odd, then the point lies inside the area. This is somewhat clumsy and inelegant, and there’s always the remote possibility that each imaginary line crosses at a (different) node.
The alternative that I prefer is to modify the InBetween function (which is called by the Intersect function and was dealt with last month). Tweaking this function slightly changes it so that we can use it to test for this situation. The new InBetween function looks like this:
If pdbl >= (pdblStart + dblSmidgin) And _ pdbl <= (pdblEnd + dblSmidgin) Then InBetween = True Exit Function ElseIf pdbl <= (pdblStart - dblSmidgin) _ And pdbl >= (pdblEnd - dblSmidgin) Then InBetween = True Exit Function Else 'Otherwise fails InBetween = False End If
The routine works by shortening the line very slightly (by a smidgin!) through moving the start point towards the end point. If the point we’re testing is no longer on the line, then it must have been on the start or end point originally.
Another possibility is to set up the imaginary line so that it’s horizontal. If you do that, then you just need to test the Y coordinates of the nodes to see if they’re the same as the Y coordinate of the test point (after allowing for the precision of double numbers). In fact, if perform-ance is important, you can use a horizontal line for every test, which would also allow you to simplify (and speed up) the Line Intersect function.
No matter which way you go, if you determine that the line crosses an end point, then you should reduce the number of crossings by one before applying the Mod test.
There, now that wasn’t too bad, was it? No equations and only a smattering of math. You can see the routine itself in Listing 1.
Listing 1. The Inside function.
Public Function Inside(pTestPoint As Point, _ pPoly() As Point) ' Copyright 1997, Aldex Software ' NB calls the function Intersect ' Author D.P.Saville, Aldex Software, ' email@example.com, http://www.aldex.co.uk/ ' EXAMPLE USEAGE: See function TestInside() ' Dim varI As Variant Dim ImaginaryLine As LineXY Dim TestLine As LineXY Dim intCrossings As Integer 'For testing purposes 'For varI = LBound(pPoly) To UBound(pPoly) ' Debug.Print pPoly(varI).dblX, pPoly(varI).dblY 'Next 'You could place a simple boundscheck here to see if the point 'to be tested lies outside of the min/max co-ords of the polyline; if it does 'then it obviously lies outside the shape. 'Now we create an imaginary line that starts well outside of the polygon 'You need to either know the data values being used or find the min/max range and set the start point outside of that.<**> ImaginaryLine.dblStart_X = -500 ImaginaryLine.dblStart_Y = -500 ImaginaryLine.dblEnd_X = pTestPoint.dblX ImaginaryLine.dblEnd_Y = pTestPoint.dblY 'Now count the times that the imaginary line crosses the lines that make up the pPoly() array. intCrossings = 0 ' (old habits!) For varI = LBound(pPoly) To UBound(pPoly) - 1 TestLine.dblStart_X = pPoly(varI).dblX TestLine.dblStart_Y = pPoly(varI).dblY TestLine.dblEnd_X = pPoly(varI + 1).dblX TestLine.dblEnd_Y = pPoly(varI + 1).dblY If Intersect(ImaginaryLine, TestLine) Then _ intCrossings = intCrossings + 1 End if Next ' Debug.Print "No. crossings = " & intCrossings 'Use Mod to see if intCrossings is odd or even If intCrossings Mod 2 = 0 Then Inside = False Else Inside = True End If End Function
The area function
The second function I’m going to describe is one for calculating the area of a polygon. Area calculations are used extensively in Geographical Information Systems and mapping type applications. “How much turf do we have to cut?”, “What’s the total expected yield for that area?”, “How many mature trees are there in that forest?” are all questions that can only be answered by calculating the area.
The Area function uses a surprisingly simple algorithm that takes each vector or section of the polyline in turn and calculates the area that lies underneath it. Each of these intermediate areas is then added into a running total. The direction of each vector (whether its start is greater or less than its end) causes the intermediate areas to be either added to or subtracted from the running total.
The algorithm is similar to the integration of the area under a curve. Don’t panic, though — the math is much simpler, as we’re dealing with a set of interconnected straight lines rather than a true curve. In essence, we’re calculating the rectangular area below the start coordinate of the line and adding to this the triangular area from the start to the end of the line. In the following algorithm, these have been consolidated into this single expression (the only significant math in this month’s article — and pretty simple at that):
WorkArea = (Start_X - End_X) * _ ((Start-Y - End_Y )/ 2) + Start_Y)
And now for some caveats (you always get these when dealing with this type of real-world data). The polygon must be closed, and you should test for this by ensuring that the first element of the polyline is identical to the last. One way to ensure this is to assume a final segment of the polygon that runs from the last point in the array to the first point. If the polygon was already closed, you’ll simply generate a line of zero length.
The polyline also has to define a single, continuous area. While it can be complicated with indentations and protrusions, the component lines must not cross one another (as in a figure eight shape). If the quality of your original data is poor, it might be a good idea to test the polyline using the intersect function to ensure that none of its lines crosses any other.
The code for the Area routine appears in Listing 2.
Listing 2. The area function.
Public Function Area(pPoly() As Point) ' Calculates the area of the input polyline pPoly ' Copyright 1997, Aldex Software ' Author D.P.Saville, Aldex Software, ' firstname.lastname@example.org, http://www.aldex.co.uk/ ' EXAMPLE USEAGE: See function TestArea() ' Dim dblAreaAccum As Double ' Area accumulator Dim dblXDiff As Double Dim dblYDiff As Double Dim varI As Variant ' Cycle through the polyline For varI = LBound(pPoly) To UBound(pPoly) - 1 ' Get working values dblXDiff = pPoly(varI + 1).dblX - pPoly(varI).dblX dblYDiff = pPoly(varI + 1).dblY - pPoly(varI).dblY ' Calculate the area under the vector and add it to the Area Accumulator dblAreaAccum = dblAreaAccum + (dblXDiff * ((dblYDiff / 2) + pPoly(varI).dblY)) Next varI Area = Abs(dblAreaAccum) End Function
Let me repeat what I said in the previous article: Although these routines have been tested, you must test them again with your own data before using them. Be especially careful to test against horizontal and vertical lines, and to make sure that you use genuine data where areas haven’t been closed off and so on.
Listing 3 shows a couple of simple routines to test both of the functions described above. In download file SPACE2.ZIP, you’ll find a sample database containing all of the code from this article and routines for testing the functions.
Listing 3. Testing the functions.
Public Function TestArea() ' Tests the Area function Dim Poly(5) As Point Poly(0).dblX = 8 Poly(0).dblY = 2 Poly(1).dblX = 16 Poly(1).dblY = 6 Poly(2).dblX = 10 Poly(2).dblY = 8 Poly(3).dblX = 8 Poly(3).dblY = 15 Poly(4).dblX = 5 Poly(4).dblY = 11 Poly(5).dblX = 8 Poly(5).dblY = 2 Debug.Print "Area = " & Area(Poly()) ' Should give an area of 52.5 End Function Public Function TestInside() ' Tests the Inside function. Dim Poly(5) As Point Dim TestPoint As Point Poly(0).dblX = 8 Poly(0).dblY = 4 Poly(1).dblX = 16 Poly(1).dblY = 8 Poly(2).dblX = 10 Poly(2).dblY = 10 Poly(3).dblX = 8 Poly(3).dblY = 17 Poly(4).dblX = 5 Poly(4).dblY = 13 Poly(5).dblX = 8 Poly(5).dblY = 4 TestPoint.dblX = 10 TestPoint.dblY = 13 Debug.Print "Inside=" & Inside(TestPoint, Poly()) ' Should be False (with 2 crossings) TestPoint.dblX = 12 TestPoint.dblY = 9 Debug.Print "Inside=" & Inside(TestPoint, Poly()) ' Should be True (with 1 crossing) End Function
In Part 3 of this series, I’ll be looking at an application that uses these routines, as well as discussing more of the problems you might face when trying to process real-world data.