notes on GPS cumulants
I have carried one or another handheld GPS unit on most of my hikes since 2000. One object of their use has been to compute cumulative mileage and ascent for each day of hiking. Why is that hard?
latitude & longitude
Successive units have increased in accuracy, and most latitude and longitude measurements are now accurate to within a foot or two. Every now and then, though, the estimated latitude/longitude will be far off, typically when the satellite signals bounce off a man-made or natural wall, so most GPS units silently discard new coordinates that are ridiculously different from those obtained within the last minute or so.
The high accuracy of the retained coordinates means that plotted tracks always look sensible when overlaid on a map. How could a foot or two of random errors per coordinate make any difference?
When trying to accumulate the day's total mileage, those little errors start to add up. To see this, imagine a 1-km straight walk, west to east along a parallel of latitude, and suppose that the GPS provided correct longitudes every meter of the way, but reported latitudes that alternated 1 m off to the north, 1 m off to the south, 1 m off to the north, 1 m off to the south, and so on for 1000 locations. The length of this zigzag path is about 2.24 km.
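The figure is easy to check: between successive fixes the north-south reading jumps by 2 m while the walker advances 1 m east, so each segment has length √5 m. A quick calculation (a hypothetical sketch matching the scenario above):

```python
import math

def zigzag_length(n_segments=1000, east_step_m=1.0, lateral_error_m=1.0):
    """Length in metres of the zigzag path: between successive fixes the
    north-south reading jumps by 2 * lateral_error_m while the walker
    advances east_step_m."""
    return n_segments * math.hypot(east_step_m, 2 * lateral_error_m)

print(round(zigzag_length()))  # 2236 m for a true 1000 m walk
```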
Most handheld GPS units can estimate a track's total mileage, and they try to avoid silly results like this. The simplest technique is to do geometric smoothing, replacing each raw coordinate pair by a computed new point, somehow less victimized by random noise.
Smoothing a series of data points consists of computing an expected value for each point and then replacing the observed value by some combination of the observed and expected values. For example, suppose that an observed series, collected at equally spaced intervals, consisted of altitudes (or latitudes, or longitudes, or anything else) of
(a) 10, 20, 100, 40, 50, 60
This looks like a linear series, with some sort of measurement error in the third element. What you probably expected to see there was 30. Should that 100 just be discarded, and 30 be used instead? But what if the series had been more realistic-looking:
(b) 9.9, 20.2, 33, 39.8, 50.3, 60.4
There's no reason to be confident that the underlying reality is a straight line; maybe the 33 here is correct, or at least no more erroneous than all the other terms.
There are various fancy ways of computing the expected value of a term inside a series, using the values of one or more of the other terms. Each method constructs some sort of line through one or more of the other points and then sees where the line is at the time of interest. In some situations the appropriate line might be, say, a polynomial curve passing through a few points before and after the point of interest. That kind of computation would be silly here, where the true position of a person on foot never changes very rapidly from one GPS measurement to another a few seconds later. My software computes the expected value of a point within a series by simply using a straight line between the previous and subsequent points.
Even then, there is no unique way to combine the expected and observed values. Suppose that the observed data series was
(c) 10, 20, 40, 40, 50, 60
The expected third term is again 30, but the observed 40 might actually be correct. And remember: random error is present on every reading. The true (and unknowable) values might be
(d) 10, 25, 40, 55, 50, 60
The "expected" values from (c) would be
(e) 10, 25, 30, 45, 50, 60
That is, they'd be closer to the true values than those of (c) in some places, but further away in others.
What my software does in each smoothing pass is to leave the endpoints as is, but to replace each internal value by the average of its observed and expected values. For example, my software would smooth (c) to
(f) 10, 22.5, 35, 42.5, 50, 60
after one pass; another pass would give expected values of
(g) 10, 22.5, 32.5, 42.5, 51.25, 60
and smoothed values of
(h) 10, 22.5, 33.75, 42.5, 50.625, 60
There's no way to determine the optimal number of passes. If a hike begins and ends at the same place, then smoothing can, if enough passes are taken, reduce its total mileage to zero.
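One pass of this smoothing scheme can be sketched in a few lines (a minimal sketch; the function name is mine, not taken from the author's software):

```python
def smooth_pass(values):
    """One smoothing pass: keep the endpoints, and replace each interior
    value by the average of the observed value and its 'expected' value,
    the midpoint of the two observed neighbours."""
    out = list(values)
    for i in range(1, len(values) - 1):
        expected = (values[i - 1] + values[i + 1]) / 2
        out[i] = (values[i] + expected) / 2
    return out

c = [10, 20, 40, 40, 50, 60]
f = smooth_pass(c)   # [10, 22.5, 35, 42.5, 50, 60], series (f) above
h = smooth_pass(f)   # [10, 22.5, 33.75, 42.5, 50.625, 60], series (h)
```

Note that the expected values in each pass are computed from the observed neighbours of that pass, matching the arithmetic of series (e) through (h).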
Once one has two trusted coordinate pairs, it's fairly straightforward to compute the distance between those points on the earth's surface. The calculation is simple if one assumes that the earth is a sphere (see https://keisan.casio.com/exec/system/1224587128 (accessed 2021-06-20)), less so if one refuses to make that assumption (see the Vincenty calculation at https://www.movable-type.co.uk/scripts/latlong-vincenty.html (accessed 2021-06-21)).
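Under the spherical assumption, the great-circle distance is usually computed with the haversine formula; a sketch, using the conventional mean earth radius:

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean earth radius; spherical approximation

def spherical_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon pairs
    (degrees), via the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# One degree of longitude at the equator is about 111.2 km:
spherical_distance(0, 0, 0, 1)
```

The Vincenty calculation on an ellipsoid is more involved (an iterative solution) but differs from this by at most a few tenths of a percent.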
The altitude calculations are more fraught. From geometric considerations, the altitudes estimated by GPS receivers are much less accurate than latitude & longitude fixes. These altitudes may be off by hundreds of meters.
Many modern handheld GPS units contain viewable topo maps, and you might think that the devices could get their altitudes from the maps. That is, if the device knows that it is at N 49° 20' 42" W 123° 14' 4", it should be able to look at the map and know that the altitude is 170 meters, perhaps throwing in another meter or two for the presumed distance between the unit and the ground. No GPS units I've ever seen do that. Their maps are just for human eyes.
Instead, most good GPS receivers now get their altitudes from built-in aneroid barometers. These have problems of their own: weather changes can shift a reported barometric "altitude" by a few hundred meters within a few hours. One can compensate for this by resetting the barometer every so often, so that it shows the correct altitude at places where the correct altitude is known.
Also, aneroid barometers are "temperature-corrected," but this means only that their readings don't change when the local temperature changes. When a barometer tries to turn pressures into altitudes, it must make some assumptions about how the air thins with altitude, and that depends on the way in which the air gets colder above the barometer's position. Many barometric altimeters assume a conventional temperature/altitude curve that is typical for the North Atlantic, but too cold for most ground-based travel. The result is typically an underestimate of a change in altitude.
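The effect of the assumed temperature profile can be illustrated with the standard constant-lapse-rate barometric formula. This is only an illustration of the principle, using International Standard Atmosphere constants, not the algorithm inside any particular altimeter:

```python
def pressure_to_altitude(p_ratio, t0_kelvin=288.15, lapse=0.0065):
    """Altitude (m) implied by a surface-relative pressure ratio p/p0,
    assuming a constant temperature lapse rate.  R = gas constant,
    g = gravity, M = molar mass of dry air (ISA values)."""
    R, g, M = 8.31446, 9.80665, 0.0289644
    k = R * lapse / (g * M)  # about 0.1903 for the standard lapse rate
    return (t0_kelvin / lapse) * (1 - p_ratio ** k)

# The same pressure drop implies a bigger climb in warmer air, so an
# altimeter assuming the (colder) standard atmosphere underestimates it:
standard = pressure_to_altitude(0.90)                # ISA surface, 15 C
warm = pressure_to_altitude(0.90, t0_kelvin=303.15)  # a 30 C day
```

Here a 10% pressure drop reads as roughly 880 m of climb under the standard atmosphere, but corresponds to roughly 925 m in the warmer air.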
The best way to use a GPS receiver to estimate an altitude seems to be to use the GPS unit for a coordinate pair (probably smoothed), and then to compute the altitude from available databases that contain radar-measured altitudes for the entire earth. These data are supplied in (typically) 1-degree squares, within each of which altitudes are given on a grid of 10-meter or 30-meter squares. A square degree in the middle latitudes thus typically contains about 117 million values. See https://www.gpsvisualizer.com/elevation (accessed 2021-06-21) for a guide to the available databases.
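Looking up an altitude in such a grid usually means interpolating between the four surrounding grid values; a bilinear-interpolation sketch (grid layout and names are my assumptions, not those of any particular database format):

```python
def bilinear_altitude(grid, x, y):
    """Interpolate an altitude at fractional grid coordinates (x, y)
    from a row-major 2-D grid of altitudes: grid[j][i] is the altitude
    at column i, row j.  x and y are in grid-cell units and must lie
    strictly inside the grid (real code would clamp the edges)."""
    i, j = int(x), int(y)
    fx, fy = x - i, y - j
    top = grid[j][i] * (1 - fx) + grid[j][i + 1] * fx
    bottom = grid[j + 1][i] * (1 - fx) + grid[j + 1][i + 1] * fx
    return top * (1 - fy) + bottom * fy

grid = [[100, 110],
        [120, 140]]
bilinear_altitude(grid, 0.5, 0.5)  # 117.5, the centre of the cell
```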
Because these databases are so bulky (about 457 MB for a 1-degree square of a 10-meter middle-latitude grid; about 51 MB for a 30-meter grid), I have downloaded only the data for the 3 square degrees nearest my home.
Once altitudes for a track have been determined, one must again look for silly outliers. For example, minor horizontal errors might take one into a hole, or to the top of a spire, that one had not actually visited.
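A simple way to flag such outliers is to compare each altitude against the midpoint of its neighbours; the threshold here is my own illustrative choice, not a value from the author's software:

```python
def flag_altitude_spikes(alts, threshold_m=50.0):
    """Indices of interior points whose altitude differs from the
    midpoint of their neighbours by more than threshold_m metres."""
    return [i for i in range(1, len(alts) - 1)
            if abs(alts[i] - (alts[i - 1] + alts[i + 1]) / 2) > threshold_m]

# A 'spire' that the horizontal error wandered onto:
flag_altitude_spikes([170, 172, 240, 176, 178])  # [2]
```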
what I do
My procedure now makes use of a program that I originally wrote to keep track of hikes I have taken, whom I hiked with, and so on. When I come back from a hike, I use Garmin's BaseCamp program (see https://www.garmin.com/en-CA/software/basecamp/ (accessed 2021-06-20)) to download the track from my GPS unit (currently a Garmin GPSMAP 65s). Then I export the track and waypoints to a file in the standard (XML-based) .gpx format. My software reads these data into a database.
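Reading trackpoints out of a .gpx file needs only a standard XML parser; a sketch for GPX 1.1 (element and attribute names come from the GPX schema; this is not the author's actual reader):

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def read_trackpoints(path):
    """Yield (lat, lon, elevation-or-None) tuples from a GPX 1.1 file."""
    root = ET.parse(path).getroot()
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.find("gpx:ele", GPX_NS)
        yield (float(pt.get("lat")), float(pt.get("lon")),
               float(ele.text) if ele is not None else None)
```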
The first few records from the GPS unit are usually nonsense, recorded before it had adapted to the location at which I had turned it on that day. I discard those records, and I discard all of the GPS-supplied altitudes in favor of altitudes from the altitude-grid databases described above.
The database-derived altitudes will be misleading if the GPS was not being carried at a more-or-less constant distance from the ground. For example, if I carry the GPS across a high, flat bridge, then the database-derived path will have me climbing down into the abyss and up the other side. The only fix for this is to remember the times of entering and exiting bridges, and then to delete the GPS points recorded between those times.
After a single pass of coordinate smoothing (correcting the altitudes when I change any coordinates), I look at the computed Vincenty distances between successive points. If some of those distances are implausibly large, I smooth repeatedly until they appear to be more reasonable.
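The repeat-until-plausible loop can be sketched in one dimension (the threshold and pass limit here are hypothetical; the real check is on Vincenty distances between successive coordinate pairs):

```python
def smooth_until_plausible(values, max_step, max_passes=10):
    """Repeat the midpoint-averaging smoothing pass until no adjacent
    pair of values differs by more than max_step, or until max_passes
    passes have been taken."""
    for _ in range(max_passes):
        if all(abs(b - a) <= max_step
               for a, b in zip(values, values[1:])):
            break
        out = list(values)
        for i in range(1, len(values) - 1):
            out[i] = (values[i] + (values[i - 1] + values[i + 1]) / 2) / 2
        values = out
    return values
```

Note that the pass limit matters: as the text above warns, unlimited smoothing of a closed loop would eventually flatten it entirely.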
Then I look at the altitudes, again smoothing until implausible sudden steps up or down are eliminated. As a reality check, I can sometimes look at the altitudes shown during a time when I knew I was stationary (say, at lunch): if the GPS's false report of motion during that time is still evident, I smooth the altitudes until they appear stable.
Finally, I use the remaining data to compute cumulative mileage and ascent.
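Given cleaned data, the final totals reduce to two sums; a sketch, where cumulative ascent counts only the uphill steps (the function and its inputs are illustrative, not the author's code):

```python
def cumulative_totals(segment_distances_m, altitudes_m):
    """Total mileage and total ascent for a track: the per-segment
    distances are summed as-is; ascent sums only positive altitude
    steps between successive points."""
    mileage = sum(segment_distances_m)
    ascent = sum(max(0.0, b - a)
                 for a, b in zip(altitudes_m, altitudes_m[1:]))
    return mileage, ascent

# Three 500 m segments; net climb 80 m, but cumulative ascent 100 m:
cumulative_totals([500, 500, 500], [100, 150, 130, 180])  # (1500, 100.0)
```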
Page revised: 2022-09-28 22:23