Over the course of 2 years, I wrote from scratch a lightweight web application which allows a user to search a street network for an address, and then displays that address using a web mapping service.
PHP, HTML5, CSS, PostgreSQL, PostGIS
(Earlier versions of the project used XHTML instead of HTML5)
This project began while trying to plan bus trips in the Seattle area. The websites of both King County Metro and Sound Transit contain trip planning applications that accept almost any input as the desired endpoints. As someone interested in programming and GIS applications, I was curious about how that worked. In order to find the nearest bus stop to the user, the software would have to figure out where the user was. How did it know? I had to design my own system to find out.
After considering a number of different input strings, I theorized that there are 3 main types of location input. First is the address: a house number and a street (example: 4700 University Way NE). This type is probably indicative of home users, who are quite familiar with their own addresses. Second is the intersection, a pair of streets (example: 4th & Jackson). This type is probably more popular with urban riders, who know corners, but might not be sure of their current address. It can also be useful for pinpointing a stop of known location. The third and final type is the landmark (example: Space Needle). This type is probably popular with tourists, but can also be useful to local residents who don't know the destination neighborhood as well. It should be noted that in all 3 cases, the amount of input the system will accept is variable. Consider the intersection example above, 4th and Jackson. There are 3 streets in Seattle with the name Jackson, and 37 with the name 4th. However, only 4th Ave S and S Jackson St intersect. So in some cases, it isn't necessary to gather more information than the name of the street.
The timeline of the project was approximately 18 months, from spring 2010 to autumn 2011. However, that measure is not indicative of actual coding time. I believe a project like this could be completed in less than a week by a dedicated and knowledgeable person. I, however, was working in my spare time, using programming languages and techniques I wasn't previously familiar with. Additionally, the project initially ran on a home server on a Macintosh, and maintaining up-to-date installations of Apache, PHP, and PostgreSQL there was problematic. Eventually, the entire operation was moved to a Linux computer, where the work progressed more rapidly. The first version was published online in November 2010. Subsequent revisions generally required no more than a few days' worth of work, and were published sporadically. At this time (November 2011), although the code is extremely modular and extensible, I consider the project officially complete.
The first stage of the project was to sketch out a general methodology. I decided very early on that I wanted to tackle each of the 3 different types of location searching mentioned above separately, simply because each would require a different set of programming concepts. For each method, it would be necessary to collect the input, analyze the input, and calculate the location. Originally, I had also planned on making a custom Google Maps element to go with each result, but that proved to be beyond the scope of the project.
I also chose the technologies at the very beginning. I knew that I wanted to try out PostGIS, and I found from reading that it contained the functionality I expected to need. One resource that was especially enlightening was Tobin Bradley's "Finding Street Intersections in PostGIS". From there I bought a book* which covered the basics of both PHP and PostgreSQL, and began typing.
In order to use PostGIS, I had to construct a PostgreSQL database. PostGIS includes a command-line tool to upload an ESRI shapefile into the database, and learning how to use this tool was very straightforward. Four main datasets were used: King County street network, King County landmarks, Honolulu street network, and Honolulu parks. It should be noted that this data is being used for demonstration purposes only - this is not intended to be used as a commercial tool. The datasets are the property of their originators (King County, WA and City of Honolulu, HI), and are being used in good faith. There are a couple of extant errors (typos) in the datasets, which I have not corrected.
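For illustration, the loader that ships with PostGIS is the shp2pgsql utility. A typical invocation looks something like the following sketch, where the database name and table name are placeholders and 2926 is the State Plane SRID of the King County data:

    shp2pgsql -s 2926 -I kc_streets.shp kc_streets | psql -d geocoder

The -I switch builds a spatial index on the new geometry column, which helps the intersection queries described later.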
The first programming problem I tackled was the street intersection search, because it fascinated me more than the others. As mentioned above, this type of search is usually phrased with an ampersand joining the two streets. Programming a routine to parse this kind of phrasing is not trivial, but I wanted to focus on the street intersection itself, not the robust interpretation of user input. Therefore, I designed a user interface that forced the user to separate the two streets. I also used dropdown menus to prevent the user from using unnecessary punctuation and unusual abbreviations. Significantly, the input interface also made it possible to ignore certain parts of the street's name.
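As a sketch of the idea (the field name and option list here are illustrative, not the project's actual markup), a directional-prefix dropdown can be generated with a simple loop, so the user never types a direction at all:

    <?php
    // Build the directional-prefix dropdown; an empty value lets the user skip it.
    echo '<select name="prefix1">';
    echo '<option value="">(no prefix)</option>';
    foreach (array('N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW') as $dir) {
        echo '<option value="' . $dir . '">' . $dir . '</option>';
    }
    echo '</select>';
    ?>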
The street intersection application takes place in two stages - processing and analysis. The first stage searches the database for each street, and returns all the possible options. These options are presented as a list, allowing the user to decide which pair of streets was meant. If the streets can be matched exactly, with no ambiguity, the first result page merely lists the two streets. The second stage searches the database for the intersection. For the purposes of this exercise, an intersection is an endpoint of a street network segment which is co-located with an endpoint of another street network segment. The PostGIS intersection results are given as X,Y coordinates, which are then converted to latitude and longitude. Once the latitude and longitude are known, a web mapping URL can be constructed and loaded.
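A query along the following lines captures the shared-endpoint definition. This is only a sketch: the table name (kc_streets), geometry column (the_geom), the assumption of single-part LINESTRING segments, and the $db connection handle are stand-ins rather than the project's actual schema.

    <?php
    // Find a point that is an endpoint of a segment of street 1 and also an
    // endpoint of a segment of street 2. $db is an open pg_connect handle.
    $sql = 'SELECT ST_X(pt) AS x, ST_Y(pt) AS y FROM (
                SELECT ST_StartPoint(a.the_geom) AS pt
                FROM kc_streets a, kc_streets b
                WHERE a.name = $1 AND b.name = $2
                  AND (ST_Equals(ST_StartPoint(a.the_geom), ST_StartPoint(b.the_geom))
                    OR ST_Equals(ST_StartPoint(a.the_geom), ST_EndPoint(b.the_geom)))
                UNION
                SELECT ST_EndPoint(a.the_geom) AS pt
                FROM kc_streets a, kc_streets b
                WHERE a.name = $1 AND b.name = $2
                  AND (ST_Equals(ST_EndPoint(a.the_geom), ST_StartPoint(b.the_geom))
                    OR ST_Equals(ST_EndPoint(a.the_geom), ST_EndPoint(b.the_geom)))
            ) AS candidates;';
    $result = pg_query_params($db, $sql, array($street1, $street2));
    ?>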
The address search uses many of the same techniques as the intersection search. The only real difference is that the address search is actually forced to parse variable user input. For demonstration purposes, spelling and punctuation errors are not noticed or corrected - a search for 'Fouurth Ave.' would be allowed, but it would not produce the intended results. However, the address input does allow the user to leave off the prefixes. Writing the parsing function that recognized prefixes and street types while allowing for multiple-word street names proved to be the most difficult part of the project.
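A heavily simplified sketch of the approach follows; the prefix and type lists are hard-coded here purely for illustration, and the house number is assumed to have been split off already.

    <?php
    // Split a street string into prefix, name, type, and suffix, while leaving
    // multiple-word names intact (e.g. UNIVERSITY WAY NE -> name UNIVERSITY,
    // type WAY, suffix NE).
    function parse_street($input) {
        $directions = array('N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW');
        $types      = array('AVE', 'ST', 'RD', 'BLVD', 'PL', 'WAY', 'CT', 'DR');

        $words  = preg_split('/\s+/', strtoupper(trim($input)));
        $prefix = '';
        $suffix = '';
        $type   = '';

        if (count($words) > 1 && in_array($words[0], $directions)) {
            $prefix = array_shift($words);     // leading directional
        }
        if (count($words) > 1 && in_array(end($words), $directions)) {
            $suffix = array_pop($words);       // trailing directional
        }
        if (count($words) > 1 && in_array(end($words), $types)) {
            $type = array_pop($words);         // trailing street type
        }
        return array($prefix, implode(' ', $words), $type, $suffix);
    }
    ?>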
The other unique part of the address search is how the results are calculated. First, the address ranges of each street segment are checked against the given house number until the correct street segment has been located. Next, the address ranges are checked to determine which side of the street the address is on. Once the correct side is found, it is possible to calculate a ratio representing how far up the street the property theoretically lies. For example, if a street segment represents a range from 2300 to 2398, then number 2350 is approximately halfway up the street. This ratio can then be plugged into the PostGIS line_interpolate_point function, which measures that ratio up the street segment. The precision of this method can be easily tested, because Google Maps is able to give the actual address of the resulting location. Generally, the precision depends on the size of addressed parcels along the given street segment - a small-lot home address will be much more accurate than a large-lot commercial address. However, in most areas, pinpointing the correct street segment is sufficient for finding nearby bus stops.
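In code, the interpolation step amounts to something like the sketch below. The table name, gid key, and geometry column are placeholders; line_interpolate_point is the PostGIS 1.x spelling, and later releases prefix the name with ST_.

    <?php
    // $lo and $hi hold the address range for the matched side of the segment,
    // $house is the number the user typed, and $segment_id identifies the row.
    $ratio = ($house - $lo) / ($hi - $lo);   // e.g. (2350 - 2300) / (2398 - 2300), about 0.51

    $sql = 'SELECT ST_X(p) AS x, ST_Y(p) AS y
            FROM (SELECT line_interpolate_point(the_geom, $1) AS p
                  FROM kc_streets
                  WHERE gid = $2) AS sub;';
    $result = pg_query_params($db, $sql, array($ratio, $segment_id));
    ?>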
The landmark search was the easiest to parse. As long as the name given by the user appeared in one of the acceptable name fields, the text representation of the location could be pulled directly from the point database and passed to the latitude/longitude converter.
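The whole search reduces to a single lookup, roughly like this (the table and field names are stand-ins):

    <?php
    // Pull the point geometry for a landmark whose name matches the user's text.
    $sql = 'SELECT ST_X(the_geom) AS x, ST_Y(the_geom) AS y
            FROM kc_landmarks
            WHERE upper(name) = upper($1);';
    $result = pg_query_params($db, $sql, array($landmark));
    ?>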
Originally, the static pages of this project were written to conform to the XHTML 1.1 standard, the most extensible web standard in use at the time. Since then, HTML5 has become the primary standard. As of version 1.3, all user-accessible pages are written in HTML5 and styled using CSS. Given the simple nature of the user interface, it should be usable on multiple devices.
Before this project, I had never written any PHP from scratch, so I must be excused for preferring to use simple statements wherever possible. I tended to avoid such constructions as ternary operators and associative arrays, because they were unfamiliar to me. The code uses what some programmers refer to as the One True Brace Style for indenting and bracketing, which makes ample use of braces to clarify structure. The code is also thoroughly commented so that it can be easily maintained. Wherever possible, functions were used to perform repetitive operations.
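For readers unfamiliar with the term, the style looks like this: the opening brace shares a line with the statement that introduces it, and even a one-line branch keeps its braces.

    if ($matchcount == 1) {
        echo "Exact match found.";
    } else {
        echo "Please choose from the list below.";
    }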
Originally, this project was limited to the streets and landmarks of King County, Washington. Later, the ability to search the streets and parks of Honolulu was added. Since the same processing methods are applicable to any dataset, I made it easy to add another area. All location-specific data is stored in an include file called localdata.php. Location-specific data includes the projection of the data, the names of certain fields in the databases, and user notes containing any special instructions for the dataset. Adding Denver streets and landmarks to the project would take about 30 minutes, and require the following steps:
This setup works by passing the name of the area chosen from step to step during processing, and having the localdata.php file available along the way. No database processing is necessary because the original names of the relevant geocoding fields are stored in localdata.php as variables.
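As a hypothetical sketch (the variable names and values here are illustrative, not copied from the project), a localdata.php block for Denver might look like this:

    <?php
    // Hypothetical localdata.php entry for a new area.
    if ($area == 'denver') {
        $streets_table   = 'denver_streets';   // table created by the shapefile loader
        $landmarks_table = 'denver_landmarks';
        $name_field      = 'str_name';         // street name column in the source data
        $type_field      = 'str_type';         // street type column (AVE, ST, ...)
        $projection      = 2232;               // Colorado Central State Plane (ftUS); verify against the EPSG registry
        $user_note       = 'Example note: describe any quirks of the source data here.';
    }
    ?>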
In my experience, the local governments that produce street databases work exclusively with the relevant local State Plane coordinate system. As such, only State Plane coordinate systems are supported. The King County database included in this demo uses Washington North (EPSG 2926), and Honolulu uses Hawaii Zone 3 (EPSG 2784). Consequently, those are the only coordinate systems that function in this demo. To add more State Plane coordinate systems to the demo, modify the projectiondata.php file with the projection definition.
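A projectiondata.php entry is essentially a block of constants for the zone. As a sketch (variable names are illustrative, and the values should be checked against the EPSG registry before use), the Washington North definition might read:

    <?php
    // Hypothetical projectiondata.php entry for EPSG 2926, Washington North (ftUS).
    if ($epsg == 2926) {
        $proj_type   = 'lambert';        // Lambert Conformal Conic (2SP)
        $std_par_1   = 47.5;             // first standard parallel, 47 deg 30 min N
        $std_par_2   = 48.7333333;       // second standard parallel, 48 deg 44 min N
        $lat_origin  = 47.0;             // latitude of origin
        $lon_origin  = -120.8333333;     // central meridian, 120 deg 50 min W
        $false_east  = 1640416.6667;     // false easting in US survey feet
        $false_north = 0.0;              // false northing
    }
    ?>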
There are two types of State Plane coordinate systems - those that use a Lambert Conformal Conic projection, and those that use a Transverse Mercator projection. In this demo, King County is an example of the former, while Honolulu is an example of the latter. The mathematics used to convert from Lambert coordinates to latitude and longitude are quite different from those involved in a Mercator conversion. Therefore, it was necessary to include two different blocks of conversion formulae in latlon_convert.php. The formulae were adapted from "Map Projections - A Working Manual" by John P. Snyder (USGS Professional Paper 1395, 1987), and formula numbers corresponding to the paper are provided in the comments.
The initial page finder.htm presents two options. The first is which city to search, and the second is what type of search to perform (address, intersection, or landmark). Submitting the form calls start_process.php, which builds the code for the next user interface page. An include file called constructionfunctions.php contains all the information required to build the requested search page. The search page also prompts the user to choose which mapping service to use. Submission of a search page launches the required process page (address_process.php, intersect_process.php, or landmark_process.php), which builds a confirmation page. When the confirmation page is submitted from either an address search or an intersect search, the required find page is launched (address_find.php, intersect_find.php), which finalizes the results and loads the web mapping service. In the case of a landmark search, confirmation loads the web mapping service directly. Flow control is illustrated in Figure 1 (included PHP documents are not shown). Note that there is only one static user interface page - all other user interface pages are generated in PHP.
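The dispatch in start_process.php is straightforward; in rough outline it looks like the following, where the form field names and the function names attributed to constructionfunctions.php are hypothetical.

    <?php
    // start_process.php, in outline: decide which search page to build.
    require 'localdata.php';
    require 'constructionfunctions.php';

    $area = $_POST['area'];        // carried forward as a hidden field on every later page
    $type = $_POST['searchtype'];  // 'address', 'intersect', or 'landmark'

    if ($type == 'address') {
        build_address_page($area);
    } elseif ($type == 'intersect') {
        build_intersect_page($area);
    } else {
        build_landmark_page($area);
    }
    ?>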
As Figure 1 shows, this project uses 4 user interface pages (3 auto-generated) and 5 processing pages. Most transit trip planners do the same amount of work in fewer pages, and the user input is relegated to a tiny block in a larger home page. Clearly, my 9 pages lack the programming complexity of a working enterprise application. Much of this can be explained by my early decision to break out the different programming problems separately. My verbose and simplistic code may also be to blame. However, it is possible that enterprise programmers have access to tools I do not have.
A perfect user would enter complete and correct addresses every time, saving tons of work. It seems like most of the code I produced was designed to anticipate and deal with user misuse. And yet I haven't even scratched the surface of what kind of garbage the human hand can type. One could literally spend a hundred years trying to anticipate, sanitize, normalize, and second-guess every possible error and still not have perfect code. It's important to set an input standard in advance, and refuse to deal with data problems below a certain threshold. It seems the best way to do this is to steer users away from their own habits by limiting their data entry opportunities and offering plenty of constructive hints. Sound Transit accomplishes this by offering a helpful dialogue with plenty of examples. As the old saying goes, "Make something foolproof, and only a fool will want to use it." It's important to not get lost in the potential foolishness of one's users.
One thing that cannot be overstated is the impressiveness of PostGIS. Although it is difficult to get started with, the possibilities are endless. Given a sufficiently robust framework, this technology should be able to accomplish anything, including cartography and on-the-fly analysis. The idea of loading shapefiles into a web-accessible database was genius. Best of all, the technology is free open source software, which can be deployed and modified without any usage fees or costs. Of course it's not as easy to use as some of the web-enabled enterprise products, like ArcGIS Server. And it doesn't come with the help and maintenance support of a licensed product. But ambitious programmers and entrepreneurial startups can use it to enter the web geo-processing arena with very low capital costs, and that broad participation benefits everyone.
As a standalone application, I would consider this project completed and closed. However, the potential for integrating such an application into a multi-function tool is nearly limitless. It's possible that the code foundation I've built here will show up again in later projects.
For those who aren't familiar with Seattle or Honolulu, the following data can be used to test this application.
King County:
Honolulu:
Vance Julien, Tobin Bradley, Nicolas Eckhardt, Philip Zoch, strk@postgis-users
* Beginning PHP and PostgreSQL 8, by W. Jason Gilmore & Robert H. Treat, Apress, 2006