Ubuntu GIS from Scratch: Step 4 – Loading Data.

(Note – this post was updated on October 28 to include new information!)

Welcome to Step Four in our exploration of open source GIS using Ubuntu. In the previous installment, we spatially extended our PostgreSQL database server with PostGIS. Now we’re ready to load some geographical data into our database. PostGIS includes a tool called shp2pgsql which allows us to prepare a standard ESRI shapefile for upload. The first step is to find some data.

Warning: If you haven’t completed the previous steps, installing and configuring PostgreSQL and PostGIS, none of the following will work at all – you’ll get all kinds of crazy errors. So unless you have these programs installed and working on your machine already, you’ll need to go back and do that before proceeding.

1. The datasets I have chosen to use in this example describe urban features in Metro Portland, OR. The Metro GIS data portal is located at:
We’re interested in two datasets here. The first is collection of city boundary polygons called ‘City Limits (poly)’, which can be found on page 3. And the second is the street network, which is called ‘Streets’ and can be found on page 10. Download both of these files to your Downloads folder.

2. To unpack the data, open the Downloads folder and double-click streets.zip, which will open the zipfile in the Archive Manager. In the Archive Manager window, you will see that this zipfile contains 25 objects. When you click the Extract icon, a dialogue will appear, asking you where you want to store the contents. In order to make it easier to move the files around, you should create a folder called ‘streets’ before clicking the extract button. Do the same for city_fill.zip, creating a folder called ‘city_fill’ to store its contents. Next, create a new folder in your home folder called ‘GIS’ and drag the streets and city_fill folders into it.

(Note: sometimes you will only see one object in the Archive Manager window, a folder. In this case, you do not need to create a new folder before extracting.)

3. Uploading data into a PostGIS database requires you know what the SRID is – the spatial reference ID, a number corresponding to the projection of your data. This is not always easy, because many SRIDs are very similar, and the wrong one can lead to data errors. The first thing to do is to check the shapefiles themselves for projection data. The Portland streets dataset includes a file called streets.prj. If you open this file in a text editor, you will see the name of the projection on the first line:
Once you know the name, it’s fairly easy to find the right SRID. Go to the website:
and do a search for ‘oregon north’. There will be 37 results, and 2 of them match our projection exactly. Both of them are exactly the same (a very uncommon happenstance), so it doesn’t matter which one we use. For demonstration purposes, I’m going to arbitrarily choose ‘SR-ORG:6856’. Click the link for this projection, and then click the link to the ‘PostGIS spatial_ref_sys INSERT statement’. You’ll see a long line of text beginning with the word INSERT. Make a note of this location, because we will need this insert statement later.

4. Unfortunately, PostGIS doesn’t know about SRID 6856 yet. By default, it only comes with a limited number of SRIDs, and most of them correspond to projections defined by the EPSG, a European geodetic standards organizer. So we will have to add this SRID to the spatial_ref_sys table in template_postgis. This requires a trip to the terminal. The first command will unlock template_postgis, allowing us to connect with it:
psql -d postgres -c “UPDATE pg_database SET datallowconn=’true’ WHERE datname=’template_postgis’;”
The response will be:
Now we can connect:
psql -d template_postgis
As before, this will drop us into the PostgreSQL command shell, and the prompt will change. From here we can verify that SRID 6856 is not in the spatial_ref_sys table using an SQL select statement:
SELECT srid, auth_name, auth_srid FROM spatial_ref_sys WHERE srid = 6856;
(0 rows)
Since the select statement found 0 rows, SRID 6856 must not exist in the table.

At this point, recall the insert statement found on Spatial Reference in the previous step. If you know anything about SQL, you will see that the first number in this statement is the SRID – in this case, it is 96856. Spatial Reference likes to prefix a 9 to its SRIDs so that there is no chance of overwriting an SRID that is already in the table. This is essential because the SRID is the primary key of the spatial_ref_sys table, and must be unique. However, since we know that there is no 6856 in the table already, the extra 9 is just confusing. For the sake of convenience, we want to delete the 9 from the insert statement. Here’s the complete command, without the extra 9, which you can run now:
INSERT into spatial_ref_sys
(srid, auth_name, auth_srid, proj4text, srtext) values ( 6856, ‘sr-org’, 6856, ”,’PROJCS[“NAD_1983_HARN_StatePlane_Oregon_North_FIPS_3601_Feet_Intl”,
(Don’t worry that this command spans more than one line – the PostgreSQL shell doesn’t execute the command until it sees a semicolon at the end.)
Now you can check your work using the same select statement from above:
SELECT srid, auth_name, auth_srid FROM spatial_ref_sys WHERE srid = 6856;
(1 row)

EDIT: in order to use the built-in projection and transformation functions in PostGIS, we need to include the proj4 defintion for this projection in the database. The insert statement we copied from spatialreference.org does not include the proj4 definition, and I haven’t seen it anywhere online. Fortunately, I figured out how to make one myself. To add this information to spatial_ref_sys, give the following command:

UPDATE spatial_ref_sys SET proj4text=’+proj=lcc +lat_1=44.33333333333334 +lat_2=46.0 +lat_0=43.66666666666666 +lon_0=-120.5 +x_0=2500000 +y_0=0 +ellps=GRS80 +units=ft +no_defs’ WHERE srid=6856;

That’s done, so we can exit from the PostgreSQL command shell:
And lock up template_postgis again:
psql -d postgres -c “UPDATE pg_database SET datallowconn=’false’ WHERE datname=’template_postgis’;”

5. Now we will leave the terminal for a moment and open pgAdmin. You should have a PostgreSQL server instance called ‘pgtest’ that has two databases in it – postgres and template_postgis (there may be more if you took my previous advice to try creating and exploring databases). Let’s create a brand new database to load our data into. Right-click on Databases in the left frame and choose New Database. On the Properties tab of the New Database dialogue box, set the name to portland, and choose yourself as the owner. Then on the Definition tab, set the template to template_postgis, then click OK. This creates a new database that has the PostGIS extensions, and a table in the public schema called spatial_ref_sys. Thanks to our work in the previous step, the spatial_ref_sys table includes SRID 6856. Now close pgAdmin and return to the terminal.

6. When we installed PostGIS, the upload tools were installed to a very inconvenient location that is not in your system path. We could move the tools to more convenient location, but it’s better to make a link to the tools instead. That way, if the original tool is changed, we don’t have two versions of the tool in different locations.
First, switch into the main program directory:
cd /usr/bin
Next, make a link for shp2pgsql, the shapefile loader:
sudo ln -s /usr/lib/postgresql/9.1/bin/shp2pgsql shp2pgsql
While we’re here, we might as well do the same for raster2pgsql (the raster loader), and pgsql2shp (the shapefile unloader):
sudo ln -s /usr/lib/postgresql/9.1/bin/raster2pgsql raster2pgsql
sudo ln -s /usr/lib/postgresql/9.1/bin/pgsql2shp pgsql2shp
Now we can call these programs from anywhere in the file tree.

7. Finally, we can begin uploading data. The commands will be simpler if we do the uploading from the location of the shapefile, so let’s change directories to the streets folder:
cd ~/GIS/streets
We want to upload the streets.shp file into a table in the public schema called streets. We know the SRID of the data is 6856. Before we can upload the data, we have to package it. The shp2pgsql tool will create an SQL package from a shapefile. Here’s the complete command:
shp2pgsql -S -s 6856 streets.shp public.streets > streets.sql
Shapefile type: Arc
Postgis type: LINESTRING[2]

All the records in this particular shapefile have the same geometry, which is a WKT LineString composed of 2 points. The -S flag forces shp2pgsql to use only simple geometries, instead of creating multi-part geometries. Now we can upload the SQL package we just created into the database:
psql -d portland -f streets.sql
This will take a few minutes. First the following lines will appear momentarily, so quickly that you may not even see them:
psql:streets.sql:24: NOTICE:  CREATE TABLE will create implicit sequence "streets_gid_seq" for serial column "streets.gid"
psql:streets.sql:25: NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "streets_pkey" for table "streets"
public.streets.geom SRID:6856 TYPE:LINESTRING DIMS:2
(1 row)

And then you’ll see over 100,000 lines of this, one for every street:
At the end, you will finally see:
Before we open pgAdmin to admire our handiwork, let’s upload the municipal boundary file too:
cd ../city_fill
shp2pgsql -s 6856 cty_fill.shp public.cty_fill > cty_fill.sql
This time, our shapefile includes some multi-part polygons, so we can’t use the -S flag or there will be an error.
psql -d portland -f cty_fill.sql
The upload process for this file is similar, except it won’t take nearly as long.

Once the upload is finished, reopen pgAdmin. Now you will see two new tables in the public schema – streets and cty_fill. Coming up in the next installment, we’ll use the PostgreSQL command shell to run a few spatial queries. After that, we’ll be ready to tackle QGIS.

Comments are closed.