Catching Bad Characters

When loading data into a PostGIS database using shp2pgsql, have you ever gotten this cryptic error?

Unable to convert data value to UTF-8 (iconv reports "Invalid or
incomplete multibyte or wide character"). Current encoding is "UTF-8".
Try "LATIN1" (Western European), or one of the values described

This means that there is an invalid or incomplete multibyte character somewhere in your data.  Unfortunately, the error message assumes the only possible cause is that you declared the wrong character encoding for your data.  There is another possibility, though: you got the encoding right, and your shapefile contains an invalid multibyte character.  That is annoying, but easy enough to fix.
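To make "invalid multibyte character" concrete, here is a minimal Python sketch. The word "café" and the helper name is_valid_utf8 are my own illustrations, not anything from shp2pgsql. It shows that a byte written in Latin-1, or a UTF-8 character cut off partway, fails to decode as UTF-8, which is roughly the situation iconv is complaining about.

```python
# Illustrative sample data: the same word encoded two ways.
latin1_bytes = "café".encode("latin-1")  # é stored as the single byte 0xE9
utf8_bytes = "café".encode("utf-8")      # é stored as the two bytes 0xC3 0xA9


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))       # well-formed UTF-8
print(is_valid_utf8(latin1_bytes))     # lone 0xE9 is not a complete sequence
print(is_valid_utf8(utf8_bytes[:-1]))  # character truncated after its first byte
```

A single stray byte like this in an otherwise clean UTF-8 file is enough to trigger the error, even though the declared encoding is correct.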

Open QGIS and add your shapefile, making sure to set the file’s encoding to UTF-8.  Now open the attribute table and start scrolling through the records.  If the invalid character were displayed in octal or hex form, it would be nearly impossible to spot simply by scanning.  Fortunately, QGIS displays invalid characters using an easily recognizable dingbat:

[Image: QGIS bad character dingbat]

Once you’ve found the offending character(s), simply delete them, and type in replacements if necessary.  Save your edits and close QGIS; the shp2pgsql operation should then complete without errors.
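If the attribute table is too long to scan by eye, a script can narrow down where to look. The following is a rough Python sketch, not part of QGIS or shp2pgsql: the function name find_bad_records is hypothetical, and it assumes a plain dBASE-style .dbf whose header stores the record count at bytes 4-7, the header length at bytes 8-9, and the record length at bytes 10-11. It reports only the first bad byte in each record, and files with binary fields may produce false positives.

```python
import struct


def find_bad_records(dbf_bytes: bytes, encoding: str = "utf-8"):
    """Return (record_number, offset_in_record, offending_bytes) for every
    record whose raw bytes do not decode in the given encoding.

    Record numbers are 1-based. Hypothetical helper, for illustration only.
    """
    # Little-endian header fields: uint32 record count, uint16 header
    # length, uint16 record length, starting at byte 4.
    n_records, header_len, record_len = struct.unpack_from("<IHH", dbf_bytes, 4)
    bad = []
    for i in range(n_records):
        start = header_len + i * record_len
        record = dbf_bytes[start:start + record_len]
        try:
            record.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start / err.end mark the first undecodable byte range.
            bad.append((i + 1, err.start, record[err.start:err.end]))
    return bad
```

You would call it with the contents of the shapefile's .dbf, for example find_bad_records(open("mydata.dbf", "rb").read()), where mydata.dbf is a placeholder name. The record numbers it returns tell you roughly which rows to inspect in the attribute table.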
