Convert lines/polys to point data – ogr_explode.py

How do I convert my linestring shapefile into points?

It’s a common enough question on the OSGeo mailing lists and stackexchange.com. Even with the power of GDAL/OGR command line utilities, there isn’t a single command you can run to do a conversion from lines/polygons into points. Enter ogr_explode.py.

I’ve co-opted the term “explode” to refer to this need and put together a simple Python script to show how it can be done. You can access it on GitHub here or see just the main guts of it listed below.

It’s just a start, without any real error checking, and can be greatly improved with more options – e.g. to take elevation from the parent lines/polys attributes, to output to another shapefile, etc. Because it outputs in a CSV (x,y,z) format, you can give it a .csv extension and start to access it with OGR utilities right away (VRT may be required). For example, you could then push it through gdal_grid, but that’s another topic…

Since it is a common enough need, it seems, I encourage you to fork a copy and share back.

[Note: QGIS has a handy tool under Vector -> Geometry Tools -> Extract Nodes that does the same thing from a GUI. Props to NathanW for having the most easily accessible example to refer to 🙂 ]


...
for feat in lay:
    geom = feat.GetGeometryRef()
    for ring in geom:
        points = ring.GetPointCount()
        for p in xrange(points):
            lon, lat, z = ring.GetPoint(p)
            outstr = "%s,%s,%s\n" % (lon,lat,z)
            outfile.write(outstr)
...
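
For context, here is a minimal, self-contained sketch of the same idea with the surrounding boilerplate filled in. The input and output file names are placeholders, and the GitHub version differs in detail (argument handling, etc.), so treat this as an illustration rather than the actual script:

# Illustrative sketch only (not the GitHub script): dump every vertex of a
# line/polygon source to x,y,z CSV. File names here are placeholders.
from osgeo import ogr

ds = ogr.Open("lines.shp")            # any OGR-readable line/polygon source
lay = ds.GetLayer(0)

with open("points.csv", "w") as outfile:
    for feat in lay:
        geom = feat.GetGeometryRef()
        # Polygons and multi-part geometries keep their vertices in
        # sub-geometries (rings/parts); simple linestrings hold them directly.
        parts = [geom.GetGeometryRef(i) for i in range(geom.GetGeometryCount())] or [geom]
        for part in parts:
            for p in range(part.GetPointCount()):
                x, y, z = part.GetPoint(p)
                outfile.write("%s,%s,%s\n" % (x, y, z))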

Python + QGIS Book Underway…

From Locate Press:
“We don’t have a cover to tease you with, but we have started a website where you can learn more: http://pythongisbook.com · The PyQGIS Programmers Guide: Extending Quantum GIS with Python. This all ne…”

Read more…

GDAL/OGR Book Update

Last year we started on a new GDAL/OGR focused book and it’s now nearing completion!

Read more on the Locate Press blog…

LIDAR data to raster file with open source GDAL tool

As in my previous post, a generic approach to converting irregularly spaced text data into a raster can be applied in a wide range of contexts. In this post I show a particular example: taking some freely available LIDAR result data and processing it with the same approach.

Get the Data

To follow along, you can grab the same data I did from OpenTopography.org. You can find your way through their search tools for an area of interest.

My resulting data is in CSV/ASCII format, with no gridding applied. Grab the dataset and metadata from here.

Unzipping the file produces a 138MB text file called points.txt. This is already in a format OGR can read, but renaming the file to points.csv saves us from having to change the delimiter (as in the previous post). Note that the first line contains column headings, which makes this a perfect example to work with:


x,y,z,classification,gpstime,scan_angle,intensity,number_of_returns...
330000.05,7001998.62,1339.69,2,0.000000,0,171,0,0,3,0,0,0,0,0,0
330000.13,7001999.41,1340.28,1,0.000000,0,115,0,0,3,0,0,0,0,0,0
330000.32,7001999.60,1340.12,2,0.000000,0,131,0,0,3,0,0,0,0,0,0
330000.22,7001998.78,1339.83,2,0.000000,0,184,0,0,3,0,0,0,0,0,0

Create VRT file

In case it wasn’t clear previously, we create a VRT file so that gdal_grid (or any OGR application) knows how to find the X/Y coordinates in the CSV file. We create and tweak it as in the previous example, but use the field names already present in points.csv. Here is our new points.vrt file:


<OGRVRTDataSource>
 <OGRVRTLayer name="points">
  <SrcDataSource>points.csv</SrcDataSource>
  <GeometryType>wkbPoint</GeometryType>
  <LayerSRS>EPSG:32607</LayerSRS>
  <GeometryField encoding="PointFromColumns" x="x" y="y" z="z"/>
 </OGRVRTLayer>
</OGRVRTDataSource>

Test results by running against ogrinfo:


ogrinfo points.vrt -al -so
INFO: Open of `points.vrt'
using driver `VRT' successful.
Layer name: points
Geometry: Point
Feature Count: 2267774
Extent: (330000.000000, 7001270.110000) - (330589.740000, 7001999.990000)
Layer SRS WKT:
PROJCS["WGS 84 / UTM zone 7N",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",-141],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
AUTHORITY["EPSG","32607"],
AXIS["Easting",EAST],
AXIS["Northing",NORTH]]
x: String (0.0)
y: String (0.0)
z: String (0.0)
classification: String (0.0)
gpstime: String (0.0)
scan_angle: String (0.0)
intensity: String (0.0)
number_of_returns: String (0.0)
return_number: String (0.0)
point_source_ID: String (0.0)
edge_of_flight_line_flag: String (0.0)
direction_of_scan_flag: String (0.0)
user_data: String (0.0)
red_channel: String (0.0)
green_channel: String (0.0)
blue_channel: String (0.0)

Grid the Data

Here’s the part that takes a while: with more than 2 million rows, processing time depends on which algorithm settings you use. To use the default (inverse distance to a power), run it without an algorithm option:


gdal_grid -zfield z -l points points.vrt points.tif
 
Grid data type is "Float64"
Grid size = (256 256).
Corner coordinates = (329998.848164 7002001.415547)-(330590.891836 7001268.684453).
Grid cell size = (2.303672 2.851094).
Source point count = 2267774.
Algorithm name: "invdist".
Options are "power=2.000000:smoothing=0.000000:radius1=0.000000:
radius2=0.000000:angle=0.000000:max_points=0:min_points=0:nodata=0.000000"

I also got good results using the nearest neighbour options shown in the gdal_grid docs. Note that gdal_grid creates a 256×256 raster by default; the second example below shows how to set the output size (-outsize 512 512). It took about an hour to process on my triple-core AMD64 desktop machine with 8GB RAM:


gdal_grid -zfield z -outsize 512 512 \
-a nearest:radius1=0.0:radius2=0.0:angle=0.0:nodata=0.0 \
-l points points.vrt points_nearest_lrg.tif
 
Grid data type is "Float64"
Grid size = (512 512).
Corner coordinates = (329999.424082 7002000.702773)-(330590.315918 7001269.397227).
Grid cell size = (1.151836 1.425547).
Source point count = 2267774.
Algorithm name: "nearest".
Options are "radius1=0.000000:radius2=0.000000:angle=0.000000:nodata=0.000000"

GDAL rasters from irregular point or ASCII data

The following is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample… 

A common question that comes up on the GDAL discussion lists is:

How do I take a file of text data or OGR readable point data and turn it into an interpolated regular grid?

Depending on what your source data looks like, you can deal with this through a three-step process; it is fairly straightforward if you use the right tools. Here we stick to the GDAL/OGR command line tools.

1. Convert XYZ data into CSV text

If the source data is in a grid text format with rows and columns, then convert it to the XYZ output format with gdal_translate:

gdal_translate -of XYZ input.txt grid.xyz

If your source data is already in XYZ format, or once the above conversion is done, change the space delimiters to commas (tabs and semicolons are also supported). This is easily done on the command line using a tool like sed, or manually in a text editor:


sed 's/ /,/g' grid.xyz > grid.csv

The results would look something like:

-120.5,101.5,100
-117.5,101.5,100
-116.5,100.5,150
-120.5,97.5,200
-119.5,100.5,100
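
If sed is not available (on Windows, for example), the same delimiter swap is only a few lines of Python; the file names below simply mirror the example above:

# Swap space delimiters for commas, equivalent to the sed command above
with open("grid.xyz") as src, open("grid.csv", "w") as dst:
    for line in src:
        dst.write(",".join(line.split()) + "\n")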

2. Create a virtual format header file

Step 3 will require access to the CSV grid data in an OGR-supported format, so we create a simple XML text file that any OGR tool can then refer to as a vector datasource.
Create the file grid.vrt with the following content; note the pointer to the source CSV file and the field names, which OGR discerns from the CSV file on the fly:

<OGRVRTDataSource>
 <OGRVRTLayer name="grid">
  <SrcDataSource>grid.csv</SrcDataSource>
  <GeometryType>wkbPoint</GeometryType>
  <LayerSRS>WGS84</LayerSRS>
  <GeometryField separator=" " encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
 </OGRVRTLayer>
</OGRVRTDataSource>

If the first row of the grid.csv file includes custom field names, e.g. X, Y, Z, then substitute them for the field_1, field_2 and field_3 settings above.
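
Before gridding, it is worth a quick sanity check that OGR can actually read the VRT. ogrinfo works, and so do a few lines with the OGR Python bindings:

# Confirm the VRT resolves to a readable point layer
from osgeo import ogr

ds = ogr.Open("grid.vrt")
layer = ds.GetLayerByName("grid")
print(layer.GetFeatureCount())   # number of points read from grid.csv
print(layer.GetExtent())         # (xmin, xmax, ymin, ymax)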

3. Convert using gdal_grid

Now convert this OGR vector source (VRT plus CSV) into a GDAL raster format, interpolating any “missing” points in the grid along the way. Various interpolation options are available; see the gdal_grid command syntax for more information. A minimal example using all the defaults is shown here:

gdal_grid -zfield field_3 -l grid grid.vrt newgrid.tif

The output product will fill in any holes in the source grid. Several algorithms are available; the default uses the inverse distance to a power option.

Results shown in the following image.

Grow or Die – 3 Key GIS Career Areas

Those of us involved with open source geospatial software already know that you must innovate or die. Okay, maybe that’s a bit dramatic, but I know many of us feel like we will die if we do not continually learn more and be challenged. I just wrote an article that tries to articulate some of the areas where we need to keep our muscle stretched. This is especially for those who are looking at their career in horror – wondering when their skills will be leap-frogged into oblivion by a recent grad who happened to spend more time learning on his/her own time.

Check out the article on GoGeomatics.ca. I would love your comments and feedback on the article.

GeoBase.ca – my other favourite site for Canadian geodata

Following from my last post, if you’re looking for a site for “framework” Canadian geodata, check out GeoBase.ca. It includes layers such as hydrography, road networks, various admin boundaries, imagery, and more, all for free download.

But today I’m pulling it in using their WMS option to dynamically view the data without downloading:

http://geobase.ca/geobase/en/wms/index.html

I hope we’ll see more provinces get in on the action and start putting higher-res imagery up as WMS feeds, but for now I’m not complaining. Keep up the good work! If you enjoy it too, let them know by encouraging them with good feedback.

By the way, anyone know of an Ontario higher res imagery WMS or download site?

GeoBase WMS connection:
http://ows.geobase.ca/wms/geobase_en

Canada Toporama WMS – great stuff

I went hunting for a good set of Canada-wide WMS layers and (re)discovered some great work from NRCAN on the Toporama site.

Good response time but also nice cartographic quality, well, at least if you want something that’s similar to the printed topographic map you may have on your wall. Good work as usual from NRCAN and co. Anyone know what they are serving it up with?

Here is their WMS connection.

 

Open Source Desktop GIS book almost ready…

UPDATE: The Book Is Ready – Get It Here!

This week we’re putting the final polish on Gary Sherman’s new book. For more info see Locate Press site:

 Gary Sherman has done it again… Originally published as Desktop GIS, by Pragmatic Press, it quickly went out of print. Locate Press is proud to bring this work back into print by releasing this fully updated second edition. More information coming soon. See Gary’s website dedicated to his book – DesktopGISBook.com.
Leave a note here if you want to be informed when it is available.

Cataloging Spatial Assets – A Metadata Script Approach

Years ago I started writing some scripts with the end goal of doing a wholesale migration of my data into open source spatial databases. I have since left that job and didn’t have much need for the scripts, but recently picked them back up again. I never made it to the migration part; instead I decided to focus on cataloging my GIS data so I could then create other apps to use the catalog for looking things up, creating overview maps, and ultimately to fuel migration scripts. The historical name for this project is somewhat irrelevant now; this is the Catalog portion of a broader plan I had called the Network Mapping Engine (NME).

My current efforts are available here: https://github.com/spatialguru/NME/tree/master/nme/cat

Want to try it? Ensure you have the prerequisite xmlgen and elementtree_pretty libs (included on the site) as well as a recent GDAL/OGR install with Python bindings. It’s reported to work well with the OSGeo4W distribution, though I have only tested it recently on Mac OSX 10.6.

The main feature is the gdalogr_catalogue.py script, but the above dependencies also have to be available on your PYTHONPATH.

Quick Start

The simplest command line usage requires only one parameter, pointing to the folder that you want to catalog. Sub-folders are scanned recursively as well.

  python gdalogr_catalogue.py -d /tmp/data

The result is basic XML (more about the details of output in a moment):

<?xml version="1.0" ?>
<DataCatalogue>
  <CatalogueProcess>
    <SearchPath>
      /tmp/data
    </SearchPath>
    <LaunchPath>
      /private/tmp/data
    </LaunchPath>
    <UserHome>
      /Users/tyler
    </UserHome>
    <IgnoredStrings>
      ['.svn', '.shx', '.dbf']
    </IgnoredStrings>
    <DirCount>
      2
    </DirCount>
    <FileCount>
      16
    </FileCount>
    <Timestamp>
      Mon Mar 19 14:42:35 2012
    </Timestamp>
 .... huge snip ....

Pipe the output into a new file or use the save-to-file output option:

  python gdalogr_catalogue.py -d /tmp/data -f output.xml

About the XML

Most metadata standards include pretty high level information about datasets, but the focus of this project is to grab as low level data as possible and make it easily consumable by other applications. For example, consider how valuable hierarchical information about all the datasets on your system or in your enterprise would be. A map viewing application could access the catalog instead of having to repetitively scan folders and open files. It could fuel part of a geospatial search engine for that matter!

The original work created CSV text files, then produced SQL statements, but it has now finally settled on producing XML. There was no existing standard for this type of metadata, so I had to create one. Because it is built on top of GDAL/OGR, much of it parallels the GDAL/OGR API, but additional information is also captured about the cataloguing process and file statistics. It’s still changing and growing, but no huge changes are expected.

There are three primary sections in the root DataCatalogue XML element, and a couple sub-elements as well:

  1. CatalogueProcess – captures details of how the script was run, the operating system, timestamp, etc.
  2. VectorData – vector dataset details, format, location
    1. FileStats – information about the file or path, including the user that owns it, modification date and a unique checksum
    2. VectorLayer – for each layer, spatial data details, format, extent
  3. RasterData – raster dataset details, format, location, dimensions
    1. RasterBand – for each band, data details, format, min/max values
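
Since the output is plain XML following this structure, any XML library can consume it. For example, here is a short sketch with Python’s ElementTree; the element names come from the structure above and the sample output, so check sample_output.xml for the exact nesting in the current version:

# Sketch: consume the catalogue XML with the standard library.
# Element names follow the structure described above; verify against
# sample_output.xml for the current version of the script.
import xml.etree.ElementTree as ET

root = ET.parse("output.xml").getroot()      # <DataCatalogue>

# Count the vector and raster datasets the scan picked up
print("vector datasets:", len(root.findall(".//VectorData")))
print("raster datasets:", len(root.findall(".//RasterData")))

# Process details are plain-text children of CatalogueProcess
proc = root.find("CatalogueProcess")
print("scanned path:", proc.findtext("SearchPath", "").strip())
print("files seen:", proc.findtext("FileCount", "").strip())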

I won’t review all the levels of elements here; instead, have a look at sample_output.xml in your browser. Along with catalogue.xsl and css/catalogue.css, it renders to show some basic example data:

To get similar rendering for your own output, copy the header/declaration details from sample_output.xml so that it points to the CSS and XSL files.

 

Future?

One of my goals is to have this XML output plug into other applications as the basis for an enterprise data catalog environment. The first step is to add the capability to generate a MapServer map file from the metadata, which would also let the script produce overview maps of the dataset extents. Another idea is to create a QGIS plugin to visualise and browse through the datasets, while also allowing optional loading of a layer into QGIS. Have a better idea? Let me know and we can discuss it. Maybe I need to come up with a contest, hmm…

Updating the XML is another challenge on the horizon. I built in the creation of an md5 hash-based checksum so the script can check for file changes without even having to open the file with GDAL/OGR, but there are no routines yet to update the XML in place.
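
For illustration, the change-detection idea boils down to something like the following hypothetical helper (not the script’s actual code): hash the file bytes, store the digest in the XML, and compare digests on the next run before doing any GDAL/OGR work.

# Hypothetical illustration of the checksum idea, not the script's actual code:
# hash file contents without opening them through GDAL/OGR.
import hashlib

def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# if file_md5(path) == digest_recorded_in_xml: the dataset is unchanged, skip it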

Obviously, there is more to come, as more people try it and let me know how they get on. Drop me a comment or email and I’d love to help you get it running if it fails in your environment!
