XML to describe and catalogue datasets
I've picked away at a Python script over the course of a few years that helps scan a tree of file folders and find spatial datasets. It's pretty simple and uses the GDAL/OGR libraries to identify the formats and examine the contents. It now outputs an XML structure that I'd love some constructive feedback on.
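To make the scanning idea concrete, here is a rough sketch (not the actual script) of walking a folder tree and recording anything that looks like a spatial dataset. The real script asks GDAL/OGR to identify each file; this sketch swaps in a hypothetical extension lookup (`probe_by_extension`) so it runs with only the standard library. The function names and the extension table are assumptions made up for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for a GDAL/OGR probe: guess a format from the extension.
# The real script would try to open each file with GDAL/OGR instead.
KNOWN_EXTENSIONS = {".shp": "ESRI Shapefile", ".tif": "GTiff"}


def probe_by_extension(path):
    """Return a format name for the file, or None if it is not recognised."""
    ext = os.path.splitext(path)[1].lower()
    return KNOWN_EXTENSIONS.get(ext)


def scan_tree(root, probe=probe_by_extension):
    """Walk root and return sorted (relative_path, format) pairs for recognised files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            fmt = probe(path)
            if fmt is not None:
                found.append((os.path.relpath(path, root), fmt))
    return sorted(found)


# Build a throwaway folder tree with two "datasets" and one unrelated file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "vector"))
    for parts in (("vector", "roads.shp"), ("dem.tif",), ("notes.txt",)):
        open(os.path.join(root, *parts), "w").close()
    results = scan_tree(root)

print(results)
```

Passing the probe in as a parameter means a GDAL/OGR-based check could replace the extension lookup without touching the walking code.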
Initially, about 5 years ago, I used it to output the results in a delimited text format that I would import into a database for further investigation, for example to count my layers, find duplicates, etc. Matt Perry also did a bunch of work to modify the script to handle rasters.
Late last year I refactored the script(s) and set it up to output an XML document of the various properties the script collected about the datasets, layers, rasters, bands, etc. I had done some searching to try and find a standard for storing this representation but couldn't find one. Most metadata catalogues seemed to be primarily focused on the high-level info that I wasn't interested in for this case (organisational contacts, online resource URLs, how it was collected, etc.). I wanted something slightly lower level: basically all the stuff that the gdalinfo and ogrinfo command line tools return.
Not finding any really applicable options, I made up a simple structure and would love some feedback on it. I'm certainly not used to writing XML, so I'm sure the whole idea could be improved.
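For readers who want a feel for this kind of structure, here is a minimal sketch of building a catalogue document with Python's standard `xml.etree.ElementTree`. The element and attribute names (`catalogue`, `dataset`, `layer`, `geometryType`, `featureCount`) and the sample record are made up for illustration; they are not necessarily what the script emits, and the script itself uses xmlgen.py rather than ElementTree.

```python
import xml.etree.ElementTree as ET


def dataset_to_xml(dataset):
    """Turn one dataset record (a plain dict) into a <dataset> element."""
    ds_el = ET.Element("dataset", name=dataset["name"], driver=dataset["driver"])
    for layer in dataset.get("layers", []):
        layer_el = ET.SubElement(ds_el, "layer", name=layer["name"])
        ET.SubElement(layer_el, "geometryType").text = layer["geometry_type"]
        ET.SubElement(layer_el, "featureCount").text = str(layer["feature_count"])
    return ds_el


def build_catalogue(datasets):
    """Wrap all dataset elements in a single <catalogue> root."""
    root = ET.Element("catalogue")
    for dataset in datasets:
        root.append(dataset_to_xml(dataset))
    return root


# Hypothetical record, shaped like the low-level details ogrinfo reports.
sample = [
    {
        "name": "roads.shp",
        "driver": "ESRI Shapefile",
        "layers": [
            {"name": "roads", "geometry_type": "LineString", "feature_count": 1250}
        ],
    }
]

catalogue = build_catalogue(sample)
xml_text = ET.tostring(catalogue, encoding="unicode")
print(xml_text)
```

Identifying values such as the dataset name and driver are attributes here, and measured properties are child elements. That split is only one possible convention.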
In case you are interested, I am currently refactoring the code even more and have already hacked in abilities to output a basic set of SQL INSERT statements.
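As a rough idea of what generating INSERT statements can look like, here is a small sketch, not the code from the script. The table name `layers`, its columns, and the sample records are assumptions. The statements are loaded into an in-memory SQLite database only to show that they parse and that a duplicate-finding query works on the result.

```python
import sqlite3


def sql_literal(value):
    """Render a Python value as a SQL literal, doubling single quotes in text."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table, record):
    """Build one INSERT statement from a dict of column -> value."""
    columns = ", ".join(record)
    values = ", ".join(sql_literal(v) for v in record.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Hypothetical layer records, as a scan might collect them.
records = [
    {"dataset": "roads.shp", "layer": "roads", "feature_count": 1250},
    {"dataset": "backup/roads.shp", "layer": "roads", "feature_count": 1250},
    {"dataset": "o'brien.tab", "layer": "parcels", "feature_count": 87},
]
statements = [insert_statement("layers", rec) for rec in records]
print("\n".join(statements))

# Load the statements into a throwaway database and look for duplicate layers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE layers (dataset TEXT, layer TEXT, feature_count INTEGER)")
for statement in statements:
    conn.execute(statement)

duplicates = conn.execute(
    "SELECT layer, feature_count, COUNT(*) FROM layers "
    "GROUP BY layer, feature_count HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```

Building literals by hand is only needed because the goal is a text file of statements. When inserting directly from Python, parameterised queries are the safer choice.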
An example of the current output of the process is here, the script (gdalogr_catalogue.py, which may be slightly out of date) is here, and a supporting script it uses (xmlgen.py) is here.
Tyler Mitchell
29-February-2008
Mateusz Loskot:
Tyler, what's missing for me is some practical use case and presentation of how you usually use the XML file with the geo-catalogue.
This would also help to identify if the XML format used is complete or not.