Drinking from the (data) Firehose of Terror

Between classic business transactions and social interactions and machine-generated observations, the digital data tap has been turned on and it will never be turned off. The flow of data is everlasting. Which is why you see a lot of things in the loop around real time frameworks and streaming frameworks. – Mike Hoskins, CTO Actian

From Mike Hoskins to Mike Richards (yes, we can make that kind of leap in logic; it’s the weekend)…

Oh, Joel Miller, you just found the marble in the oatmeal!   You’re a lucky, lucky, lucky little boy – because you know why?  You get to drink from… the firehose!  Okay, ready?  Open wide! – Stanley Spadowski, UHF

Firehose of Terror

I think you get the picture – a potentially frightening picture for those unprepared to handle the torrent of data coming down the pipe.  For the unprepared, the problem is not merely being overwhelmed.  Quite the contrary – I believe they will be consumed by irrelevancy.

If you’re still with me, let me explain.

Data Sharing Saved My Life – or How an Insurer Reduced My Healthcare Claim Costs

It’s not every day that you receive snail mail with life-changing information in it, but when it does come, it can come from the unlikeliest sources.

My initial test results, showing a problem with the liver.

A year ago, when doing a simple change of health insurance vendors, I had to give the requisite blood sample.  I knew the drill… nurse comes to the house, takes blood, a month later I get new insurance documents in the mail.

But this time the package included something new: the results of my tests.

The report was a list of 13 metrics and their values, with a brief description of what each meant and what my scores should be.  One in particular was out of the norm.  My ALT score, an indicator of liver damage, was about 50% higher than the expected range.

Simple Data Can Be Valuable

Here is the key point: I then followed up with my family doctor, with data in hand.  I did not have to wait to see symptoms of a systemic issue and get him to figure it out. We had a number, right there, in black and white. Something was wrong.

Naturally, I had a follow-up test to see if it was just a blip.  However, my second test showed even worse results: twice as high, in fact!  This led to an ultrasound and more follow-up tests.

In the end, I had non-alcoholic fatty liver disease.  Fatty liver is most commonly seen in alcoholics, so it was a surprise, as I don’t drink.  Mine was due solely to my diet and the weight I had put on over several years.

It was a breaking point for my system, and the data was a big red flag calling me to change before it was too late.

A chart of my ALT scores from earlier tests, loaded into WellnessFX.com for visualisation.

Not impressed with my weight or the rest of my scores, I made simple but dramatic changes to improve my health.*  The changes were so dramatic that my healthcare provider was very curious about my methods.

By making changes to my diet alone, I was able to get my numbers to a healthy level in just a few months.  In the process I lost 46 pounds in 8 months and recovered from various other symptoms.  The pending train wreck was averted.

Long Term Value in Sharing Healthcare Data

It’s been one year this week, so I’m celebrating.  My thanks go to Manulife, or whoever runs their lab tests, for taking the initiative to send me my results.

It doesn’t take long to see the business value in doing so, does it?   I took action on the information and now I’m healthier than I have been in almost 20 years.  I have fewer health issues, will use their systems less, will cost them less money, etc.

Ideally it benefits the group plan I’m in too, since I’m now a lower-cost user of the system.  I hope both insurers and employers take this to heart and follow suit, giving their people the data they need to make life-changing, cost-reducing decisions like this.

One final thought: how many people are taking these tests right now?  Just imagine what you could do with a bit of data analysis across their results.  With these kinds of test results, companies could be making health predictions for their customers and health professionals to review.  That’s why I’m jumping onto “biohacking” sites like WellnessFX.com these days, to track all my scores and to get expert advice on next steps or access to additional services.

I’m happy with any data sharing at all, but why give me just the raw data when I still have to interpret it?  I took the initiative to act on the results, but what if I had needed more incentive?  If I had been told “Lower your ALT or your premiums will be 5% higher”, I would have appreciated that.

What’s your price?  If your doctor or insurer said “do this and save $100”, would you do it?  What if they laid the data out before you and showed you where your quality of life was headed?  Would it make a difference to you?

I’m glad I had this opportunity to improve my health, but at this point I just say thanks for the data … and pass the salad please!

Tyler


* I transitioned to a whole-food, plant-based diet (read Eat to Live and The China Study).  You can read more about the massive amount of nutrition science coming out every year at NutritionFacts.org, or read the research papers yourself.

Create Tile Map Structure – gdal2tiles command

Tiles, in a Tile Map Server (TMS) context, are raster map data broken into small pre-rendered tiles so that web clients can load them efficiently. GDAL, with Python, can chop up your input raster into the folder/file naming and numbering structure that TMS-compliant clients expect.
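To sketch that addressing scheme before running the tool (my illustration, not from the book), a TMS client requests each tile by zoom level, column, and row:

# TMS tile layout: <output_dir>/<zoom>/<x>/<y>.png
# x counts columns from the west edge; y counts rows from the bottom
# (the reverse of the top-origin XYZ scheme used by Google/OSM tiles)
# e.g. fetching one tile from a hypothetical web server hosting the output:
$ curl -O http://example.com/NE1_50M_SR_W/4/9/7.png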

The default OpenLayers application produced by the gdal2tiles command, with a Natural Earth background dataset as input.

This is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by me, Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…

The bonus with this utility is that it also creates a basic web mapping application that you can start using right away.

The script is designed for georeferenced rasters; however, any raster should work with the right options. The (georeferenced) Natural Earth raster dataset is used in the first examples, with a non-georeferenced raster at the end.

There are many options to tweak the output and setup of the map services; see the complete gdal2tiles chapter for more information.
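For illustration, here are a few commonly used flags (the ne_tiles output folder and the scanned_map.png input are hypothetical names of my own):

# Limit the zoom levels, generate only the OpenLayers viewer, and set a title:
$ gdal2tiles.py -z 2-5 -w openlayers -t "Natural Earth" NE1_50M_SR_W.tif ne_tiles

# For a non-georeferenced image, the raster profile uses pixel coordinates
# instead of geographic coordinates:
$ gdal2tiles.py -p raster scanned_map.png scanned_tiles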

Minimal TMS Generation

At a bare minimum, an input file is needed:

$ gdal2tiles.py NE1_50M_SR_W.tif
Generating Base Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.
Generating Overview Tiles:
0...10...20...30...40...50...60...70...80...90...100 - done.

The output folder is given the same name as the input file, and it includes an array of sub-folders and sample web pages:

NE1_50M_SR_W
NE1_50M_SR_W/0
NE1_50M_SR_W/0/0
NE1_50M_SR_W/0/0/0.png
NE1_50M_SR_W/1
...
NE1_50M_SR_W/4/9/7.png
NE1_50M_SR_W/4/9/8.png
NE1_50M_SR_W/4/9/9.png
NE1_50M_SR_W/googlemaps.html
NE1_50M_SR_W/openlayers.html
NE1_50M_SR_W/tilemapresource.xml

Open the openlayers.html file in a web browser to see the results.
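If your browser refuses to load the tiles from file:// URLs, one simple workaround (my suggestion, not from the book) is to serve the output folder over HTTP with Python 3’s built-in server:

$ cd NE1_50M_SR_W
$ python -m http.server 8000
# then browse to http://localhost:8000/openlayers.html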

The default map loads a Google Maps base layer, and it will complain that you do not have an appropriate API key set up in the file. Ignore the warning and switch to the OpenStreetMap layer in the right-hand layer listing.


The resulting map should show your nicely coloured world map image from the Natural Earth dataset. The TMS Overlay option will show in the layer listing, so you can toggle it on and off to see that it truly is loading. The figure above (Figure 5.2 in the book) shows the result of our gdal2tiles command.


Geospatial Power Tools is 350+ pages long – 100 of those pages cover these kinds of workflow topic examples.  Each copy includes a complete (edited!) set of the GDAL/OGR command line documentation as well as the following topics/examples:

Workflow Table of Contents

  1. Report Raster Information – gdalinfo 23
  2. Web Services – Retrieving Rasters (WMS) 29
  3. Report Vector Information – ogrinfo 35
  4. Web Services – Retrieving Vectors (WFS) 45
  5. Translate Rasters – gdal_translate 49
  6. Translate Vectors – ogr2ogr 63
  7. Transform Rasters – gdalwarp 71
  8. Create Raster Overviews – gdaladdo 75
  9. Create Tile Map Structure – gdal2tiles 79
  10. MapServer Raster Tileindex – gdaltindex 85
  11. MapServer Vector Tileindex – ogrtindex 89
  12. Virtual Raster Format – gdalbuildvrt 93
  13. Virtual Vector Format – ogr2vrt 97
  14. Raster Mosaics – gdal_merge 107

Query Vector Data Using a WHERE Clause – ogrinfo

The following is an excerpt from the book: Geospatial Power Tools – Open Source GDAL/OGR Command Line Tools by Tyler Mitchell.  The book is a comprehensive manual as well as a guide to typical data processing workflows, such as the following short sample…

Use SQL Query Syntax with ogrinfo

Use the SQL-style -where clause option to return only the features that match the expression. In this case, return only the populated places features having NAME = 'Shanghai':

$ ogrinfo 10m_cultural ne_10m_populated_places -where "NAME = 'Shanghai'"

... 
Feature Count: 1
Extent: (-179.589979, -89.982894) - (179.383304, 82.483323)
... 
OGRFeature(ne_10m_populated_places):6282
 SCALERANK (Integer) = 1 
 NATSCALE (Integer) = 300 
 LABELRANK (Integer) = 1 
 FEATURECLA (String) = Admin-1 capital 
 NAME (String) = Shanghai
... 
 CITYALT (String) = (null) 
 popDiff (Integer) = 1 
 popPerc (Real) = 1.00000000000 
 ls_gross (Integer) = 0 
 POINT (121.434558819820154 31.218398311228327)

Building on the above, you can also query across all available layers by using the -al option and removing the specific layer name. Keep the same -where syntax and it will be applied to each layer. In cases where a layer does not have the specified attribute, it will tell you, but it will continue to process the other layers.
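A minimal sketch of such an invocation (using the same Natural Earth directory as before):

$ ogrinfo -al 10m_cultural -where "NAME = 'Shanghai'"

Each layer lacking a NAME field produces an error like the following before processing continues: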

   ERROR 1: 'NAME' not recognised as an available field.

NOTE: More recent versions of ogrinfo appear not to support this and will likely give FAILURE messages instead.
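The -where option handles simple filters; for full SQL SELECT statements, ogrinfo also accepts an -sql option. A minimal sketch against the same layer (I am assuming POP_MAX is one of its available attributes):

$ ogrinfo 10m_cultural -sql "SELECT NAME, POP_MAX FROM ne_10m_populated_places WHERE NAME = 'Shanghai'"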


Geospatial Power Tools is 350+ pages long – 100 of those pages cover these kinds of workflow topic examples.  Each copy includes a complete (edited!) set of the GDAL/OGR command line documentation, along with the workflow topics listed in the previous excerpt.
