OSM Tip: Using OSM data
It was my birthday this week and I had family visiting from abroad, so I will only send out one OSM Tip this week: this is it!
If you go to the OpenStreetMap website, what you see first is a map, and that is how the casual observer / layperson looks at it. People compare OSM to Google Maps and expect it to have mobile apps, street view, aerial image layers, and all the bells and whistles that come with a consumer map.
The OSM website is wrong to give that impression. OSM is first and foremost map data. Data you can build consumer apps with, along with pretty much anything else you can think of.
OSM-based printed bike maps of Cincinnati, US. Photo by Nate Wessel, CC-BY-SA 3.0
To get started with using OSM data, you first need to download some or all of it. I wrote about this in a previous tip.
OSM data usually comes in a format called PBF, or Protocolbuffer Binary Format. This is not a format traditional GIS applications can (easily) read. Usually you will need to use one of the bespoke tools from the OSM open source software ecosystem to read and further process it. I will just touch on a few common use cases here to get you started.
Desktop GIS
QGIS is an open source alternative to traditional GIS applications. Its active developer and user community ensures a continuous stream of enhancements and improvements to this cross-platform desktop GIS. It can load OSM PBF data natively, and there are numerous plugins that make working with OSM data easier and add OSM-based functionality like geocoding and routing.
To use OSM data in other desktop GIS environments, you're probably best advised to try and get a hold of Shapefile versions of OSM data. These are provided by third parties, not by OSM itself. The OSM wiki site lists a number of sources where you may get OSM Shapefiles.
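If you have the GDAL command-line tools installed, converting a PBF extract to Shapefile yourself may be another option. Here is a minimal sketch, assuming your GDAL build includes its OSM driver. The function name and the file names are made up for illustration. GDAL's OSM driver exposes a fixed set of layers (points, lines, multilinestrings, multipolygons, other_relations), and a Shapefile holds a single geometry type, so the sketch converts one layer at a time.

```shell
# pbf_layer_to_shp: hypothetical helper, not part of GDAL or any OSM tool.
# Converts one layer of an OSM PBF file to a Shapefile using ogr2ogr.
#   $1 = input .osm.pbf   $2 = output .shp   $3 = layer name (for example: lines)
pbf_layer_to_shp() {
    ogr2ogr -f "ESRI Shapefile" "$2" "$1" "$3"
}
```

You would call it like this: pbf_layer_to_shp ~/osm/data/utah-latest.osm.pbf ~/osm/data/utah_lines.shp lines. Keep in mind that Shapefile attribute names are limited to 10 characters, so some OSM tag names will likely come out truncated.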
PostGIS
A common way to use OSM data is to load it into a local PostgreSQL database with the PostGIS extension enabled. Once loaded, you can access the data using desktop GIS, perform geospatial analysis and processing in plain SQL or in an analysis environment like RStudio or a Jupyter notebook, or render map tiles for visualization. Common tools for importing raw OSM data files into PostGIS are imposm, osm2pgsql and osmosis.
OSM data loaded into a Jupyter notebook from PostGIS
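The import step described above can be sketched as a small shell function. This is a rough outline rather than a recommended setup: it assumes a local PostgreSQL server you can connect to without extra options, and that PostGIS and osm2pgsql are installed. The function name and the database name in the usage line are made up for illustration.

```shell
# import_osm_to_postgis: hypothetical helper for illustration only.
# Assumes a running local PostgreSQL server with PostGIS available,
# and osm2pgsql on the PATH.
#   $1 = database name   $2 = input .osm.pbf
import_osm_to_postgis() {
    # Create an empty database, then enable the spatial types and functions.
    createdb "$1" &&
    psql -d "$1" -c "CREATE EXTENSION postgis;" &&
    # Import using osm2pgsql's default settings; --slim keeps intermediate
    # data in the database instead of in memory.
    osm2pgsql --create --slim -d "$1" "$2"
}
```

For example: import_osm_to_postgis osm ~/osm/data/utah-latest.osm.pbf. With osm2pgsql's default output you should end up with tables such as planet_osm_point, planet_osm_line and planet_osm_polygon, though the exact schema depends on the osm2pgsql version and options you use.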
"Big data"
I'm a little out of my depth here, so I am only going to scratch the surface. Recently, OSM data has become easy to use in distributed and cloud computing scenarios. AWS hosts OpenStreetMap data files on S3, which makes them accessible for lightning-fast querying using Athena. Here's a blog post going into detail. You can also convert raw OSM files into Apache Parquet files using osm-parquetizer for further processing in a Hadoop environment.
Filtering
OSM contains many different types of features, and often you only need one "layer", like roads, land use or POIs. Some of the tools mentioned above have built-in options to filter data as you load it into the environment of your choice. One tool I like a lot for filtering OSM data is osmium, a Swiss Army knife for OSM data. One of its many capabilities is to filter an OSM file by tag. Here's an example where I filter an OSM data file to output a smaller file with just the nodes and ways that have the opening_hours tag:
osmium tags-filter -o ~/osm/data/utah_opening_hours.osm.pbf ~/osm/data/utah-latest.osm.pbf nw/opening_hours
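As a follow-up, here is a minimal sketch that chains the same filter with osmium's export command to produce a GeoJSON file that most GIS tools can open directly. The function name is made up for illustration, and the intermediate file name is simply derived from the output name. tags-filter accepts more than one filter expression and keeps objects matching any of them, so the helper passes all remaining arguments through.

```shell
# filter_to_geojson: hypothetical helper for illustration only.
#   $1 = input .osm.pbf   $2 = output .geojson   $3... = filter expressions
filter_to_geojson() {
    src=$1
    dst=$2
    shift 2
    # Intermediate filtered extract, written next to the GeoJSON output.
    tmp="${dst%.geojson}.osm.pbf"
    # Filter first, then export only if the filter step succeeded.
    osmium tags-filter -o "$tmp" "$src" "$@" &&
    osmium export -o "$dst" "$tmp"
}
```

For example: filter_to_geojson ~/osm/data/utah-latest.osm.pbf ~/osm/data/utah_opening_hours.geojson nw/opening_hours. Note that osmium will not overwrite an existing output file unless you add its overwrite option, so remove old output files before re-running.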
That's all for this week! The holidays are coming, so the tip frequency may stay low for a bit as I enjoy more time away from the computer. As always, thanks for reading, and let me know what you think! If this email was forwarded to you and you would like to subscribe yourself, you can do so here.