
OSM data + satellite imagery + machine learning = Skynet

Anand Thakker (anand@developmentseed.org)

I'm not an expert at ML (and, frankly, I'm also quite new to the OSM world)... but I think ML's really fascinating, and I think there's real promise in using it with OSM data.
- (a) Why I think ML+OSM is a good idea.
- (b) The experiments I've been working on: what I've tried, what's failed, what's worked.
- (c) How you can replicate and extend it.

Despite 30 years of work on automatic road detection, no automatic or semi-automatic road detection system is currently on the market and no published method has been shown to work reliably on large datasets of urban imagery.

Mnih and Hinton, 2010

Hard problem

From my admittedly amateur research, I haven't found reason to believe that this has changed much in the past 5-6 years.

ML hearts CV: Machine learning is an incredibly broad field, but computer vision is one area where it really shines, and where its recent advances have been both mind-boggling and totally mundane.
- Mundane: things like face detection/recognition, voice recognition on smartphones
- Mind-boggling: style transfer

Machine Learning ('Supervised' Learning)

Model: takes an input, 'predicts' the output
Training data: inputs + 'true' / 'expected' outputs

I'm going to focus here on one major branch of ML, called "supervised learning." For a really nice, broader intro, I recommend checking out Stuart Lynn's talk from this year's FOSS4G-NA, "Machine Learning with Geospatial Data" -- it's on YouTube.

Ok, here's the idea with supervised learning. You have a *model*, which is basically a function that takes inputs and produces outputs. There are many, many different kinds of models: neural networks, random forests, support vector machines; but, especially for those of us who are not ML researchers, these models are often pretty much black boxes.

You also have some *training data*. This is a set of inputs for which you already know the correct or desired output. E.g.:
- Flickr images and their categories
- audio clips and their text transcriptions
- photographs and the locations of human faces

Training a Model

Apply model to inputs
Compare model's prediction to ground truth
Tweak the model based on error
Repeat (a lot)

Once you have those, the big picture idea for ML training is really very simple: you take your model, which starts out completely wrong and random. You apply it to the inputs in your training data to get its "predicted" output, and then compare that to the "known" / "expected" / "ground truth" outputs. Based on the error, tweak the model's parameters to make it better -- e.g., if the outputs were too big, tweak the model to produce smaller numbers. Now repeat... a LOT.

Now, the details -- especially in calculating error and then tweaking model parameters -- that is some subtle stuff which depends quite a bit on the inner workings of the models you're using; but, fortunately, that complexity is mostly taken care of by ML tools and libraries.

Anyway, the real magic here is that if things go well, then after you've trained the model, it will *generalize* beyond the training data, producing (mostly) correct answers for inputs that *weren't* in your training data.
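To make the loop concrete, here is a minimal sketch in plain Python. It is not SegNet or anything close to it: the "model" is an assumed toy, a line y = w * x + b, and the training data is made up for illustration. The apply / compare / tweak / repeat steps have the same shape, though.

```python
import random


def train(inputs, targets, steps=2000, learning_rate=0.05, seed=0):
    """Fit the toy model y = w * x + b by gradient descent on squared error."""
    rng = random.Random(seed)
    # The model starts out random (and therefore wrong).
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    n = len(inputs)
    for _ in range(steps):                      # 4. repeat (a lot)
        grad_w = grad_b = 0.0
        for x, y_true in zip(inputs, targets):
            y_pred = w * x + b                  # 1. apply model to input
            error = y_pred - y_true             # 2. compare to ground truth
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # 3. tweak: predictions that were too big push w and b down,
        #    predictions that were too small push them up.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training data drawn from y = 3x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]
w, b = train(xs, ys)

# Generalization: x = 10 was never in the training data.
prediction = w * 10.0 + b
```

With a real network, the libraries handle the gradient computation and parameter updates for you; only the outline above carries over.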

Answer 2:
imagery (input) + OSM (ground truth) = amazing training data

In general, one of the biggest challenges in machine learning is getting good training data, and this is what I mean about imagery + OSM being a great fit for machine learning: it represents an amazing source of training data.

My experiments so far have all been based on a particular neural network model called SegNet.
- "semantic segmentation" = essentially carving up an image into distinct, categorized pieces
- Late 2015, by researchers at University of Cambridge
- Based on the "VGG-16" network, which was also one of the ImageNet winners in 2014
- 26 or 89 layers, depending on how you count (26 "convolution" layers)
- "State of the art" image segmentation results
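As a rough illustration of what "semantic segmentation" output looks like, the sketch below assumes the network has already produced a score per class for every pixel (the scores here are invented). It then assigns each pixel the class with the highest score. This shows only the final labeling step, not how SegNet computes the scores.

```python
def segment(scores):
    """Return a per-pixel class index from per-pixel class scores.

    scores[row][col] is a list with one score per class.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Invented scores for a 2x2 image and three classes:
# index 0 = background, 1 = road, 2 = building.
scores = [
    [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],
]
labels = segment(scores)
```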

Here's what the training data looks like. Left is the input image; right is the "ground truth" rendered from OSM. But, this is also an example of one of my first mistakes: I rendered the roads here at a width of 1 pixel; when I used this for training, I found that the network just couldn't learn from it.
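One possible reason, offered as an assumption rather than something I measured, is how few "road" pixels a 1-pixel line leaves for the network to learn from. The sketch below does the arithmetic for a single straight road crossing a tile; the 256-pixel tile size is also an assumption for illustration.

```python
def road_pixel_fraction(tile_size, stroke_width):
    """Fraction of a square tile covered by one straight, axis-aligned road."""
    road_pixels = tile_size * min(stroke_width, tile_size)
    return road_pixels / (tile_size * tile_size)


thin = road_pixel_fraction(256, 1)   # 1-pixel road
wide = road_pixel_fraction(256, 5)   # 5-pixel road

print(f"1 px: {thin:.2%}")  # 1 px: 0.39%
print(f"5 px: {wide:.2%}")  # 5 px: 1.95%
```

Either way, roads are a small minority of the pixels, but the wider stroke gives the network five times as many positive examples per road.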

[{
  "name": "Road",
  "color": "#ffffff",
  "stroke-width": "5",
  "filter": "[highway].match('.+') and not ([tunnel] = 'yes' or [tunnel]='true')"
}, {
  "name": "Building",
  "color": "#ff0000",
  "filter": "[building].match('.+')"
}]

But this is pretty easy to fix with the skynet-data scripts. They use a small JSON description for each different "class" that we want the network to identify; all I had to do was bump the stroke width from 1 to 5 and re-render. You can also see here the filter that's being used to pull out the right features for each category.
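To show how a class description like this can connect to training labels, here is a hedged sketch. It is not the skynet-data implementation: the helper names (`hex_to_rgb`, `build_color_lookup`, `label_pixels`) are hypothetical, and reserving index 0 for background is my own assumption. It reads the class list, then turns rendered pixel colors into integer class labels.

```python
import json

# A trimmed copy of the class description (filters omitted for brevity).
CLASSES_JSON = """
[{"name": "Road", "color": "#ffffff", "stroke-width": "5"},
 {"name": "Building", "color": "#ff0000"}]
"""


def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def build_color_lookup(classes):
    """Map each class color to a label index; 0 is kept for background (assumed)."""
    return {hex_to_rgb(c["color"]): i + 1 for i, c in enumerate(classes)}


def label_pixels(pixels, lookup):
    """Replace each RGB pixel with its class index, or 0 if no class matches."""
    return [[lookup.get(pixel, 0) for pixel in row] for row in pixels]


classes = json.loads(CLASSES_JSON)
lookup = build_color_lookup(classes)

# A made-up 2x2 rendered tile: black background, one road and one building pixel.
tile = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (0, 0, 0)],
]
labels = label_pixels(tile, lookup)
```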

The model really starts to sharpen up as it gets older! I started getting pretty excited at these results, but one limitation here is that most of the tiles in this training and testing dataset have pretty low road density. And, indeed, if we look at how this model performs in a more urban environment...

We're back to really fuzzy, indistinct predictions, even from the "older", more trained model. So, early last week, I put together a set of training data using tiles only from Seattle and started training a model that, hopefully, would do better at detecting streets in cases like this.

One common point of failure that I noticed here was that parking lots like this one tend to confuse the model. I think that retraining with a separate label for parking lots might be very helpful, but I haven't investigated whether we've got good, tagged polygons for parking lots in OSM.

On the other hand, the model seems to handle visual breaks in the road quite well; for example, you can see here that even though trees' shadows are falling across the road, the model still produces a nice, smooth line. There's also that false positive up there -- and there are definitely plenty of these that hurt the model's accuracy. But there are _also_ cases like this one...

What's Next

Improving Training
- Filter training data with OSM completeness/quality metrics
- How much can we improve the model's ability to generalize?
- Do we really need a model as large/complex as SegNet?
- Incorporate telemetry?
Improving Output
- Diff against existing OSM features
- Vectorize results
Train against buildings
Train against road surface type
Integrate with... MapSwipe, iD, ...?
much, much more...