What they do

  • Pixel-accurate ground truth that is available only at small scale.
  • Less accurate reference data that is readily available in arbitrary quantities, at no cost.

Tools

  • OpenStreetMap (OSM)

Description

They use OSM as weakly labeled training data for three classes (buildings, roads, and background), paired with RGB orthophotos from Google Maps.
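As a rough illustration of how such weak labels can be produced (not the paper’s exact pipeline), OSM building and road geometries can be rasterized into a per-pixel label mask aligned with the orthophoto tile. The class ids, the assumed 3 m road half-width, and the function name below are my own choices for the sketch:

```python
# Sketch: burn OSM vector geometries into a weak per-pixel label mask.
# Class ids, road buffer width, and names are illustrative assumptions.
import numpy as np
from rasterio import features
from rasterio.transform import from_bounds

BACKGROUND, BUILDING, ROAD = 0, 1, 2

def weak_label_mask(building_polys, road_lines, bounds, size=500):
    """Rasterize OSM geometries into a size x size label mask for one tile.

    building_polys / road_lines: shapely geometries in the tile's CRS;
    bounds: (west, south, east, north) of the orthophoto tile.
    """
    transform = from_bounds(*bounds, size, size)
    shapes = [(geom, BUILDING) for geom in building_polys]
    # OSM roads are centerlines; buffer them to an assumed nominal width.
    shapes += [(geom.buffer(3.0), ROAD) for geom in road_lines]
    if not shapes:
        return np.zeros((size, size), dtype=np.uint8)
    return features.rasterize(shapes, out_shape=(size, size),
                              transform=transform, fill=BACKGROUND,
                              dtype="uint8")
```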

The authors’ hypotheses

  • The sheer volume of training data may compensate for the lower accuracy of the labels.
  • The large variety in a very large training set could improve the classifier’s ability to generalize to new, unseen locations.
  • Adding a large volume of weak data to a small volume of high-quality data could improve the classification.
  • If low-accuracy, large-scale training data helps, it may allow substituting a large portion of the manually annotated high-quality data.

Model

An FCN (fully convolutional network) [2] with modifications; the input is a 500x500-pixel patch (mini-batch of 1 image).
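A minimal sketch of an FCN-style model in PyTorch; the backbone, layer widths, and upsampling scheme are illustrative assumptions, not the paper’s exact architecture:

```python
# Minimal FCN-style sketch; not the authors' exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # Downsampling encoder (stride-2 convs stand in for a full backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 conv produces per-pixel class scores at low resolution.
        self.score = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        # Upsample the coarse scores back to the input resolution.
        scores = self.score(self.encoder(x))
        return F.interpolate(scores, size=x.shape[-2:], mode="bilinear",
                             align_corners=False)

logits = TinyFCN()(torch.randn(1, 3, 500, 500))  # mini-batch of one 500x500 patch
assert logits.shape == (1, 3, 500, 500)
```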

They pre-train before the actual training to get better results: once on Pascal VOC 2010 [1] and once on the OSM data.
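A hedged sketch of such a pre-train-then-fine-tune schedule; the loaders, epoch counts, and learning rates below are illustrative assumptions:

```python
# Sketch of a two-stage schedule; hyperparameters are assumptions.
import torch

def train(model, loader, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, mask in loader:        # mask holds per-pixel class ids
            opt.zero_grad()
            loss = loss_fn(model(image), mask)
            loss.backward()
            opt.step()

# Stage 1: pre-train on the large, weakly labeled OSM set
#          (after initializing from VOC-trained weights).
# train(model, osm_loader, epochs=10, lr=1e-3)
# Stage 2: fine-tune on the small manually labeled set at a lower rate.
# train(model, manual_loader, epochs=40, lr=1e-4)
```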

Experiments

  • Complete substitution (no manual labeling at all).
  • Augmentation (does OSM pre-training help or not?).
  • Partial substitution (OSM pre-training, then training on a reduced set of manual labels); the three regimes are sketched below.
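One way to read the three setups as data regimes; the names and the exact label mix per regime are my interpretation, not quoted from the paper:

```python
# Illustrative reading of the three experimental regimes.
REGIMES = {
    "complete_substitution": {"pretrain": None,  "train": "osm_only"},
    "augmentation":          {"pretrain": "osm", "train": "manual"},
    "partial_substitution":  {"pretrain": "osm", "train": "manual_subset"},
}
```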

Datasets

Each dataset comes with its own ground sampling distance (GSD).
Potsdam: 1 image (6000x6000 pixels) = 144 patches (500x500 pixels), of which only 21 have been manually labeled.
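For instance, the 6000x6000 Potsdam image yields the 144 non-overlapping 500x500 patches as follows (a sketch; the zero array stands in for the real image):

```python
# Cut the Potsdam mosaic into non-overlapping 500x500 patches.
import numpy as np

def tile(image, patch=500):
    """Cut an HxWxC image into non-overlapping patch x patch tiles."""
    h, w = image.shape[:2]
    return [image[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, patch)
            for x in range(0, w - patch + 1, patch)]

potsdam = np.zeros((6000, 6000, 3), dtype=np.uint8)  # stand-in for the real tile
patches = tile(potsdam)
assert len(patches) == 144   # (6000 / 500) ** 2 = 12 * 12
```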

Results

For Chicago, trained on OSM labels only, following the method of [2].

[Final results figure from the paper omitted.]

Conclusions

  • Large-scale training data, even weakly labeled, nevertheless significantly improves segmentation performance and the generalization ability of the models.
  • Training only on open data, without manual labeling, achieves reasonable results.
  • Large-scale pre-training with OSM labels significantly benefits semantic segmentation.

  1. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.

  2. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015.