This article introduces DOTA, a large-scale Dataset for Object deTection in Aerial images.


  • 2806 aerial images.
  • image sizes ranging from about 800 × 800 to 4000 × 4000 pixels.
  • objects with a wide variety of scales, orientations and shapes.
  • 15 common object categories.
  • 188,282 annotated instances.


They argue that a good aerial image dataset should possess four properties: 1) a large number of images, 2) many instances per category, 3) properly oriented object annotations, and 4) many different object classes, which bring it closer to real-world applications.

note: DOTA has 15 categories but only 14 main categories, because small vehicle and large vehicle are both sub-categories of the broader vehicle class.


  • plane
  • ship
  • storage tank
  • baseball diamond
  • tennis court
  • swimming pool
  • ground track field
  • harbor
  • bridge
  • large vehicle
  • small vehicle
  • helicopter
  • roundabout
  • soccer ball field
  • basketball court


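The images are annotated with quadrilateral bounding boxes, each denoted as “x1, y1, x2, y2, x3, y3, x4, y4”, where (xi, yi) is the position of the i-th vertex of the oriented bounding box in the image. Each instance occupies one line, followed by its category and a flag marking whether it is difficult to detect: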

x1, y1, x2, y2, x3, y3, x4, y4, category, difficult
x1, y1, x2, y2, x3, y3, x4, y4, category, difficult

note: ‘gsd’ is the ground sample distance, the physical size of one image pixel, in meters.
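As a rough illustration of the annotation layout above, here is a minimal Python sketch that parses one annotation line into its four vertices, category and difficult flag. The names `Instance` and `parse_annotation_line` are hypothetical and not part of any DOTA tooling. The sketch assumes the fields are separated by commas and/or whitespace, and that the difficult flag is an integer.

```python
import re
from typing import List, NamedTuple, Tuple


class Instance(NamedTuple):
    """One annotated object (hypothetical container for this sketch)."""
    vertices: List[Tuple[float, float]]  # four (x, y) corners of the quadrilateral
    category: str
    difficult: int


def parse_annotation_line(line: str) -> Instance:
    """Parse 'x1, y1, x2, y2, x3, y3, x4, y4, category, difficult'."""
    # Assumption: fields are separated by commas and/or whitespace.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields, got {len(fields)}")
    coords = [float(v) for v in fields[:8]]
    # Pair up x and y values: (x1, y1), (x2, y2), (x3, y3), (x4, y4).
    vertices = list(zip(coords[0::2], coords[1::2]))
    # Everything between the coordinates and the flag is the category,
    # so multi-word names such as "storage tank" survive the split.
    category = " ".join(fields[8:-1])
    difficult = int(fields[-1])
    return Instance(vertices, category, difficult)


example = parse_annotation_line("10, 20, 110, 20, 110, 70, 10, 70, storage tank, 0")
print(example.category, example.vertices)
```

Keeping the vertices as (x, y) pairs makes later geometric steps, such as computing an enclosing box, a little easier than working with a flat list of eight numbers.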


They report results for state-of-the-art object detection algorithms on DOTA, comparing horizontal bounding boxes (HBB) with oriented bounding boxes (OBB).


  • Faster R-CNN
  • R-FCN
  • YOLOv2
  • SSD
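To make the HBB versus OBB distinction concrete, the sketch below derives a horizontal box from a quadrilateral annotation by taking the axis-aligned rectangle that encloses its four vertices. The function name `obb_to_hbb` is hypothetical, and the (xmin, ymin, xmax, ymax) output convention is an assumption made for illustration.

```python
from typing import Sequence, Tuple


def obb_to_hbb(
    vertices: Sequence[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Return (xmin, ymin, xmax, ymax) enclosing the given quadrilateral."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# A square rotated by 45 degrees: its horizontal box covers twice the
# area of the object itself, which is why oriented boxes fit more tightly.
diamond = [(50.0, 0.0), (100.0, 50.0), (50.0, 100.0), (0.0, 50.0)]
print(obb_to_hbb(diamond))
```

For elongated, rotated objects such as ships or large vehicles, the enclosing horizontal box can contain a lot of background, which is the motivation for annotating with oriented boxes.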