Dana, Simon, and Jules here!
This past module, we’ve been
working on comparing the usability and detail of different Land Use Datasets.
“What are those?”, you may ask. Land Use Datasets are GIS datasets that
represent different types of land use as different classes. While
orthophotography such as a satellite image may show a building as a white blob,
for example, a Land Use Dataset would represent it as “urban area”, an
individual data class. Land use datasets can be generated using ArcGIS, or can
be downloaded from a number of different research organizations.
For our investigation, we compared GlobeLand30 (GLC for short), a global
Land Cover dataset generated by Chinese scientists, with ESA’s 300m CCI Land
Cover dataset. There are two main differences between the two. GLC’s dataset
has only 10 different land use classifications, but a finer 30m resolution.
ESA’s dataset has 35 classifications, but a coarser 300m resolution.
In this image, ESA’s dataset is visualized, along with its
35 classifications (far left), and compared to GLC’s dataset and its 10
classifications (right). Note GLC’s superior resolution.
Aesthetically, a higher resolution dataset would look to be
the better of the two. But there could also be value in having many more
classes, depending on what question you want to answer.
We then set out to test how the two datasets compared to
each other. We did this by first equating the 35 ESA classes to the 10 GLC
classes: for example, all of the 11 ESA classes that describe tree cover fall
under GLC’s “Forest” class. We then wrote a Python script that sampled both
datasets at thousands of random points and compared the class IDs found at each
point using the reclassification we did previously. In this way, we were able
to quantify how consistent the two datasets were while describing the same area.
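To give a feel for the sampling step, here is a minimal sketch of the idea, not our actual script. It assumes both rasters are already loaded as NumPy arrays covering the same extent, and the class codes, the `ESA_TO_GLC` lookup, and the function name `sample_agreement` are all placeholders made up for illustration.

```python
import numpy as np

# Hypothetical lookup from ESA class codes to GLC class codes.
# These codes are placeholders, not the real legend values.
ESA_TO_GLC = {1: 20, 2: 20, 3: 10}  # two tree-cover classes -> "Forest", one crop class

def sample_agreement(esa, glc, esa_to_glc, n_points, seed=0):
    """Sample both grids at the same random locations and compare classes.

    Assumes `esa` and `glc` are 2D arrays covering the same area,
    possibly at different resolutions.
    """
    rng = np.random.default_rng(seed)
    # Random positions expressed as fractions of the shared extent.
    u = rng.random(n_points)
    v = rng.random(n_points)
    # Convert each fractional position to a row/column in each grid.
    esa_vals = esa[(u * esa.shape[0]).astype(int), (v * esa.shape[1]).astype(int)]
    glc_vals = glc[(u * glc.shape[0]).astype(int), (v * glc.shape[1]).astype(int)]
    # Reclassify the ESA samples into GLC classes (-1 means "no equivalent").
    mapped = np.array([esa_to_glc.get(int(c), -1) for c in esa_vals])
    agreement = float(np.mean(mapped == glc_vals))
    return agreement, esa_vals, glc_vals

# Toy data: a coarse 4x4 "ESA" grid and an 8x8 "GLC" grid of the same area.
esa_toy = np.array([[1, 1, 3, 3]] * 4)
glc_toy = np.array([[20, 20, 20, 20, 10, 10, 10, 10]] * 8)
agreement, _, _ = sample_agreement(esa_toy, glc_toy, ESA_TO_GLC, n_points=1000)
print(f"Agreement: {agreement:.1%}")
```

Sampling by fractional position rather than by pixel index is one way to cope with the two grids having different resolutions; a real script would more likely sample by geographic coordinates.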
We found that, overall, the two datasets aligned around
77.3% of the time, and perhaps even more often given a small amount of error
from the ambiguity of comparing different classes to each other. Judging by
such a high percentage of similarity, we can conclude that the two datasets
have classified land use in a very similar way.
Our comparison pie chart.
We also analyzed the most common discrepancies between the
two datasets. Below is a table of our results.
Description (ESA - GLC) | Frequency
Cropland (Rainfed) - Grassland | 41
Cropland (Rainfed) - Natural Vegetation | 71
Not surprisingly, our most common discrepancies involved
classes of green land that a land use generating program could easily label
as natural vegetation, grassland, or lush cropland.
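Counting discrepancies like these can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class labels, the `esa_to_glc` lookup, and the sample values are toy placeholders, not our real data or results.

```python
from collections import Counter

def count_discrepancies(esa_vals, glc_vals, esa_to_glc):
    """Tally (ESA class, GLC class) pairs where the reclassified ESA
    class does not match the GLC class at the same sample point."""
    pairs = Counter()
    for e, g in zip(esa_vals, glc_vals):
        if esa_to_glc.get(e) != g:
            pairs[(e, g)] += 1
    return pairs

# Hypothetical lookup and made-up samples, for illustration only.
esa_to_glc = {"Cropland (Rainfed)": "Cultivated Land", "Tree Cover": "Forest"}
esa_samples = ["Cropland (Rainfed)", "Cropland (Rainfed)", "Tree Cover", "Cropland (Rainfed)"]
glc_samples = ["Grassland", "Cultivated Land", "Forest", "Grassland"]

discrepancies = count_discrepancies(esa_samples, glc_samples, esa_to_glc)
for (esa_class, glc_class), n in discrepancies.most_common():
    print(f"{esa_class} - {glc_class}: {n}")
```

`Counter.most_common()` returns the pairs sorted from most to least frequent, which is handy for building a table of the biggest disagreements.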
That’s all for now!