Welcome!

Welcome to the blog for the Oberlin College Geomorphology Research Group. We are a diverse team of students working with Amanda Henck Schmidt on geomorphology questions. This blog is an archive of our thoughts about our research, field work travel notes, and student research projects. Amanda's home page is here.

Friday, March 31, 2017

What Makes an Awesome Global Land Use Classification Dataset?

Dana, Simon, and Jules here! 
This past module, we’ve been comparing the usability and detail of different land use datasets. “What are those?” you may ask. Land use datasets are GIS datasets that represent different types of land use as distinct classes. While an orthophoto or satellite image may show a building as a white blob, for example, a land use dataset would label it “urban area”, a single data class. Land use datasets can be generated using ArcGIS or downloaded from a number of research organizations.
For our investigation, we compared GlobeLand30 (abbreviated GLC below), a global 30 m land cover dataset generated by Chinese scientists, with ESA’s 300 m CCI Land Cover dataset. There are two main differences between them. GLC has only 10 land use classes but a finer resolution. ESA has 35 classes but a coarser resolution.

In this image, ESA’s dataset and its 35 classes (far left) are shown next to GLC’s dataset and its 10 classes (right). Note GLC’s finer resolution.
Visually, the higher-resolution dataset looks like the better of the two. But having many more classes can also be valuable, depending on the question you want to answer.
We then set out to test how well the two datasets agree with each other. We did this by first mapping the 35 ESA classes onto the 10 GLC classes: for example, all 11 ESA classes that describe tree cover fall under GLC’s “Forest” class. We then wrote a Python script that sampled both datasets at thousands of random points and compared the class IDs at each point using that mapping. In this way, we were able to quantify how consistently the two datasets describe the same area.
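For readers curious what this kind of point-sampling comparison can look like, here is a minimal sketch. It is not the script described above. It assumes both datasets have already been loaded as NumPy arrays on the same grid (real rasters at 30 m and 300 m would need to be sampled by geographic coordinate instead), and the class codes, the `FINE_TO_COARSE` lookup, and the `agreement_rate` function are made-up names and values for illustration, not the real ESA or GLC legends.

```python
import numpy as np

# Hypothetical lookup from many fine class codes to fewer coarse class codes.
# These codes are invented for the example; they are not the real legends.
FINE_TO_COARSE = {1: 10, 2: 10, 3: 20, 4: 20, 5: 30}


def agreement_rate(fine, coarse, lookup, n_points, seed=0):
    """Sample random cells and return the fraction of samples where the
    reclassified fine class equals the coarse class.

    Assumes `fine` and `coarse` are 2D arrays of identical shape that
    cover the same area cell for cell.
    """
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, fine.shape[0], size=n_points)
    cols = rng.integers(0, fine.shape[1], size=n_points)

    fine_vals = fine[rows, cols]
    coarse_vals = coarse[rows, cols]

    # Translate each sampled fine code into its coarse equivalent.
    reclassified = np.array([lookup[int(v)] for v in fine_vals])
    return float(np.mean(reclassified == coarse_vals))


# Toy grids standing in for the two datasets.
fine = np.array([[1, 2, 3],
                 [4, 5, 5]])
coarse = np.array([[10, 10, 20],
                   [20, 30, 10]])  # bottom-right cell disagrees on purpose

rate = agreement_rate(fine, coarse, FINE_TO_COARSE, n_points=1000)
print(f"{rate:.1%} of sampled points agree")
```

Fixing the random seed keeps the sample repeatable, which makes it easier to compare runs after changing the class mapping.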
We found that, overall, the two datasets agreed around 77.3% of the time, and perhaps even more often given a small amount of error from the ambiguity of mapping one set of classes onto the other. Such a high rate of agreement suggests that the two datasets classify land use in a very similar way.
Our comparison pie chart.
We also analyzed the most common discrepancies between the two datasets. Below is a table of our results.
Description (ESA - GLC)                      Frequency
Cropland (Rainfed) - Grassland               41
Cropland (Rainfed) - Natural Vegetation      71

Not surprisingly, our most common discrepancies involved green land that a land use classification program could easily label as natural vegetation, grassland, or lush cropland.
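Counting discrepancies like these is straightforward once the class pairs have been sampled. The sketch below is one possible way to do it, not the code we ran: the `tally_discrepancies` function, the `LOOKUP` mapping, and the sample pairs are all hypothetical and exist only to show the idea.

```python
from collections import Counter

# Hypothetical mapping from fine class names to coarse class names.
LOOKUP = {
    "Cropland (Rainfed)": "Cultivated Land",
    "Tree Cover": "Forest",
}


def tally_discrepancies(pairs, lookup):
    """Count how often each (fine, coarse) pair disagrees.

    `pairs` is an iterable of (fine_label, coarse_label) tuples, one per
    sampled point. A pair counts as a discrepancy when the fine label's
    mapped class differs from the coarse label.
    """
    counts = Counter()
    for fine_label, coarse_label in pairs:
        if lookup[fine_label] != coarse_label:
            counts[(fine_label, coarse_label)] += 1
    return counts


# Made-up samples, not real results.
samples = [
    ("Cropland (Rainfed)", "Cultivated Land"),  # agrees
    ("Cropland (Rainfed)", "Grassland"),
    ("Cropland (Rainfed)", "Grassland"),
    ("Cropland (Rainfed)", "Forest"),
    ("Tree Cover", "Forest"),                   # agrees
]

counts = tally_discrepancies(samples, LOOKUP)
for (fine_label, coarse_label), n in counts.most_common():
    print(f"{fine_label} - {coarse_label}: {n}")
```

Sorting with `most_common()` puts the most frequent mismatches first, which is the order you would want for a table like the one above.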
That’s all for now!