A comparison of resampling methods for remote sensing classification and accuracy assessment

Full citation: 
Mitchell B. Lyons, David A. Keith, Stuart R. Phinn, Tanya J. Mason, Jane Elith (2018) A comparison of resampling methods for remote sensing classification and accuracy assessment. Remote Sensing of Environment 208:145-153.
Author/s associated with the CES: 
Mitchell Lyons
David Keith
Tanya Mason

Abstract: Maps that categorise the landscape into discrete units are a cornerstone of many scientific, management and conservation activities. The accuracy of these maps is often the primary piece of information used to make decisions about the mapping process or judge the quality of the final map. Variance is critical information when considering map accuracy, yet commonly reported accuracy metrics often do not provide that information. Various resampling frameworks have been proposed and shown to reconcile this issue, but have had limited uptake. In this paper, we compare the traditional approach of a single split of data into a training set (for classification) and test set (for accuracy assessment), to a resampling framework where the classification and accuracy assessment are repeated many times. Using a relatively simple vegetation mapping example and two common classifiers (maximum likelihood and random forest), we compare variance in mapped area estimates and accuracy assessment metrics (overall accuracy, kappa, user, producer, entropy, purity, quantity/allocation disagreement). Input field data points were repeatedly split into training and test sets via bootstrapping, Monte Carlo cross-validation (67:33 and 80:20 split ratios) and k-fold (5-fold) cross-validation. Additionally, within the cross-validation, we tested four designs: simple random, block hold-out, stratification by class, and stratification by both class and space. A classification was performed for every split of every methodological combination (hundreds of iterations each), creating sampling distributions for the mapped area of each class and the accuracy metrics. We found that regardless of resampling design, a single split of data into training and test sets results in a large variance in estimates of accuracy and mapped area. In the worst case, overall accuracy varied between ~40–80% in one resampling design, due only to random variation in partitioning into training and test sets.
On the other hand, we found that all resampling procedures provided accurate estimates of error, and that they can also provide confidence intervals that are informative about the performance and uncertainty of the classifier. Importantly, we show that these confidence intervals commonly encompassed the magnitudes of increase or decrease in accuracy that are often cited in literature as justification for methodological or sampling design choices. We also show how a resampling approach enables generation of spatially continuous maps of classification uncertainty. Based on our results, we make recommendations about which resampling design to use and how it could be implemented. We also provide a fully worked mapping example, which includes traditional inference of uncertainty from the error matrix and provides examples for presenting the final map and its accuracy.
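
Illustrative sketch (not the authors' code): the repeated-split idea described in the abstract can be shown in a few lines of Python. The example below is only a minimal sketch under several assumptions: the field points are synthetic, the three class names are invented, a simple nearest-centroid classifier stands in for the maximum likelihood and random forest classifiers used in the paper, and every function name is hypothetical. It runs Monte Carlo cross-validation with a 67:33 simple random split and summarises the resulting distribution of overall accuracy with a percentile interval.

```python
import random
import statistics


def make_points(n_per_class, rng):
    """Synthetic two-band 'field points' for three invented classes (assumption)."""
    centres = {"forest": (0.2, 0.6), "heath": (0.5, 0.4), "grass": (0.7, 0.7)}
    points = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            points.append(((rng.gauss(cx, 0.15), rng.gauss(cy, 0.15)), label))
    return points


def fit_centroids(train):
    """Nearest-centroid stand-in classifier: mean feature vector per class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, xy):
    """Assign the class whose centroid is closest in feature space."""
    x, y = xy
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - x) ** 2 + (centroids[c][1] - y) ** 2,
    )


def overall_accuracy(centroids, test):
    """Proportion of test points assigned to their reference class."""
    correct = sum(predict(centroids, xy) == label for xy, label in test)
    return correct / len(test)


def monte_carlo_cv(points, train_fraction=0.67, iterations=200, seed=1):
    """Repeat a simple random train/test split and record accuracy each time."""
    rng = random.Random(seed)
    n_train = int(round(train_fraction * len(points)))
    accuracies = []
    for _ in range(iterations):
        shuffled = points[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        accuracies.append(overall_accuracy(fit_centroids(train), test))
    return accuracies


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple empirical percentile interval from the sampling distribution."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


points = make_points(30, random.Random(42))
accs = monte_carlo_cv(points)
low, high = percentile_interval(accs)
print(f"first single split: {accs[0]:.2f}")
print(f"mean over {len(accs)} splits: {statistics.mean(accs):.2f} "
      f"(95% interval {low:.2f} to {high:.2f})")
```

Any single entry in `accs` corresponds to the traditional one-split accuracy estimate; the spread across entries is the split-to-split variation that the abstract argues a single split hides. The stratified and block hold-out designs compared in the paper would change only how `train` and `test` are drawn inside the loop.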
