## Abstract

The current study aimed to determine the spatial transferability of eXtreme Gradient Boosting (XGBoost) models for estimating biophysical and biochemical variables (BVs) using Sentinel-2 data. The specific objectives were to: (1) assess the effect of different proportions of training samples (i.e., 25%, 50%, and 75%) available at the Target site on the spatial transferability of the XGBoost models and (2) evaluate the effect of the Source site (i.e., trained) model accuracy on the Target site (i.e., unseen) retrieval uncertainty. The results showed that the Bothaville (source) → Harrismith (target) Leaf Area Index (LAI) models required only small proportions, i.e., 25% or 50%, of the training samples to make optimal retrievals at the target site (i.e., RMSE = 0.61 m^{2} m^{−2}; R^{2} = 59%), while the Harrismith (source) → Bothaville (target) LAI models required up to 75% of the training samples at the target site to obtain optimal LAI retrievals (i.e., RMSE = 0.63 m^{2} m^{−2}; R^{2} = 67%). In contrast, the chlorophyll content models for Bothaville (source) → Harrismith (target) required a large proportion of samples (i.e., 75%) from the target site to make optimal retrievals of Leaf Chlorophyll Content (LC_{ab}) (i.e., RMSE = 7.09 µg cm^{−2}; R^{2} = 58%) and Canopy Chlorophyll Content (CCC) (i.e., RMSE = 36.3 µg cm^{−2}; R^{2} = 61%), while the Harrismith (source) → Bothaville (target) models required only 25% of the samples to achieve RMSEs of 8.16 µg cm^{−2} (R^{2} = 83%) and 40.25 µg cm^{−2} (R^{2} = 77%) for LC_{ab} and CCC, respectively. The results also showed that higher source site model accuracy led to better transferability for LAI retrievals. In contrast, the accuracy of the LC_{ab} and CCC source site models did not necessarily improve their transferability.
Overall, the results elucidate the potential of transferable machine learning regression algorithms and are significant for the rapid retrieval of important crop BVs in data-scarce areas, thus facilitating spatially explicit information for site-specific farm management.
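
As a reading aid for the metrics and sample proportions quoted in the abstract, the following is a minimal sketch of how RMSE, R^{2} as a percentage, and a 25%/50%/75% split of target-site samples might be computed. It rests on assumptions not stated in the abstract: R^{2} is taken to be the coefficient of determination, the split is a simple random one, and the function names and LAI values are hypothetical and invented purely for illustration.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error, in the units of the retrieved variable."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r2_percent(observed, predicted):
    """Coefficient of determination (1 - SS_res / SS_tot), as a percentage.

    Assumption: the abstract does not say which R^2 definition was used.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)


def split_target_samples(samples, proportion, seed=0):
    """Randomly assign `proportion` of target-site samples to training.

    The remaining samples are held out for evaluating the retrievals.
    Assumption: a simple random split; the study's sampling scheme may differ.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * proportion)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical LAI values (m^2 m^-2), not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print("RMSE:", rmse(observed, predicted))
print("R2 (%):", r2_percent(observed, predicted))

# The three target-site training proportions named in the abstract.
for proportion in (0.25, 0.50, 0.75):
    train, held_out = split_target_samples(range(20), proportion)
    print(proportion, len(train), len(held_out))
```

In a transfer experiment of this kind, the training portion of the target-site samples would be combined with the source-site data before fitting, and the held-out portion would be used to compute the two metrics.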

Original language | English |
---|---|
Article number | 3968 |
Journal | Remote Sensing |
Volume | 14 |
Issue number | 16 |
DOIs | |
Publication status | Published - Aug 2022 |
Externally published | Yes |

## Keywords

- chlorophyll content
- eXtreme Gradient Boosting
- leaf area index
- machine learning
- precision agriculture
- Sentinel-2
- spatial transferability

## ASJC Scopus subject areas

- General Earth and Planetary Sciences