Land surface model parameter optimisation using in situ flux data

Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2)

Vladislav Bastrikov*, Natasha Macbean, Cédric Bacour, Diego Santaren, Sylvain Kuppel, Philippe Peylin

*Corresponding author for this work

Research output: Contribution to journal › Article


Abstract

Land surface models (LSMs), which form the land component of Earth system models, rely on numerous processes for describing carbon, water and energy budgets, often associated with highly uncertain parameters. Data assimilation (DA) is a useful approach for optimising the most critical parameters in order to improve model accuracy and refine future climate predictions. In this study, we compare two different DA methods for optimising the parameters of seven plant functional types (PFTs) of the ORCHIDEE LSM using daily averaged eddy-covariance observations of net ecosystem exchange and latent heat flux at 78 sites across the globe. We perform a technical investigation of two classes of minimisation methods, local gradient-based (the L-BFGS-B algorithm, i.e. the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm with bound constraints) and global random search (the genetic algorithm), by evaluating their relative performance in terms of the model-data fit and the difference in retrieved parameter values. We examine the performance of each method for two cases: when optimising parameters at each site independently ("single-site" approach) and when simultaneously optimising the model at all sites for a given PFT using a common set of parameters ("multi-site" approach). We find that in the single-site case the random search algorithm results in lower values of the cost function (i.e. lower model-data root mean square differences) than the gradient-based method; the difference between the two methods is smaller for the multi-site optimisation because the greater number of observations smooths the shape of the cost function. When the same tests are repeated from 16 random first-guess parameter sets, the spread of the final cost function values is much larger with the gradient-based method, owing to its higher likelihood of being trapped in local minima. In pseudo-observation tests, the genetic algorithm yields a closer approximation of the true posterior parameter values than the L-BFGS-B algorithm. We demonstrate the advantages and challenges of different DA techniques and provide some advice on using them for LSM parameter optimisation.
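The comparison described in the abstract (a local, bounded gradient search versus a global random search, each repeated from 16 random first guesses, with the model-data RMSD as cost) can be illustrated with a small Python sketch. This is not the authors' code and does not use ORCHIDEE: the two-parameter sine model, its bounds and the "true" parameters are hypothetical, chosen only so that the cost surface can have several local minima. The local method is a simplified projected gradient descent with finite-difference gradients standing in for L-BFGS-B, and the genetic algorithm is a basic real-coded variant whose operators (elitism, tournament selection, blend crossover, Gaussian mutation) and settings are illustrative assumptions. Observations are generated from the known parameters, as in a pseudo-observation test.

```python
import math
import random

# Hypothetical toy model standing in for a flux simulation (NOT ORCHIDEE).
# The sine term can give the cost several local minima in the frequency b.
def toy_model(params, times):
    a, b = params
    return [a * math.sin(b * t) for t in times]

def rmsd_cost(params, times, obs):
    """Model-data root mean square difference (the quantity minimised)."""
    sim = toy_model(params, times)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def clip(params, bounds):
    """Project a parameter vector onto its box constraints."""
    return [min(max(p, lo), hi) for p, (lo, hi) in zip(params, bounds)]

def projected_gradient_descent(cost, x0, bounds, steps=100, h=1e-6):
    """Local bounded gradient search: a simplified stand-in for L-BFGS-B.
    Forward-difference gradient, projection onto the bounds, and a
    backtracking step that accepts only cost decreases."""
    x = clip(x0, bounds)
    fx = cost(x)
    for _ in range(steps):
        grad = []
        for i in range(len(x)):
            xp = list(x)
            xp[i] += h
            grad.append((cost(xp) - fx) / h)
        step = 1.0
        while step > 1e-8:
            trial = clip([xi - step * gi for xi, gi in zip(x, grad)], bounds)
            f_trial = cost(trial)
            if f_trial < fx:
                x, fx = trial, f_trial
                break
            step *= 0.5
        else:
            break  # no improving step found: treat as converged (maybe a local minimum)
    return x, fx

def genetic_algorithm(cost, bounds, rng, pop_size=30, generations=40, mut_frac=0.05):
    """Global random search: a basic real-coded genetic algorithm."""
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(ind) for ind in pop]
    for _ in range(generations):
        elite = min(range(pop_size), key=scores.__getitem__)
        new_pop = [pop[elite]]  # elitism: the best cost never increases
        while len(new_pop) < pop_size:
            # Tournament selection (size 3) of two parents.
            p1 = min(rng.sample(range(pop_size), 3), key=scores.__getitem__)
            p2 = min(rng.sample(range(pop_size), 3), key=scores.__getitem__)
            w = rng.random()  # blend crossover weight
            child = [w * u + (1 - w) * v for u, v in zip(pop[p1], pop[p2])]
            # Gaussian mutation scaled to each parameter's range.
            child = [c + rng.gauss(0.0, mut_frac * (hi - lo))
                     for c, (lo, hi) in zip(child, bounds)]
            new_pop.append(clip(child, bounds))
        pop = new_pop
        scores = [cost(ind) for ind in pop]
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Pseudo-observation setup: synthetic observations from known parameters.
TIMES = [0.2 * i for i in range(50)]
TRUE_PARAMS = [2.0, 1.3]
OBS = toy_model(TRUE_PARAMS, TIMES)
BOUNDS = [(0.1, 5.0), (0.1, 5.0)]

def pseudo_obs_cost(params):
    return rmsd_cost(params, TIMES, OBS)

def compare(n_starts=16, seed=1):
    """Run both searches n_starts times; return their final costs."""
    rng = random.Random(seed)
    grad_costs, ga_costs = [], []
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in BOUNDS]  # random first guess
        grad_costs.append(projected_gradient_descent(pseudo_obs_cost, x0, BOUNDS)[1])
        ga_rng = random.Random(rng.randrange(2**32))  # independent GA run
        ga_costs.append(genetic_algorithm(pseudo_obs_cost, BOUNDS, ga_rng)[1])
    return grad_costs, ga_costs

if __name__ == "__main__":
    g, ga = compare()
    print(f"gradient: min {min(g):.3f}, max {max(g):.3f}")
    print(f"genetic:  min {min(ga):.3f}, max {max(ga):.3f}")
```

Comparing the spread of the two lists returned by `compare()` mirrors the restart experiment in the abstract, although how often the gradient runs stall depends entirely on this toy problem and its settings, so no particular outcome is implied. For real use, SciPy's `scipy.optimize.minimize(..., method="L-BFGS-B", bounds=...)` provides a genuine L-BFGS-B implementation that could replace the stand-in.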

Original language: English
Pages (from-to): 4739-4754
Number of pages: 16
Journal: Geoscientific Model Development
Volume: 11
Issue number: 12
DOI: 10.5194/gmd-11-4739-2018
Publication status: Published - 30 Nov 2018


ASJC Scopus subject areas

  • Modelling and Simulation
  • Earth and Planetary Sciences (all)

Cite this

Land surface model parameter optimisation using in situ flux data : Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2). / Bastrikov, Vladislav; Macbean, Natasha; Bacour, Cédric; Santaren, Diego; Kuppel, Sylvain; Peylin, Philippe.

In: Geoscientific Model Development, Vol. 11, No. 12, 30.11.2018, p. 4739-4754.


Bastrikov, Vladislav ; Macbean, Natasha ; Bacour, Cédric ; Santaren, Diego ; Kuppel, Sylvain ; Peylin, Philippe. / Land surface model parameter optimisation using in situ flux data : Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2). In: Geoscientific Model Development. 2018 ; Vol. 11, No. 12. pp. 4739-4754.
@article{03123766d1bc42aa86d7e67d79a2c318,
title = "Land surface model parameter optimisation using in situ flux data: Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2)",
author = "Vladislav Bastrikov and Natasha Macbean and C{\'e}dric Bacour and Diego Santaren and Sylvain Kuppel and Philippe Peylin",
note = "This work used eddy covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program; DE-FG02-04ER63917 and DE-FG02-04ER63911), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia and USCCC. We acknowledge the financial support to the eddy covariance data harmonisation provided by CarboEuropeIP, FAO-GTOS-TCO, iLEAPS, Max Planck Institute for Biogeochemistry, National Science Foundation, University of Tuscia, Universit{\'e} Laval, Environment Canada and US Department of Energy and the database development and technical support from Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research eScience, Oak Ridge National Laboratory, University of California – Berkeley and the University of Virginia.",
year = "2018",
month = "11",
day = "30",
doi = "10.5194/gmd-11-4739-2018",
language = "English",
volume = "11",
pages = "4739--4754",
journal = "Geoscientific Model Development",
issn = "1991-959X",
publisher = "Copernicus Gesellschaft mbH",
number = "12",

}

TY - JOUR

T1 - Land surface model parameter optimisation using in situ flux data

T2 - Comparison of gradient-based versus random search algorithms (a case study using ORCHIDEE v1.9.5.2)

AU - Bastrikov, Vladislav

AU - Macbean, Natasha

AU - Bacour, Cédric

AU - Santaren, Diego

AU - Kuppel, Sylvain

AU - Peylin, Philippe

N1 - This work used eddy covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program; DE-FG02-04ER63917 and DE-FG02-04ER63911), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS-Siberia and USCCC. We acknowledge the financial support to the eddy covariance data harmonisation provided by CarboEuropeIP, FAO-GTOS-TCO, iLEAPS, Max Planck Institute for Biogeochemistry, National Science Foundation, University of Tuscia, Université Laval, Environment Canada and US Department of Energy and the database development and technical support from Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research eScience, Oak Ridge National Laboratory, University of California – Berkeley and the University of Virginia.

PY - 2018/11/30

Y1 - 2018/11/30

UR - http://www.scopus.com/inward/record.url?scp=85039871600&partnerID=8YFLogxK

U2 - 10.5194/gmd-11-4739-2018

DO - 10.5194/gmd-11-4739-2018

M3 - Article

VL - 11

SP - 4739

EP - 4754

JO - Geoscientific Model Development

JF - Geoscientific Model Development

SN - 1991-959X

IS - 12

ER -