Artificial intelligence and data assimilation: A successful marriage for Earth system research


Artificial intelligence methods are playing an increasingly important role in Earth system research. Applications range from advanced regression and clustering to model emulation, operator inversion, and combinations of these. However, machine learning is not restricted to stand-alone use cases: it can also be added as an extension that supports and improves existing, traditional methods.

One such hybrid is the combination of data assimilation and machine learning. Data assimilation, the statistically optimal combination of a numerical model with real-world observations, has become an indispensable tool in Earth system modelling. Its success depends strongly on accurate estimates of the uncertainties of both the model forecasts and the observations. However, such uncertainties are often costly to obtain, whether in measurement-campaign funding or in computing power. As a result, error estimates are frequently simplified for ease of use, e.g., by assuming that they are static in space or time.
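To illustrate why these uncertainty estimates matter, here is a minimal sketch of the scalar analysis step shared by optimal interpolation and the Kalman filter. The function name `analysis_update` and the wind speed numbers are hypothetical and not taken from the study; the point is only that the weight given to an observation is set entirely by the assumed forecast and observation error variances.

```python
def analysis_update(x_f, var_f, y, var_o):
    """Scalar optimal-interpolation / Kalman analysis step.

    x_f, var_f : model forecast and its error variance
    y, var_o   : observation and its error variance
    """
    gain = var_f / (var_f + var_o)      # weight given to the observation
    x_a = x_f + gain * (y - x_f)        # analysis = forecast + weighted innovation
    var_a = (1.0 - gain) * var_f        # analysis error variance
    return x_a, var_a


# Hypothetical wind speed example (m/s): forecast 10, observation 12
print(analysis_update(10.0, 4.0, 12.0, 1.0))   # realistic, larger forecast error
print(analysis_update(10.0, 0.25, 12.0, 1.0))  # static, too-optimistic forecast error
```

With a forecast error variance of 4 (m/s)² the analysis lands at about 11.6 m/s; if the forecast error is instead assumed to be a static 0.25 (m/s)², the same observation only pulls the analysis to about 10.4 m/s.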

In our recent machine learning study published in the Journal of Advances in Modeling Earth Systems (JAMES; Irrgang et al., 2020), we trained an artificial neural network to estimate the uncertainties of given geophysical quantities. Using wind velocities over the ocean as an example, we show that a neural network can learn to predict the uncertainties of different wind regimes when trained with an ensemble of atmospheric reanalyses (from ECMWF, NCEP, and JMA). Once trained, the network needs only a single ensemble member as input to generate dynamic error estimates that reproduce the ensemble spread. Consequently, after training, substantial campaign funding and computational resources (e.g., CPU hours and storage) can be saved. A further advantage is that the network generalizes well: knowledge gained from training on a limited region transfers to global data.
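The core idea, learning the ensemble spread from a single member, can be illustrated with a toy example. The sketch below is not the network architecture, input set, or training setup of the study: it assumes hypothetical synthetic data in which the true error standard deviation grows linearly with wind speed, uses three members to mimic the three reanalyses, and trains a small one-hidden-layer tanh network with plain full-batch gradient descent in NumPy. All names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Hypothetical synthetic "reanalysis ensemble" (toy stand-in, not real data) ---
n_samples, n_members = 2000, 3
truth = rng.uniform(0.0, 20.0, n_samples)             # "true" wind speed in m/s
true_spread = 0.5 + 0.1 * truth                       # assumed regime-dependent error std
members = truth[:, None] + true_spread[:, None] * rng.standard_normal((n_samples, n_members))

x_raw = members[:, :1]                                # input: a single ensemble member
y = members.std(axis=1, ddof=1, keepdims=True)        # target: ensemble spread
x_mean, x_std = x_raw.mean(), x_raw.std()


def normalize(x):
    return (x - x_mean) / x_std


# --- One-hidden-layer tanh network, trained by full-batch gradient descent on MSE ---
n_hidden, lr, n_epochs = 16, 0.05, 3000
W1 = 0.5 * rng.standard_normal((1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 1))
b2 = np.full(1, y.mean())


def predict(x):
    """Return the predicted spread and the hidden activations."""
    h = np.tanh(normalize(x) @ W1 + b1)
    return h @ W2 + b2, h


x_n = normalize(x_raw)
for _ in range(n_epochs):
    pred, h = predict(x_raw)
    d_pred = 2.0 * (pred - y) / n_samples             # dMSE/dpred
    dz = (d_pred @ W2.T) * (1.0 - h**2)               # backpropagate through tanh
    W2 -= lr * (h.T @ d_pred)
    b2 -= lr * d_pred.sum(axis=0)
    W1 -= lr * (x_n.T @ dz)
    b1 -= lr * dz.sum(axis=0)

# After training: estimate the spread from single members only
estimated_spread, _ = predict(np.array([[5.0], [15.0]]))
print(estimated_spread.ravel())
```

Because each target is computed from only three members, it is noisy; the network therefore learns the conditional mean spread for a given wind speed rather than reproducing every individual sample, and it should predict a larger spread for the 15 m/s member than for the 5 m/s one.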

In practice, such a trained neural network can serve various applications in Earth system modelling (ESM), for instance the simple and cost-efficient generation and storage of ensemble data, or the dynamic estimation of error covariances as needed in data assimilation wherever propagating errors dynamically is either not implemented or too costly (e.g., because of the large ensembles required by Ensemble Kalman Filters or Particle Filters).
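One way such predicted spreads could enter a data assimilation system is sketched below. This is an assumption for illustration, not a method described in the study: it uses the common split of a background error covariance into standard deviations and correlations, B = D C D, where D holds flow-dependent spreads (here hypothetical values, as if predicted by the network) and C is a static Gaussian correlation with an assumed 150 km length scale. The functions `background_covariance` and `kalman_analysis` are hypothetical names.

```python
import numpy as np


def background_covariance(spread, positions, length_scale):
    """B = D C D with flow-dependent standard deviations (D) and a static Gaussian correlation (C)."""
    dist = np.abs(positions[:, None] - positions[None, :])
    corr = np.exp(-0.5 * (dist / length_scale) ** 2)
    D = np.diag(spread)
    return D @ corr @ D


def kalman_analysis(x_f, B, H, y, R):
    """Kalman / optimal-interpolation update x_a = x_f + K (y - H x_f), K = B H^T (H B H^T + R)^-1."""
    S = H @ B @ H.T + R
    K = np.linalg.solve(S, H @ B).T     # uses symmetry of B and S to avoid an explicit inverse
    return x_f + K @ (y - H @ x_f)


# Hypothetical 1-D transect: 5 grid points 100 km apart
positions = np.array([0.0, 100.0, 200.0, 300.0, 400.0])   # km
spread = np.array([0.6, 0.8, 1.5, 1.2, 0.5])              # m/s, as if predicted by the network
B = background_covariance(spread, positions, length_scale=150.0)

x_f = np.full(5, 8.0)          # forecast wind speed (m/s)
H = np.zeros((1, 5))
H[0, 2] = 1.0                  # one observation at the central grid point
y = np.array([10.0])           # observed wind speed (m/s)
R = np.array([[0.25]])         # observation error variance (m/s)^2

x_a = kalman_analysis(x_f, B, H, y, R)
print(x_a - x_f)               # analysis increments
```

Because the spreads vary along the transect, the single observation produces a larger increment at 300 km, where the predicted spread is larger, than at the equally distant point at 100 km; with a static, uniform spread the two increments would be identical.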


Irrgang, C., Saynisch‐Wagner, J., & Thomas, M. (2020). Machine learning‐based prediction of spatio‐temporal uncertainties in global wind velocity reanalyses. Journal of Advances in Modeling Earth Systems, 12, e2019MS001876.