Micro-database for sustainability (ESG) indicators developed at the Banco de España (2022)

Series: Statistical Notes. 17.

Authors: Borja Fernández-Rosillo San Isidro, Eugenia Koblents Lapteva and Alejandro Morales Fernández.


In recent years, awareness of social and environmental issues has been on the rise. As a result, the demand for sustainability data has grown exponentially. This has driven the Banco de España’s Statistics Department to develop a micro-database for sustainability (ESG) indicators.

This document presents two papers which analyse the process developed to gather this information and the numerous limitations and difficulties found along the way when dealing with sustainability microdata. Specifically, the two topics covered by these papers are:

  1. “Analysing climate change data gaps” (presented at the 11th Biennial IFC (Irving Fisher Committee) Conference on “Post-pandemic landscape for central bank statistics” held on 25-27 August 2022, Session 3.B “Environmental statistics”)
  2. “Creation of a structured sustainability database from company reports: A web application prototype for information retrieval and storage” (presented at the IFC Bank of Italy workshop on “Data science in central banking” held on 14-17 February 2022, Session 4.3 “Text Mining and ML utilized in Economic Research”) (Koblents and Morales (2022))

The first paper focuses on the various limitations encountered and achievements made in the process of developing a micro-database for sustainability indicators for non-financial companies. After carefully researching current ESG standards, consulting ESG experts, analysing regulatory obligations and conducting practical research, a list of the 39 most relevant ESG indicators was selected from those normally reported by companies. Currently more than 15,000 data samples have been gathered for the period 2019-2020 using a semi-automatic search application developed in-house (presented in detail in the second paper). Numerous challenges were identified during the process, such as the use of different metrics for reporting sustainability information, a lack of information and of a downloadable digital format, comparability difficulties and regulatory restrictions.

The second paper focuses on the tool developed to create the micro-database presented in the first paper. This web application retrieves sustainability indicators from the annual non-financial statements reported by Spanish non-financial corporations, using semi-automatic extraction and storage. Its aim is to make it easier for users to search for sustainability indicators in large document databases and to store them in a structured database. The tool incorporates a set of pre-defined search terms for each indicator, selected initially on the basis of domain knowledge and, in subsequent developments, with the help of artificial intelligence. For each company and indicator, the tool suggests the most relevant text snippets to the user, who then identifies the correct value of the indicator and stores it in the database using the web application’s user interface. The tool was created by two data scientists in three months, with the continuous support of a team of experts who helped to define the system specifications, propose refinements, collect input data, and validate and test the tool. The paper describes the technical approach and the main modules of the implemented prototype, which include text extraction, indexing and search, data storage and visualization.
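As an illustration only, the search-and-suggest step described above could be sketched along the following lines. This is not the Banco de España implementation: the indicator names, the search terms, the sentence-level scoring rule and the in-memory dictionary standing in for the structured database are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical search terms per indicator (illustrative, not the tool's actual list).
SEARCH_TERMS = {
    "scope_1_emissions": ["scope 1", "direct emissions", "tco2"],
    "water_consumption": ["water consumption", "m3"],
}


@dataclass
class Snippet:
    text: str   # candidate sentence taken from the report
    score: int  # number of search terms found in the sentence


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def suggest_snippets(report_text, indicator, top_n=3):
    """Return up to top_n sentences that match the indicator's search terms.

    A sentence is a candidate only if it contains at least one search term
    and at least one digit, since the user is looking for a reported value.
    """
    terms = SEARCH_TERMS[indicator]
    candidates = []
    for sentence in split_sentences(report_text):
        lowered = sentence.lower()
        score = sum(1 for term in terms if term in lowered)
        if score > 0 and re.search(r"\d", sentence):
            candidates.append(Snippet(sentence, score))
    # Highest-scoring snippets first; the sort is stable, so ties keep document order.
    candidates.sort(key=lambda s: s.score, reverse=True)
    return candidates[:top_n]


def store_value(db, company, indicator, year, value):
    """Store the value the user confirmed; a dict stands in for the real database."""
    db[(company, indicator, year)] = value
```

In this sketch, a user would call `suggest_snippets(report_text, "scope_1_emissions")`, read the suggested sentences, and then record the value they judge correct with `store_value`. The final decision stays with the user, which mirrors the semi-automatic workflow described in the paper.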
