Climatebert.ai is a joint research project of Julia Anna Bingler from ETH Zürich, Mathias Kraus and Nicolas Webersinke from FAU Erlangen-Nürnberg, and Markus Leippold from the University of Zürich. For more information about the authors and their background, see Authors.

Our project started in 2019 with the overall aim of making climate-related unstructured textual information from various sources available for research, policy-making, financial supervisory authorities, and financial analysts. As a first step, we trained the model to analyze climate-related disclosures of companies (see our working paper on SSRN). Since then, ClimateBERT has constantly evolved.

Language Model

ClimateBERT is the name of our transformer-based language model, which has been adapted to climate-related text and fine-tuned on various downstream tasks.

See Language Model for more information on our language model.

Downstream Tasks

So far, ClimateBERT has been fine-tuned on six downstream tasks. It is able to

  1. detect climate content in text files,
  2. assess the sentiment of this content,
  3. fact-check climate-related claims,
  4. assign a climate disclosure category to the climate-related content based on the four categories of the recommendations of the Task Force on Climate-related Financial Disclosures (TCFD),
  5. identify whether climate-related content is a commitment for climate action, and
  6. assess whether climate-related content is specific or unspecific boilerplate language.

The additional downstream tasks that ClimateBERT has been trained on since our first steps could serve various use cases. For example, it could aid financial supervisors in assessing the state of corporate climate risk disclosures, or support governments in their recent efforts to detect corporate greenwashing. Financial analysts might use ClimateBERT to identify the climate risks and opportunities that a company faces.
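As a minimal sketch, the downstream classifiers can be applied to text with the Hugging Face transformers library. Note that the checkpoint name below is an assumption chosen for illustration, not a confirmed identifier; the actual fine-tuned checkpoints should be looked up under the climatebert organization on the Hugging Face Hub.

```python
# Minimal sketch: applying a ClimateBERT downstream classifier to text.
# The model identifier is a hypothetical example; verify the real
# checkpoint names on the Hugging Face Hub before use.
from transformers import pipeline

# Hypothetical checkpoint for the climate-content detector (task 1).
DETECTOR_MODEL = "climatebert/distilroberta-base-climate-detector"


def detect_climate_content(texts):
    """Classify each text snippet as climate-related or not."""
    classifier = pipeline("text-classification", model=DETECTOR_MODEL)
    return classifier(texts)


if __name__ == "__main__":
    results = detect_climate_content([
        "We aim to reduce our Scope 1 emissions by 30% by 2030.",
        "Quarterly revenue grew by 4% compared to last year.",
    ])
    for result in results:
        print(result["label"], round(result["score"], 3))
```

The other five tasks would follow the same pattern, swapping in the corresponding fine-tuned checkpoint (sentiment, fact-checking, TCFD category, commitment, specificity).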

Carbon Footprint

Training deep neural networks in general, and large language models in particular, already has a significant carbon footprint today. If current trends in language model research continue, this detrimental climate impact will increase considerably. We acknowledge that our work is part of this trend. To see how we address this sensitive topic in detail, see the respective section on carbon footprint in our research papers.

In general, we would have liked to train and run our models on servers powered by renewable energy. This first-best option was unfortunately not available. To help speed up the energy system transformation required to achieve the global climate targets, we do our part by donating EUR 100 to atmosfair. We explicitly refrain from calling this donation a CO2 compensation, and we refrain from a solution based on afforestation. See the appendix of our language model research paper for a more detailed statement on the matter.