The fatalities001 model

Overview of the methodology and data infrastructure behind the legacy conflict prediction model fatalities001.

fatalities001 was the first iteration of the fatalities model. It was in use between 2022 and early 2023.



Example forecasts from the fatalities001 model. Predicted fatalities in April 2022, based on data up to January 2022.

This is a legacy model

Please note that fatalities001 is a legacy model. The model codebase remains accessible via the open-source GitHub repository FCDO_predicting_fatalities for transparency, but is no longer supported or maintained. Please consult the operational model documentation for up-to-date information about the live prediction system.

OVERVIEW OF THE PREDICTION MODEL

Scope and coverage


When used as the operational model in the VIEWS forecasting system, the model generated monthly predictions for the number of fatalities in impending conflict, as well as dichotomous forecasts for the probability of at least 25 battle-related deaths (BRDs) per country-month and at least 1 BRD per PRIO-GRID-month.
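To make the two dichotomous outcomes concrete, the sketch below turns fatality counts into the binary events whose probability was forecast. The thresholds (25 and 1) come from the description above; the function name and the sample counts are made up for illustration and are not taken from the VIEWS codebase.

```python
# Thresholds as stated in the text; all names here are illustrative assumptions.
COUNTRY_MONTH_THRESHOLD = 25  # at least 25 BRDs per country-month
GRID_MONTH_THRESHOLD = 1      # at least 1 BRD per PRIO-GRID-month


def dichotomise(fatalities, threshold):
    """Return 1 if the fatality count meets the threshold, otherwise 0."""
    return int(fatalities >= threshold)


# Made-up battle-related death counts, one value per unit-month.
country_month_brds = [0, 3, 24, 25, 140]
grid_month_brds = [0, 1, 7]

country_events = [dichotomise(x, COUNTRY_MONTH_THRESHOLD) for x in country_month_brds]
grid_events = [dichotomise(x, GRID_MONTH_THRESHOLD) for x in grid_month_brds]

print(country_events)
print(grid_events)
```

The same count series can therefore feed both the continuous target (the count itself) and the dichotomous target (the thresholded event).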

Model family: fatalities
Model version: 001


Update schedule: Monthly
Predicted type(s) of violence: State-based conflict
Forecasting window: 1-36 months ahead
Predicted outcomes: Continuous and dichotomous predictions
Country-level coverage: Global
Sub-national coverage: Africa and the Middle East

Model documentation

Predicting fatalities

Håvard Hegre, Forogh Akbari, Mihai Croicu, James Dale, Tim Gåsste, Remco Jansen, Peder Landsverk, Maxine Leis, Angelica Lindqvist-McGowan, Hannes Mueller, Malika Rakhmankulova, David Randahl, Christopher Rauh, Espen Geelmuyden Rød & Paola Vesco. Report, Uppsala University, 9 June 2022.


INPUT DATA

The features informing the fatalities001 model

The fatalities001 model was informed by data on hundreds of variables from data providers such as the Uppsala Conflict Data Program (UCDP), ACLED, PRIO-GRID, the World Bank, IMF, FAO, MAPSPAM, SPEI and MIRCA. Based on these raw data variables, VIEWS also created a suite of additional variables by applying data transformations such as time and space lags, imputations to fill in for missing data, and other common data processing techniques.
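The transformations named above can be illustrated with a minimal sketch in plain Python. It shows one possible form of a time lag, a space lag, and an imputation step. The function names, the choice of a neighbour sum as the space lag, and forward-filling as the imputation rule are assumptions made for illustration; the source does not specify which variants VIEWS applied.

```python
def time_lag(series, lag):
    """Shift a monthly series by `lag` steps; the first `lag` values are unknown."""
    return ([None] * lag + list(series))[:len(series)]


def forward_fill(series):
    """Impute missing values (None) with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def spatial_lag(grid, row, col):
    """Sum the values of the cells adjacent to (row, col), including diagonals."""
    total = 0
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and (r, c) != (row, col):
                total += grid[r][c]
    return total


# Made-up monthly fatality counts for one unit.
monthly = [0, 5, 2, 0]
lagged = time_lag(monthly, 1)

# Made-up indicator with gaps, e.g. a yearly value observed irregularly.
indicator = [1.0, None, None, 3.0]
imputed = forward_fill(indicator)

# Made-up 3x3 grid of fatality counts for a single month.
grid = [
    [0, 1, 0],
    [2, 5, 0],
    [0, 0, 3],
]
neighbour_sum_centre = spatial_lag(grid, 1, 1)
neighbour_sum_corner = spatial_lag(grid, 0, 0)
```

Each derived series has the same length as its input, so raw and transformed variables can sit side by side as features for the same unit-month.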

Together, the raw and processed data variables informing the various VIEWS models are referred to as features, which are grouped into feature sets based on the overall theme they relate to and/or the data provider(s) from which they are derived.

The feature sets that informed the sub-models and model ensembles in the fatalities001 model are documented in the GitHub repository FCDO_predicting_fatalities.

Why categorize data into feature sets?

Categorizing input data variables into feature sets is part of the standard data organization routines in VIEWS, and it greatly simplifies model development. Amongst other benefits, it allows us to call upon a pre-determined set of features, which is maintained in a single location, when training our models. This minimizes the risk of human error when compiling the input datasets and makes the model documentation easier to maintain.
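As a rough illustration of the idea, the sketch below keeps feature sets in a single dictionary and selects a model's inputs by feature set name. The feature set names, the feature names, and the helper function are hypothetical; they do not reproduce the actual definitions in the VIEWS repositories.

```python
# Hypothetical feature sets, defined once and reused wherever a model is trained.
FEATURE_SETS = {
    "conflict_history": ["brd_lag_1", "months_since_last_event"],
    "development": ["gdp_per_capita", "infant_mortality"],
}


def build_input_row(record, feature_set_name):
    """Pick out the features of one named feature set from a full data record."""
    names = FEATURE_SETS[feature_set_name]
    return {name: record[name] for name in names}


# A made-up country-month record holding features from several themes.
record = {
    "country": "X",
    "brd_lag_1": 12,
    "months_since_last_event": 0,
    "gdp_per_capita": 1450.0,
    "infant_mortality": 48.2,
}

row = build_input_row(record, "conflict_history")
print(row)
```

Because every model refers to a feature set by name, adding or correcting a feature means editing one entry rather than every training script.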

Key themes of conflict drivers

The feature sets that informed the fatalities001 model can be divided into eight overall themes of conflict drivers, presented below. Please note that the actual number of feature sets used by the model exceeded this number; the categorisation below is offered for illustration purposes only.

Conflict history

Country level Grid level

A suite of features capturing the history of conflict in each country and sub-national grid cell, e.g. the number of battle-related deaths per unit and level of analysis, and measures of the temporal and spatial distance to recent conflict events.

Data providers: Uppsala Conflict Data Program (UCDP), The Armed Conflict Location & Event Data Project (ACLED).
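One simple measure of temporal distance to recent conflict is the number of months since the last month with recorded fatalities. The sketch below computes it for a single unit; the function name and the convention of returning None before the first observed event are assumptions made for illustration, not the definition used in fatalities001.

```python
def months_since_last_event(monthly_brds):
    """For each month, count the months elapsed since fatalities were last recorded.

    Returns None for months before the first recorded event in the series.
    """
    result = []
    since = None
    for brds in monthly_brds:
        if brds > 0:
            since = 0          # conflict event this month
        elif since is not None:
            since += 1         # one more month without an event
        result.append(since)
    return result


# Made-up monthly battle-related death counts for one country or grid cell.
history = [0, 0, 4, 0, 0, 2, 0]
distance = months_since_last_event(history)
print(distance)
```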

Political institutions, democracy

Country level

Features that capture democracy indices and the strength of political institutions in each country, such as liberal democracy, rule of law, equality, and the level of exclusion of social groups in politics.

Data providers: Varieties of Democracy (V-Dem)

Development

Country level Grid level

Measures of development as provided by the World Bank's World Development Indicators, e.g. GDP per capita, infant mortality rate, and school enrollment.

Data providers: The World Bank (World Development Indicators, WDI)

Economic growth

Country level

A feature set focusing specifically on historic and future economic growth, e.g. real GDP growth per year and growth forecasts for the coming years.

Data providers: The International Monetary Fund World Economic Outlook (IMF WEO)

Climate & societal vulnerability

Country level Grid level

Feature sets capturing climate extremes and societal vulnerability to climate hazards and other external shocks, e.g. climate extreme indices, reliance on agriculture, crop yields, precipitation, freshwater withdrawal, water management efficiency, and access to renewable resources.

Data providers: United Nations Food and Agriculture Organisation (FAO), FAO AQUASTAT, PRIO-GRID, MIRCA, MAPSPAM, SPEI Global Drought Monitor

News monitoring

Country level

A feature set based on the Mueller & Rauh (2018) topic model, which captures conflict risks as drawn from a topic analysis of news media.

Data providers: Mueller & Rauh (2018)

Natural and social geography

Grid level

A feature set capturing terrain type, distance to natural resources, demography, and proximity to cities and country borders.

Data providers: PRIO-GRID

Food security and access to basic needs

Country level Grid level

Feature sets capturing staple food prices along with measures of food security and access to basic human needs, such as mean food prices, food price inflation, undernourishment, access to clean water, and basic sanitation.

Data providers: United Nations Food and Agriculture Organisation (FAO), FAOSTAT

GENERATING THE FORECASTS

Model training procedures


Step 1

Sub-models: combinations of feature sets and algorithms

As a first step when training the fatalities001 model, each feature set was paired with an advanced machine learning algorithm.

The fatalities001 model employed four such algorithms: random forests, gradient boosting, Markov models, and hurdle models. You can read more about them in our technical reports on the model.

The result was a series of sub-models, or constituent models as they are more commonly called, that used patterns in their respective subsets of historic data to generate predictions for future conflict.
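Of the four algorithm types, the hurdle model is perhaps the least familiar. It splits the problem in two: first, whether any fatalities occur at all; second, how many occur given that some do. The sketch below is a deliberately stripped-down version in which both stages are simple averages over made-up historic counts. In a real sub-model each stage would be a model fitted on a feature set; the names and data here are assumptions for illustration and do not reflect the fatalities001 implementation.

```python
def fit_hurdle(counts):
    """Estimate both stages of an intercept-only hurdle model from historic counts.

    Stage 1: probability that a unit-month has any fatalities.
    Stage 2: average number of fatalities among unit-months that have some.
    """
    positives = [c for c in counts if c > 0]
    p_positive = len(positives) / len(counts)
    mean_if_positive = sum(positives) / len(positives) if positives else 0.0
    return p_positive, mean_if_positive


def predict_hurdle(p_positive, mean_if_positive):
    """Expected fatalities: chance of crossing the hurdle times the size if crossed."""
    return p_positive * mean_if_positive


# Made-up historic fatality counts for one unit; most months are zero.
historic_counts = [0, 0, 0, 10, 0, 30, 0, 0]

p_positive, mean_if_positive = fit_hurdle(historic_counts)
expected_fatalities = predict_hurdle(p_positive, mean_if_positive)
print(p_positive, mean_if_positive, expected_fatalities)
```

Separating the two stages suits conflict data, where the large majority of unit-months record zero fatalities and the non-zero counts vary widely.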

Step 2

Ensemble models: groups of sub-models using “the wisdom of the crowd”

Much like a crowd tends to be wiser than the individuals composing it, prediction models that are informed by a number of smaller and specialized sub-models tend to be more robust and to generate stronger predictions than single models.

As a second step in the model training procedures, the sub-models above were therefore combined into two groups or ensembles of models – one ensemble for each level of analysis.

Two different ensembling techniques were used for this purpose:

The country-level ensemble model combined the predictions from each of the sub-models using a genetic algorithm that assigned different weights to the contribution from each model in order to maximise predictive performance.
The sub-national ensemble model, in turn, used a simple unweighted average of the sub-model results.

The ensembling techniques above are motivated and described at length in the technical report on the fatalities001 model.
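The difference between the two ensembling techniques can be sketched in a few lines of Python. The unweighted average gives every sub-model the same say, while the genetic algorithm searches for weights that reduce the prediction error on known outcomes. The toy genetic algorithm below (selection, averaging crossover, Gaussian mutation, mean squared error as the loss) is an assumption chosen for brevity; the source states only that a genetic algorithm assigned the weights, not how it was configured. All data are made up.

```python
import random


def weighted_average(predictions, weights):
    """Combine per-model predictions (one list per model) using the given weights."""
    n_obs = len(predictions[0])
    return [sum(w * model[i] for w, model in zip(weights, predictions))
            for i in range(n_obs)]


def mse(predicted, actual):
    """Mean squared error between two equally long sequences."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def normalise(weights):
    """Rescale weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def evolve_weights(predictions, actual, generations=50, pop_size=20, seed=0):
    """Toy genetic algorithm searching for ensemble weights with low MSE."""
    rng = random.Random(seed)
    n_models = len(predictions)

    def loss(weights):
        return mse(weighted_average(predictions, weights), actual)

    # Start from the unweighted average plus random candidates.
    population = [[1.0 / n_models] * n_models]
    while len(population) < pop_size:
        population.append(normalise([rng.random() + 1e-9 for _ in range(n_models)]))

    for _ in range(generations):
        population.sort(key=loss)
        parents = population[: pop_size // 2]   # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]                 # crossover
            child = [max(1e-9, w + rng.gauss(0, 0.05)) for w in child]  # mutation
            children.append(normalise(child))
        population = parents + children

    return min(population, key=loss)


# Made-up observed fatalities and predictions from three sub-models.
actual = [0, 2, 10, 4]
predictions = [
    [0, 1, 8, 5],
    [1, 4, 15, 2],
    [0, 2, 9, 4],
]

unweighted = weighted_average(predictions, [1 / 3] * 3)   # sub-national style
weights = evolve_weights(predictions, actual)             # country-level style
weighted = weighted_average(predictions, weights)

print("unweighted MSE:", mse(unweighted, actual))
print("weighted MSE:", mse(weighted, actual))
```

Because the better half of each generation is carried over unchanged and the unweighted average is part of the starting population, the evolved weights never do worse than the unweighted average on the data used to fit them. Whether they also do better on unseen data is a separate question that this sketch does not address.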

Data infrastructure

VIEWS3 and viewser: the data infrastructure supporting the fatalities001 model

The fatalities001 model was built on VIEWS3, at the time a brand new data infrastructure and the third iteration of the back-end system and database supporting the VIEWS prediction models. Advanced users can interact with the VIEWS3 system using a web-based CLI called viewser. VIEWS3 and viewser are documented in a suite of open-source GitHub repositories that users are welcome to consult for detailed information.


