The fatalities001 model

Overview of the methodology and data infrastructure behind the legacy conflict prediction model fatalities001.
fatalities001 was the first iteration of the fatalities model. It was in use between 2022 and early 2023.
Example forecasts from the fatalities001 model. Predicted fatalities in April 2022, based on data up to January 2022.
  • This is a legacy model
Please note that fatalities001 is a legacy model. The model codebase remains accessible via the open-source GitHub repository FCDO_predicting_fatalities for transparency, but is no longer supported or maintained. Please consult the operational model documentation for up-to-date information about the live prediction system.  

OVERVIEW OF THE PREDICTION MODEL

Scope and coverage

When used as the operational model in the VIEWS forecasting system, the model generated monthly predictions for the number of fatalities in impending conflict, as well as dichotomous forecasts for the probability of at least 25 battle-related deaths (BRDs) per country-month and at least 1 BRD per PRIO-GRID-month. 
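The two dichotomous outcomes described above can be illustrated with a minimal sketch. The thresholds (25 BRDs per country-month, 1 BRD per PRIO-GRID-month) come from the text; the level names and the function are hypothetical and are not taken from the VIEWS codebase.

```python
# Thresholds as stated in the text; the dictionary keys are illustrative names.
THRESHOLDS = {"country_month": 25, "priogrid_month": 1}


def dichotomize(brd_count: int, level: str) -> int:
    """Return 1 if the observed BRD count meets the threshold for the level, else 0."""
    return int(brd_count >= THRESHOLDS[level])


# Hypothetical observed battle-related death counts for five unit-months.
observed = [0, 3, 24, 25, 410]
country_labels = [dichotomize(n, "country_month") for n in observed]
grid_labels = [dichotomize(n, "priogrid_month") for n in observed]
```

The dichotomous forecasts are then probabilities that a label of this kind will equal 1 in a given future month.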
Model family: fatalities
Model version:  001

UPDATE SCHEDULE

Monthly

PREDICTED TYPE(S) OF VIOLENCE

State-based conflict

FORECASTING WINDOW

1-36 months ahead

PREDICTED OUTCOMES

Continuous & dichotomous predictions

COUNTRY-LEVEL COVERAGE

Global

SUB-NATIONAL COVERAGE

Africa and the Middle East

Model documentation

Predicting fatalities

Håvard Hegre, Forogh Akbari, Mihai Croicu, James Dale, Tim Gåsste, Remco Jansen, Peder Landsverk, Maxine Leis, Angelica Lindqvist-McGowan, Hannes Mueller, Malika Rakhmankulova, David Randahl, Christopher Rauh, Espen Geelmuyden Rød & Paola Vesco
Report, Uppsala University, 9 June 2022

INPUT DATA

The features informing the fatalities001 model

The fatalities001 model was informed by data on hundreds of variables from data providers such as the Uppsala Conflict Data Program (UCDP), ACLED, PRIO-GRID, the World Bank, IMF, FAO, MAPSPAM, SPEI and MIRCA. Based on these raw data variables, VIEWS also created a suite of additional variables by applying data transformations such as time and space lags, imputations to fill in for missing data, and other common data processing techniques.
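To make the transformations named above concrete, here is a minimal sketch of a time lag, a space lag and a simple imputation in plain Python. The function names, the forward-fill rule and the eight-neighbour definition of a space lag are assumptions chosen for illustration; the transformations actually used by VIEWS are documented in its repositories.

```python
def forward_fill(series):
    """Impute missing values (None) with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def time_lag(series, lag):
    """Shift a monthly series `lag` months forward, padding the start with None."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(series)
    kept = list(series[: max(n - lag, 0)])
    return [None] * (n - len(kept)) + kept


def space_lag(grid, row, col):
    """Mean of the values in the (up to eight) cells adjacent to (row, col)."""
    neighbours = [
        grid[r][c]
        for r in range(max(row - 1, 0), min(row + 2, len(grid)))
        for c in range(max(col - 1, 0), min(col + 2, len(grid[0])))
        if (r, c) != (row, col)
    ]
    return sum(neighbours) / len(neighbours) if neighbours else 0.0


# Hypothetical monthly fatality counts for one unit, with one missing month.
deaths = [0, 4, None, 7, 2]
deaths_filled = forward_fill(deaths)
deaths_lag1 = time_lag(deaths_filled, 1)

# Hypothetical 3x3 block of grid cells for a single month.
cells = [[0, 1, 0], [2, 5, 2], [0, 1, 0]]
centre_space_lag = space_lag(cells, 1, 1)
```

Each derived series (for example `deaths_lag1`) would enter the models as a feature in its own right, alongside the raw variable it was built from.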
Together, the raw and processed data variables informing the various VIEWS models are referred to as features, which are grouped into feature sets based on the overall theme they relate to and/or the data provider(s) from which they are derived.
The feature sets that informed the sub-models and model ensembles in the fatalities001 model are documented in the GitHub repository FCDO_predicting_fatalities.
Why categorize data into feature sets?
Categorizing input data variables into feature sets is part of the standard data organization routines in VIEWS and simplifies model development. Among other benefits, it allows us to call upon a pre-determined set of features, maintained in a single location, when training our models. This reduces the risk of human error when compiling the input datasets and makes the model documentation easier to maintain.
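The idea of a single, centrally maintained definition of each feature set can be sketched as follows. All feature-set names, column names and the helper function are hypothetical; they are not the identifiers used in the fatalities001 codebase.

```python
# One central registry: every model refers to feature sets by name only.
FEATURE_SETS = {
    "conflict_history": ["brd_count", "months_since_last_event"],
    "economic_growth": ["gdp_growth_real", "gdp_growth_forecast"],
}


def select_features(row, feature_set):
    """Extract the columns of one feature set, failing loudly if any is missing."""
    columns = FEATURE_SETS[feature_set]
    missing = [name for name in columns if name not in row]
    if missing:
        raise KeyError(f"{feature_set} is missing columns: {missing}")
    return {name: row[name] for name in columns}


# Hypothetical input row for one country-month.
row = {
    "brd_count": 12,
    "months_since_last_event": 3,
    "gdp_growth_real": 1.8,
    "gdp_growth_forecast": 2.1,
}
history_inputs = select_features(row, "conflict_history")
```

Because the column lists live in one place, adding or renaming a feature requires a single change rather than an edit to every model that uses it.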

Key themes of conflict drivers

The feature sets that informed the fatalities001 model can be divided into eight overall themes of conflict drivers, presented below. Please note that the actual number of feature sets used by the model exceeded this number; the categorisation below is offered for illustration purposes only. 
Conflict history
A suite of features capturing the history of conflict in each country and sub-national grid cell, e.g. the number of battle-related deaths per unit and level of analysis, and measures of the temporal and spatial distance to recent conflict events. 
Data providers: Uppsala Conflict Data Program (UCDP), The Armed Conflict Location & Event Data Project (ACLED).
Political institutions, democracy
Features that capture democracy indices and the strength of political institutions in each country, such as liberal democracy, rule of law, equality, and the level of exclusion of social groups in politics. 
Data providers: Varieties of Democracy (V-Dem)
Development
Measures of development drawn from the World Bank's World Development Indicators, e.g. GDP per capita, infant mortality rate, and school enrollment. 
Data providers: The World Bank (World Development Indicators, WDI)
Economic growth
A feature set focusing specifically on historic and future economic growth, e.g. real GDP growth per year and growth forecasts for the coming years. 
Data providers: The International Monetary Fund World Economic Outlook (IMF WEO)
Climate & societal vulnerability
Feature sets capturing climate extremes and societal vulnerability to climate hazards and other external shocks, e.g. climate extreme indices, reliance on agriculture, crop yields, precipitation, freshwater withdrawal, water management efficiency, and access to renewable resources. 
Data providers: United Nations Food and Agriculture Organisation (FAO), FAO AQUASTAT, PRIO-GRID, MIRCA, MAPSPAM, SPEI Global Drought Monitor
News monitoring
A feature set based on the Mueller & Rauh (2018) topic model, which captures conflict risks as drawn from a topic analysis of news media. 
Data providers:  Mueller & Rauh (2018)
Natural and social geography
A feature set capturing terrain type, distance to natural resources, demography, proximity to cities and country borders.
Data providers: PRIO-GRID
Food security and access to basic needs
Feature sets capturing staple food prices along with measures of food security and access to basic human needs, such as mean food prices, food price inflation, undernourishment, access to clean water, and basic sanitation. 
Data providers: United Nations Food and Agriculture Organisation (FAO), FAOSTAT

GENERATING THE FORECASTS

Model training procedures

Step 1

Sub-models: combinations of feature sets and algorithms

As a first step when training the fatalities001 model, each feature set was paired with an advanced machine learning algorithm.
The fatalities001 model employed four such algorithms, which you can read more about in our technical reports on the model: random forests, gradient boosting, Markov models, and hurdle models. 
The result was a series of sub-models, or constituent models as they are more commonly called, that used patterns in their respective subsets of historic data to generate predictions for future conflict.
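Of the four algorithm families named above, the hurdle model has the simplest structure to illustrate. A hurdle model first estimates whether any fatalities occur at all, then estimates how many occur given that the count is positive. The sketch below is a deliberately stripped-down, intercept-only stand-in that uses no features; the class name and fitting rule are assumptions for illustration and do not reproduce the fatalities001 sub-models, which paired full feature sets with machine learning algorithms.

```python
from dataclasses import dataclass


@dataclass
class InterceptOnlyHurdle:
    """Toy hurdle model: P(count > 0) times the mean count among positive cases."""

    p_positive: float = 0.0
    mean_positive: float = 0.0

    def fit(self, counts):
        if not counts:
            raise ValueError("counts must not be empty")
        positives = [c for c in counts if c > 0]
        # Stage 1: share of unit-months that cross the zero "hurdle".
        self.p_positive = len(positives) / len(counts)
        # Stage 2: average size of the outcome once the hurdle is crossed.
        self.mean_positive = sum(positives) / len(positives) if positives else 0.0
        return self

    def predict(self):
        """Expected count = P(count > 0) * E[count | count > 0]."""
        return self.p_positive * self.mean_positive


# Hypothetical historic fatality counts for one unit.
history = [0, 0, 0, 10, 30]
model = InterceptOnlyHurdle().fit(history)
expected_fatalities = model.predict()
```

In a full sub-model, both stages would be fitted on the features of one feature set rather than on the outcome history alone.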
Step 2

Ensemble models: groups of sub-models using “the wisdom of the crowd”

Much like a crowd tends to be wiser than the individuals composing it, prediction models that are informed by a number of smaller and specialized sub-models are known to be more robust and generate stronger predictions than single models.
As a second step in the model training procedures, the sub-models above were therefore combined into two groups or ensembles of models – one ensemble for each level of analysis. 
Two different ensembling techniques were used for this purpose:
  • The country-level ensemble model combined the predictions from each of the sub-models using a genetic algorithm that assigned different weights to the contribution from each model in order to maximise predictive performance.

  • The sub-national ensemble model, in turn, used a simple unweighted average of the sub-model results.
The ensembling techniques above are motivated and described at length in the technical report on the fatalities001 model.
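The two ensembling approaches can be contrasted in a minimal sketch. The unweighted average follows directly from the description above. The genetic algorithm shown here is only a toy: the population size, mutation rule, crossover scheme and the use of mean squared error as the objective are all assumptions for illustration, and the actual procedure is the one described in the technical report.

```python
import random


def weighted_average(predictions, weights):
    """Combine per-model predictions (one list per sub-model) using the weights."""
    total = sum(weights)
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]


def unweighted_average(predictions):
    """Simple mean across sub-models, as in the sub-national ensemble."""
    return weighted_average(predictions, [1.0] * len(predictions))


def mse(forecast, actual):
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def evolve_weights(predictions, actual, generations=50, pop_size=20, seed=0):
    """Toy genetic algorithm searching for weights that minimise the MSE."""
    rng = random.Random(seed)
    n = len(predictions)

    def fitness(weights):
        return mse(weighted_average(predictions, weights), actual)

    # Start from equal weights plus random candidates.
    population = [[1.0] * n] + [
        [rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)
            child[i] = max(0.01, child[i] + rng.gauss(0, 0.1))  # mutate one weight
            children.append(child)
        population = parents + children
    return min(population, key=fitness)


# Hypothetical predictions from three sub-models and the observed outcomes.
preds = [[10, 20, 30], [0, 0, 0], [12, 18, 33]]
actual = [11, 19, 31]
simple = unweighted_average(preds)
weights = evolve_weights(preds, actual)
tuned = weighted_average(preds, weights)
```

Because the equal-weight candidate is part of the starting population and the best candidates always survive, the tuned ensemble in this sketch can never score worse on the calibration data than the simple average. Whether it generalises better to unseen data is a separate question.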

Data infrastructure

VIEWS3 and viewser: the data infrastructure supporting the fatalities001 model

The fatalities001 model was built in a brand-new, sophisticated data infrastructure called VIEWS3, the third iteration of the back-end system and database supporting the VIEWS prediction models. Advanced users can interact with the VIEWS3 system using a web-based CLI called viewser. VIEWS3 and viewser are documented in a suite of open-source GitHub repositories that users are welcome to consult for detailed information.