The fatalities001 model
Overview of the methodology and data infrastructure behind the legacy conflict prediction model fatalities001.
fatalities001 was the first iteration of the fatalities model. It was in use between 2022 and early 2023.
![Example predictions from the fatalities001 model](https://cdn.statically.io/img/i0.wp.com/viewsforecasting.org/wp-content/uploads/ForecastMapsPredictionMap_pgm_ensemble_standard_scale_r505_m508.png?resize=1673%2C1495&ssl=1)
Example forecasts from the fatalities001 model. Predicted fatalities in April 2022, based on data up to January 2022.
Please note that fatalities001 is a legacy model. The model codebase remains accessible via the open-source GitHub repository FCDO_predicting_fatalities for transparency, but it is no longer supported or maintained. Please consult the operational model documentation for up-to-date information about the live prediction system.
OVERVIEW OF THE PREDICTION MODEL
Scope and coverage
When used as the operational model in the VIEWS forecasting system, the model generated monthly predictions of the number of fatalities in impending conflict. It also generated dichotomous forecasts: the probability of at least 25 battle-related deaths (BRDs) per country-month and of at least 1 BRD per PRIO-GRID-month.
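The two thresholds above define the binary outcomes that the dichotomous forecasts refer to. The sketch below only illustrates that outcome definition; the function name and the example counts are made up for illustration and are not taken from the model codebase.

```python
# Thresholds as stated in the text: at least 25 BRDs per country-month,
# at least 1 BRD per PRIO-GRID-month.
THRESHOLDS = {"country_month": 25, "priogrid_month": 1}


def dichotomise(fatalities, level):
    """Return 1 if the count meets the threshold for the given level, else 0."""
    return int(fatalities >= THRESHOLDS[level])


# Made-up monthly BRD counts, used for both levels purely for illustration.
observed = [0, 3, 25, 140]
country_labels = [dichotomise(n, "country_month") for n in observed]
grid_labels = [dichotomise(n, "priogrid_month") for n in observed]

print(country_labels)  # [0, 0, 1, 1]
print(grid_labels)     # [0, 1, 1, 1]
```

A dichotomous forecast is then the predicted probability that this label equals 1 for a given unit and month.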
Model family: fatalities
Model version: 001
UPDATE SCHEDULE
Monthly
PREDICTED TYPE(S) OF VIOLENCE
State-based conflict
FORECASTING WINDOW
1-36 months ahead
PREDICTED OUTCOMES
Continuous & dichotomous predictions
COUNTRY-LEVEL COVERAGE
Global
SUB-NATIONAL COVERAGE
Africa and the Middle East
Model documentation
Predicting fatalities
Håvard Hegre, Forogh Akbari, Mihai Croicu, James Dale, Tim Gåsste, Remco Jansen, Peder Landsverk, Maxine Leis, Angelica Lindqvist-McGowan, Hannes Mueller, Malika Rakhmankulova, David Randahl, Christopher Rauh, Espen Geelmuyden Rød & Paola Vesco
Report, Uppsala University, 9 June 2022
INPUT DATA
The features informing the fatalities001 model
The fatalities001 model was informed by data on hundreds of variables from data providers such as the Uppsala Conflict Data Program (UCDP), ACLED, PRIO-GRID, the World Bank, IMF, FAO, MAPSPAM, SPEI and MIRCA. Based on these raw data variables, VIEWS also created a suite of additional variables by applying data transformations such as time and space lags, imputations to fill in for missing data, and other common data processing techniques.
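As a rough illustration of two such transformations, the sketch below applies a forward-fill imputation and a one-month time lag to a short monthly series. It is a generic example rather than the VIEWS implementation: the function names and the data are hypothetical, and forward filling is only one of several possible imputation strategies.

```python
def forward_fill(series):
    """Replace missing values (None) with the most recent observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def time_lag(series, steps):
    """Shift a monthly series forward by `steps` months.

    The first `steps` months have no lagged value and are set to None.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    n = len(series)
    steps = min(steps, n)
    return [None] * steps + series[: n - steps]


# Made-up monthly fatality counts with one missing month.
deaths = [0, 4, None, 7, 2]
deaths_filled = forward_fill(deaths)
deaths_lag1 = time_lag(deaths_filled, 1)

print(deaths_filled)  # [0, 4, 4, 7, 2]
print(deaths_lag1)    # [None, 0, 4, 4, 7]
```

A space lag follows the same idea, except that the value is taken from neighbouring units (for example adjacent grid cells) rather than from earlier months.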
Together, the raw and processed data variables informing the various VIEWS models are referred to as features, which are grouped into feature sets based on the overall theme they relate to and/or the data provider(s) from which they are derived.
The feature sets that informed the sub-models and model ensembles in the fatalities001 model are documented in the GitHub repository FCDO_predicting_fatalities.
Why categorize data into feature sets?
Categorizing input data variables into feature sets is part of the standard data organization routines in VIEWS and simplifies model development. Amongst other benefits, it allows us to call upon a pre-determined set of features, maintained in a single location, when training our models. This minimizes the risk of human error when compiling the input datasets and makes the model documentation easier to maintain.
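The idea can be sketched as a single registry that maps each feature set name to its list of features. The feature set names, feature names, and helper function below are hypothetical and chosen for illustration only; the actual definitions live in the model's GitHub repository.

```python
# Hypothetical registry: every feature set is defined once, in one place.
FEATURE_SETS = {
    "conflict_history": ["brd_count", "months_since_last_event"],
    "development": ["gdp_per_capita", "infant_mortality"],
}


def select_features(row, feature_set):
    """Pick the features of one named feature set from a row of input data."""
    try:
        columns = FEATURE_SETS[feature_set]
    except KeyError:
        raise KeyError(f"unknown feature set: {feature_set!r}") from None
    return {name: row[name] for name in columns}


# One made-up country-month observation.
row = {
    "brd_count": 12,
    "months_since_last_event": 0,
    "gdp_per_capita": 1850.0,
    "infant_mortality": 41.2,
}

print(select_features(row, "development"))
# {'gdp_per_capita': 1850.0, 'infant_mortality': 41.2}
```

Because every model refers to a feature set by name, a change to the registry propagates to all models that use it, and a misspelled name fails loudly instead of silently producing a wrong dataset.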
Key themes of conflict drivers
The feature sets that informed the fatalities001 model can be divided into eight overall themes of conflict drivers, presented below. Please
note that the actual number of feature sets used by the model exceeded this number; the categorisation below is offered for illustration purposes only.
Conflict history
A suite of features capturing the history of conflict in each country and sub-national grid cell, e.g. the number of battle-related deaths per unit and level of analysis, and measures
of the temporal and spatial distance to recent conflict events.
Data providers: Uppsala Conflict Data Program (UCDP), The Armed Conflict Location & Event Data Project (ACLED).
Political institutions, democracy
Features that capture democracy indices and the strength of political institutions in each country, such as liberal
democracy, rule of law, equality, and the level of exclusion of social groups in politics.
Data providers: Varieties of Democracy (V-Dem)
Development
Measures of development as provided by the World Bank Indicators, e.g. GDP per capita, infant mortality rate, and school enrollment.
Data providers: The World Bank (World Development Indicators, WDI)
Economic growth
A feature set focusing specifically on historic and future economic growth, e.g. real GDP growth per year and growth forecasts for the coming years.
Data providers: The International Monetary Fund World Economic Outlook (IMF WEO)
Climate & societal vulnerability
Feature sets capturing climate extremes and societal vulnerability to climate hazards and other external shocks, e.g. climate extreme indices, reliance on agriculture, crop yields, precipitation, freshwater withdrawal, water management efficiency, and access to renewable resources.
Data providers: United Nations Food and Agriculture Organisation (FAO), FAO AQUASTAT, PRIO-GRID,
MIRCA, MAPSPAM, SPEI Global Drought Monitor
News monitoring
A feature set based on the Mueller & Rauh (2018) topic model, which captures conflict risks as drawn from a topic analysis of news media.
Data providers: Mueller & Rauh (2018)
Natural and social geography
A feature set capturing terrain type, distance to natural resources, demography, and proximity to cities and country borders.
Data providers: PRIO-GRID
Food security and access to basic needs
Feature sets capturing staple food prices along with measures of food security and access to basic human needs, such as mean food prices, food
price inflation, undernourishment, access to clean water, and basic sanitation.
Data providers: United Nations Food and Agriculture Organisation (FAO), FAOSTAT
GENERATING THE FORECASTS
Model training procedures
Step 1
Sub-models: combinations of feature sets and algorithms
As a first step when training the fatalities001 model, each feature set was paired with a machine learning algorithm.
The fatalities001 model employed four such algorithms, which you can read more about in our technical reports on the model: random forests, gradient boosting, Markov models, and hurdle models.
The result was a series of sub-models, or constituent models as they are more commonly called, that used patterns in their respective subsets of historic data to generate predictions for future conflict.
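To give a feel for one of the algorithm families named above, the sketch below shows a hurdle model in its simplest textbook form: one stage estimates the probability of any fatalities, a second stage estimates the expected count given that fatalities occur, and the prediction is the product of the two. This is an intercept-only toy version with made-up data and hypothetical function names. The sub-models in fatalities001 used feature sets as inputs and are specified in the technical report, not here.

```python
def fit_hurdle(counts):
    """Fit an intercept-only hurdle model to non-negative counts.

    Stage 1: share of observations with at least one fatality.
    Stage 2: mean count among observations with at least one fatality.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    positives = [c for c in counts if c > 0]
    p_any = len(positives) / len(counts)
    mean_if_any = sum(positives) / len(positives) if positives else 0.0
    return p_any, mean_if_any


def predict_hurdle(p_any, mean_if_any):
    """Expected fatalities = P(any fatalities) * E[fatalities | any]."""
    return p_any * mean_if_any


# Made-up monthly fatality counts for a single unit.
history = [0, 0, 0, 12, 0, 30, 0, 0]
p_any, mean_if_any = fit_hurdle(history)
expected = predict_hurdle(p_any, mean_if_any)

print(p_any, mean_if_any, expected)  # 0.25 21.0 5.25
```

Splitting the problem this way suits conflict data, where most unit-months record zero fatalities and the non-zero counts vary widely.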
Step 2
Ensemble models: groups of sub-models using “the wisdom of the crowd”
Much like a crowd tends to be wiser than the individuals composing it, prediction models that combine a number of smaller, specialized sub-models tend to be more robust and to generate more accurate predictions than single models.
As a second step in the model training procedures, the sub-models above were therefore combined into two groups or ensembles of models – one ensemble for each level of analysis.
Two different ensembling techniques were used for this purpose:
- The country-level ensemble model combined the predictions from each of the sub-models using a genetic algorithm that assigned different weights to the contribution from each model in order to maximise predictive performance.
- The sub-national ensemble model, in turn, used a simple unweighted average of the sub-model results.
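The difference between the two techniques can be sketched as follows. The sub-model predictions and the weights are made-up placeholders, and the function names are hypothetical. In fatalities001 the country-level weights were found by a genetic algorithm, which this sketch does not implement.

```python
def unweighted_average(predictions):
    """Simple mean of the sub-model predictions (sub-national style)."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


def weighted_average(predictions, weights):
    """Weighted mean of the sub-model predictions (country-level style)."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Made-up predicted fatalities from three sub-models for one unit and month.
sub_model_predictions = [2.0, 4.0, 9.0]
# Placeholder weights standing in for the output of a weight search.
placeholder_weights = [0.5, 0.25, 0.25]

simple = unweighted_average(sub_model_predictions)
weighted = weighted_average(sub_model_predictions, placeholder_weights)

print(simple)    # 5.0
print(weighted)  # 4.25
```

With equal weights the two functions return the same value, so the unweighted average is the special case in which every sub-model is trusted equally.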
The ensembling techniques above are motivated and described at length in the technical report on the
fatalities001 model.
Data infrastructure
VIEWS3 and viewser: the data infrastructure supporting the fatalities001 model
The fatalities001 model was built in a then-new data infrastructure called VIEWS3, the third iteration of the back-end system and database supporting the VIEWS prediction models. Advanced users can interact with the VIEWS3 system using a web-based CLI called viewser.
VIEWS3 and viewser are documented in a suite of open-source GitHub repositories that users are welcome to consult for detailed information.