The ViEWS2020 model

Overview of the methodology and data infrastructure behind the legacy conflict prediction model ViEWS2020, the second operational conflict prediction model employed by the VIEWS system. It was in use between 2020 and 2021.
Example forecasts from the ViEWS2020 model. Predicted probability of observing at least 1 battle-related death from state-based conflict in February 2022, based on data up to October 2021.
  • This is a legacy model
Please note that ViEWS2020 is a legacy model. The model codebase remains accessible via the open-source GitHub repository OpenViEWS2 for transparency, but neither the model nor its underlying data infrastructure, ViEWS2, is supported or maintained. Please consult the operational model documentation for up-to-date information about the live prediction system.

OVERVIEW OF THE PREDICTION MODEL

The ViEWS2020 model

The ViEWS2020 model generated monthly probabilistic assessments of the likelihood that pre-determined fatality thresholds would be met or exceeded for each of three conflict categories: state-based conflict, non-state conflict, and one-sided violence. For each category and for each month in a rolling three-year forecasting window, it predicted the probability that at least 25 battle-related deaths would occur in a given country, and the probability that at least one such fatality would occur in a given 0.5×0.5 decimal degree PRIO-GRID cell (approximately 55×55 km).
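For orientation, the sketch below maps a point to its 0.5×0.5 degree cell and estimates the cell's size in kilometres. It assumes PRIO-GRID's numbering convention as we understand it (720 columns by 360 rows, with gid 1 in the south-west corner and numbering running east, then north) and uses hypothetical function names; it is not taken from the ViEWS codebase. It also shows why the 55×55 km figure is approximate: cells are about 55.7 km tall everywhere, but their east-west width shrinks with the cosine of latitude.

```python
import math

CELL_DEG = 0.5                  # cell size in decimal degrees
N_COLS = int(360 / CELL_DEG)    # 720 columns from 180°W to 180°E (assumed layout)
KM_PER_DEG_LAT = 111.32         # approximate length of one degree of latitude

def prio_grid_id(lat: float, lon: float) -> int:
    """Return the gid of the 0.5-degree cell containing (lat, lon).

    Assumes numbering starts at 1 in the south-west corner (90°S, 180°W)
    and runs west to east, then south to north.
    """
    if not (-90 <= lat < 90 and -180 <= lon < 180):
        raise ValueError("latitude must be in [-90, 90) and longitude in [-180, 180)")
    row = math.floor((lat + 90) / CELL_DEG) + 1   # 1-based, counted from the south
    col = math.floor((lon + 180) / CELL_DEG) + 1  # 1-based, counted from the west
    return (row - 1) * N_COLS + col

def cell_extent_km(lat: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) size of a cell in km at latitude lat."""
    north_south = CELL_DEG * KM_PER_DEG_LAT
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

# Usage: approximate coordinates of Kinshasa, and cell size at the equator.
print(prio_grid_id(-4.3, 15.3))
print(cell_extent_km(0.0))
```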

UPDATE SCHEDULE

Monthly

PREDICTED TYPE(S) OF VIOLENCE

State-based conflict, non-state conflict, and one-sided violence

FORECASTING WINDOW

1-36 months ahead

PREDICTED OUTCOMES

Probability of exceeding pre-determined fatality thresholds

COUNTRY-LEVEL COVERAGE

Africa

SUB-NATIONAL COVERAGE

Africa

Forecasting procedure

Identifying and weighting the predictors of conflict

Building the constituent models

The ViEWS2020 forecasts were produced by machine learning models that compiled, analysed, and evaluated historical time-series data from 1989 up to the month prior to each run of the model (i.e. each update of the forecasts). The data covered a wide range of variables that decades of peace research have shown to correlate with political violence or, conversely, with its absence.

Variables that shared a common theme, such as conflict history or different measures of the strength of political institutions, were grouped together into so-called “constituent models”, which were trained and fitted independently.

Within each constituent model, the variables in a theme were fed into a number of random forest algorithms, machine learning algorithms that learn from historical observations in order to generate forecasts for the future. The algorithms used a subset of the available data to identify predictors that performed particularly well in predicting conflict in a later subset of the same data. This was repeated multiple times, producing a list of the predictors in each theme that consistently performed well, while taking into account non-linear relationships and interaction effects among the predictors. Together with a calibration procedure, the result of this exercise determined the relative weight placed on each variable when generating the constituent model forecasts and, if needed, served to weed out variables that had no bearing on the results.
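Learning from an earlier subset of the data and checking performance on a later one, repeatedly, can be sketched as a series of chronological splits of monthly observations. The sketch is illustrative only: the Partition fields, the month_id key, and the window lengths are hypothetical, and the actual ViEWS2020 training and evaluation partitions are documented in the JPR article and the OpenViEWS2 repository. The random forest step itself is not reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Partition:
    """A training period followed by a later evaluation period (inclusive month ids)."""
    train_start: int
    train_end: int
    eval_start: int
    eval_end: int

def split_by_time(rows, partition):
    """Split rows (dicts with a 'month_id' key) into training and evaluation sets."""
    if partition.eval_start <= partition.train_end:
        raise ValueError("evaluation period must start after the training period ends")
    train = [r for r in rows
             if partition.train_start <= r["month_id"] <= partition.train_end]
    evaluate = [r for r in rows
                if partition.eval_start <= r["month_id"] <= partition.eval_end]
    return train, evaluate

def rolling_partitions(first, last, train_len, eval_len, step):
    """Yield partitions that move forward in time, so that evaluation is
    repeated on several later periods rather than on a single one."""
    start = first
    while start + train_len + eval_len - 1 <= last:
        train_end = start + train_len - 1
        yield Partition(start, train_end, train_end + 1, train_end + eval_len)
        start += step

# Usage: 12 hypothetical months, 6-month training windows, 3-month evaluation windows.
months = [{"month_id": m} for m in range(1, 13)]
for part in rolling_partitions(1, 12, train_len=6, eval_len=3, step=3):
    train, evaluate = split_by_time(months, part)
    print(part, len(train), len(evaluate))
```

Keeping every evaluation month strictly after the training months mirrors the forecasting setting, where a model never has access to the future it is asked to predict.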

Forecasting violence with the “wisdom of the crowds”

Compiling the model ensembles

Once the thematic constituent models had been trained and fitted, they were combined into broader models known as “ensembles” – a key tenet of all VIEWS models. Much like a crowd is wiser than the single individuals composing it, broader models that make use of forecasts from a number of smaller and specialized models are known to generate more accurate predictions. In addition to the benefits of incorporating multiple themes of conflict predictors and thus becoming more comprehensive forecasting models, ensembles are less sensitive to overfitting and more robust to new data.

The ViEWS2020 forecasts were generated by two such model ensembles: one that incorporated forecasts from constituent models trained specifically to predict conflict at the country level, and one trained for geographically refined locations of approximately 55×55 km each (0.5×0.5 degrees). Both ensembles used calendar months as the temporal unit of analysis. They were known as the country-month (cm) ensemble and the PRIO-GRID-month (pgm) ensemble, and each contained a list of constituent models that were interpretable on their own and that had been shown to improve the predictive performance of their ensemble. Sixteen models met these criteria for the cm ensemble, and twelve for the pgm ensemble. An overview of these is presented in the model section below; they are described in depth in the ViEWS2020 Special Data Feature in the Journal of Peace Research.

Estimating the model weights

Just as the individual conflict predictors were evaluated to single out the most important variables in each constituent model, the constituent models themselves underwent a weighting procedure when they were incorporated into the final ensembles.

Up until February 2020, when the ViEWS2020 model was launched, simple unweighted model averaging had emerged as the preferred weighting solution at both levels of analysis, as it produced results similar to those of more complex weighting alternatives. This meant that the final ensemble forecasts were estimated as a simple average of the forecasts generated by each of the included constituent models. Following the launch of the ViEWS2 data infrastructure, which provided more data for model weighting, we shifted to Ensemble Bayesian Model Averaging (EBMA) at the country-month (cm) level. EBMA allows for the inclusion of models that specialize in subsets of the data alongside broader ones, resulting in more accurate forecasts. At the geographic (pgm) level, unweighted model averaging continued to be used, since the EBMA procedure did not improve the performance of the forecasting system enough to justify a change.
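The two weighting schemes can be contrasted in a short sketch. Unweighted averaging is the simple average described above. For the weighted case, the sketch estimates mixture weights for binary outcomes with an expectation-maximisation (EM) loop; this is a simplified stand-in for EBMA, which additionally bias-corrects each model's forecasts before weighting, so it should not be read as the ViEWS implementation. All function names are hypothetical.

```python
EPS = 1e-12  # keeps probabilities strictly between 0 and 1

def unweighted_average(forecasts):
    """forecasts[k][i] is model k's predicted probability for observation i."""
    n_models = len(forecasts)
    return [sum(column) / n_models for column in zip(*forecasts)]

def weighted_average(forecasts, weights):
    """Weighted average of model forecasts; weights are assumed to sum to 1."""
    return [sum(w * p for w, p in zip(weights, column)) for column in zip(*forecasts)]

def em_mixture_weights(forecasts, outcomes, n_iter=500, tol=1e-9):
    """Estimate ensemble weights by EM, treating the ensemble as a mixture of
    Bernoulli forecasters. Simplified: full EBMA also bias-corrects each model."""
    n_models, n_obs = len(forecasts), len(outcomes)
    weights = [1.0 / n_models] * n_models  # start from the unweighted average
    for _ in range(n_iter):
        totals = [0.0] * n_models
        for i, y in enumerate(outcomes):
            # Likelihood of the observed binary outcome under each model.
            lik = []
            for k in range(n_models):
                p = min(max(forecasts[k][i], EPS), 1 - EPS)
                lik.append(p if y == 1 else 1 - p)
            mixture = sum(w * lk for w, lk in zip(weights, lik))
            # E-step: responsibility of model k for observation i.
            for k in range(n_models):
                totals[k] += weights[k] * lik[k] / mixture
        # M-step: each weight becomes the model's mean responsibility.
        new_weights = [t / n_obs for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_weights, weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights

# Usage: a sharp, mostly correct model versus an uninformative one.
good = [0.9, 0.1, 0.8, 0.2]
flat = [0.5, 0.5, 0.5, 0.5]
observed = [1, 0, 1, 0]
print(unweighted_average([good, flat]))
print(em_mixture_weights([good, flat], observed))
```

In this toy case the EM loop moves almost all of the weight to the sharper model, whereas unweighted averaging always splits it evenly; with real constituent models whose strengths differ across subsets of the data, the estimated weights would be less extreme.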

The two procedures above are discussed at length in the ViEWS2020 Special Data Feature in the Journal of Peace Research. Additional information is found in Appendix D to that article, available on our publications page.

Computing the forecasts

To compute the forecasts, the ViEWS2020 model made use of two strategies: dynamic simulation (ds) and one-step-ahead modeling. The former builds on the procedures discussed at length in Hegre et al. (2013) and Hegre et al. (2016). Both strategies are also discussed in the ViEWS2020 Special Data Feature in the Journal of Peace Research and its appendices.
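The contrast between the two strategies can be sketched on a toy binary conflict series. Dynamic simulation draws many future trajectories, feeding each simulated month back in as the history for the next; one-step-ahead modeling, as we read it, instead builds a separate training set for each forecast step s, pairing predictors at month t with the outcome at month t + s. The two-state transition probabilities are hypothetical stand-ins for fitted models, and the code illustrates the general techniques rather than the procedures in Hegre et al. (2013, 2016).

```python
import random

def simulate_paths(current_state, p_onset, p_continue, horizon, n_sims=10_000, seed=0):
    """Dynamic simulation: estimate P(conflict) at steps 1..horizon.

    Toy transition model (hypothetical): conflict next month occurs with
    probability p_continue if there is conflict now, and p_onset otherwise.
    Each simulated month becomes the history for the next one.
    """
    rng = random.Random(seed)
    counts = [0] * horizon
    for _ in range(n_sims):
        state = current_state
        for step in range(horizon):
            p = p_continue if state == 1 else p_onset
            state = 1 if rng.random() < p else 0
            counts[step] += state
    return [c / n_sims for c in counts]

def step_dataset(features, outcomes, s):
    """One-step-ahead style training set for step s: pair the features observed
    at month t with the outcome at month t + s. One model is fitted per step."""
    if s < 1:
        raise ValueError("step s must be at least 1")
    return [(features[t], outcomes[t + s]) for t in range(len(outcomes) - s)]

# Usage: three-month-ahead probabilities for a country currently in conflict.
print(simulate_paths(current_state=1, p_onset=0.05, p_continue=0.8, horizon=3))
```

The simulated two-step value can be checked by hand: 0.8 × 0.8 + 0.2 × 0.05 = 0.65, which the Monte Carlo estimate approaches as n_sims grows.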

The 2021 Special Data Feature

ViEWS2020: Revising and evaluating the ViEWS political Violence Early-Warning System

Håvard Hegre, Curtis Bell, Michael Colaresi, Mihai Croicu, Frederick Hoyles, Remco Jansen, Maxine Ria Leis, Angelica Lindqvist-McGowan, David Randahl, Espen Geelmuyden Rød, and Paola Vesco
Journal of Peace Research, Vol. 58, Issue 3, 2021
Abstract
This article presents an update to the ViEWS political Violence Early-Warning System. This update introduces (1) a new infrastructure for training, evaluating, and weighting models that allows us to more optimally combine constituent models into ensembles, and (2) a number of new forecasting models that contribute to improve overall performance, in particular with respect to effectively classifying high- and low-risk cases. Our improved evaluation procedures allow us to develop models that specialize in either the immediate or the more distant future. We also present a formal, ‘retrospective’ evaluation of how well ViEWS has done since we started publishing our forecasts from July 2018 up to December 2019. Our metrics show that ViEWS is performing well when compared to previous out-of-sample forecasts for the 2015–17 period. Finally, we present our new forecasts for the January 2020–December 2022 period. We continue to predict a near-constant situation of conflict in Nigeria, Somalia, and DRC, but see some signs of decreased risk in Cameroon and Mozambique.

The ViEWS2020 model

The country-level model ensemble

The model ensemble trained to predict conflict at the country level consisted of 16 smaller forecasting models (sub-models), each specialised in addressing the prediction problem from a different perspective. They are grouped into five themes below for illustration purposes.

For additional model specifications, please see the model and feature lists in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in Journal of Peace Research. 

Peace and security

Models informed by numerous measures of conflict and protest history, with data sourced from UCDP, ACLED, and the International Crisis Group. 
All models (9)
The UCDP conflict history model (cflong)
Features capturing different aspects of conflict history per country, as defined and sourced from the UCDP, including time since the last fatal event, which type of violence occurred, and which fatality thresholds were reached (at least 1, 25, 100, or 500 deaths).  

The 25 BRDs onset model (onset_24_25_all)
A model trained to predict the onset of conflict, as recorded by the UCDP. Onset is defined as the first month within a rolling 24-month window in which a country reaches or exceeds 25 battle-related deaths (BRDs). The model uses all features that inform the country-level models.

The country dummy model (cdummies)
A model consisting of country dummy variables, serving as a random forest variant of a random effects model (a type of regression model that assumes that the intercepts and/or some of the slope coefficients are random).

The neighbour history model (neibhist)
A model capturing the conflict history in neighbouring countries using a subset of the features from the cflong model. Sourced from UCDP. 

The dynamic simulation models (ds_25; ds_dummy)
Conflict history models that use dynamic simulations to generate predictions; one trained on the incidence of conflict with at least one battle-related death (BRD), one using the incidence of at least 25 BRDs, and one using the incidence of 500+ BRDs in a given month, from state-based, non-state, and one-sided violence together. Sourced from UCDP.

The ACLED violence model (acled_violence)
Variables capturing the recent history of political violence, sourced from ACLED and recoded to approximate the UCDP categories of violence.

The ACLED protest model (acled_protest)
Variables capturing the recent history of protests in each country, sourced from the ACLED dataset.

The ICG Crisis Watch model (icgcw)
A model informed by monthly warnings issued by the International Crisis Group's CrisisWatch.

Governance

Models capturing the strength of political institutions coupled with comprehensive assessments of levels of democracy. Data is sourced from REIGN and V-Dem. 
All models (3)
The REIGN coups model (reign_coups)
A governance model predominantly informed by the predicted probability of coups from CoupCast (REIGN).  

The global REIGN model (reign_global)
A global governance model informed by features derived from the monthly Rulers, Elections, and Irregular Governance (REIGN) dataset, e.g. information on elections, leader traits, political regime tenures, and coups.

The political institutions (V-Dem) model (vdem_global)
A political institutions model informed by the Varieties of Democracy (V-Dem) dataset, which describes the political institutions of a country. Key features include physical integrity as a proxy for freedom from political killings and torture by the government, freedom of domestic movement, and indicators for rule of law and access to justice.

Development

Models capturing demographic data from IIASA, as well as development data from the World Development Indicators.  
All models (2)
The demography model (demog)
A development model capturing data on the Shared Socioeconomic Pathways (SSP), which represent socio-economic scenarios consistent with different climate mitigation and adaptation challenges. Data sourced from IIASA.

The World Development Indicators (WDI) model (wdi_global)
A development model broadly capturing the level of development by country, including the quality of infrastructure, economic growth, national debt, education, unemployment, gender equality, health care and provision, agricultural dependence, migration flows, and country size. Sourced from the World Bank’s World Development Indicators.

Climate

A drought model informed by the REIGN dataset.  
All models (1)
The REIGN drought model (reign_drought)
A climate model informed by the precipitation variable built into the REIGN dataset. 

Multi-feature

A multi-feature model trained on global data. 
All models (1)
The global model (all_global)  
A global model informed by all features that are fed into the country-level models, capturing interactions and non-linearities between the different predictors.

The sub-national model ensemble

Still informed by the country-level data (and vice versa), 11 sub-models were trained specifically to pick up on local variation in order to offer more geographically precise predictions of fatal conflict. Broadly speaking, they capture three different themes of conflict drivers, described below.

For additional information, please see the model and feature specifications in our source code on GitHub, or the text-based model descriptions with feature importances in the online appendix to our 2021 Special Data Feature in the Journal of Peace Research.

Conflict history

Models informed by various measures of conflict history, including levels of violence and the temporal and spatial proximity to the last fatal event.
All models (7)
The geographic UCDP conflict history model (hist_legacy)
A model tracing the conflict history of each geographic grid-cell and its adjacent locations, as incidences of conflict are more likely in locations that have experienced conflict in the past. Sourced from UCDP.

The space-time conflict history model (sptime)
A geographic-level conflict history model that captures the time since, and distance to, episodes of violence. Sourced from the UCDP.
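The spatial side of such features, along the lines of the spatial lag (splag) and spatial distance (spdist) transformations listed under Terminology, can be sketched on a small rectangular grid. This is an illustration under stated assumptions: neighbour order is taken as the queen-move (Chebyshev) distance between cells, and distances are measured in cell units rather than kilometres; the ViEWS implementations operate on PRIO-GRID cells and may define both differently. The space-time distance (stdist) is not reproduced.

```python
import math

def splag(grid, first, last):
    """Spatial lag: for each cell, sum grid values over neighbours of order
    first..last, where order is the queen-move (Chebyshev) distance.
    Order 1 is the 8 surrounding cells; splag(grid, 2, 2) is a hollow ring."""
    if not 1 <= first <= last:
        raise ValueError("need 1 <= first <= last")
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            total = 0
            for dr in range(-last, last + 1):
                for dc in range(-last, last + 1):
                    rr, cc = r + dr, c + dc
                    order = max(abs(dr), abs(dc))
                    if first <= order and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        total += grid[rr][cc]
            out[r][c] = total
    return out

def spdist(grid):
    """Euclidean distance, in cell units, from each cell to the closest cell
    whose value is 1 (math.inf if there is none)."""
    events = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 1]
    return [[min((math.hypot(r - er, c - ec) for er, ec in events), default=math.inf)
             for c in range(len(grid[0]))]
            for r in range(len(grid))]

# Usage: a 5x5 grid with one conflict event in the centre cell.
grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1
print(splag(grid, 1, 1))
print(spdist(grid))
```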

The 1 and 100 BRDs onset models (onset24_1_all, onset24_100_all)
Models trained to predict onset of conflict with at least one, or at least 100, battle-related deaths (BRD) in a given geographic location. Onset is defined as the first time a specific grid cell, or its neighbours, reaches the given threshold over a 24-month sliding window. The models use the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.

The XGBoost model (all_gxgb)
A Gradient Boosting Machine (GBM) model using the feature set from the all_themes model, coupled with fatality estimates and conflict event counts related to the sptime model, and a subset of data from the Standardized Precipitation Evapotranspiration Index (SPEI), a water balance index computed from both precipitation and temperature data.

The dynamic simulation models (ds_jpr2020_dummy, ds_jpr2020_greq_25)
Conflict history models making use of dynamic simulations to generate their forecasts. Trained on the incidence of conflict with at least 1 or at least 25 battle-related deaths (BRDs) in a given month from state-based, non-state, and one-sided violence together. Sourced from UCDP.

Human and natural geography

Models capturing terrain, distance to natural resources, human geography, and local development indicators, sourced from PRIO-GRID.
All models (2)
The natural geography model (pgd_natural)
A natural geography model capturing the spatial distance to exploitable resources such as diamonds and petroleum deposits, as well as data on the main type of land in the given area: cultivated areas, barren, forest, mountains, savanna, shrub, pasture, and urban areas. Sourced from PRIO-GRID.

The social geography model (pgd_social)
A social geography model capturing a set of human geography features that may affect conflict, such as the distance to the capital, the nearest urban center, and the national border. It also captures grid-level population density and development variables such as local GDP, infant mortality rate, and the share of excluded ethnic groups in each location. Sourced from PRIO-GRID.

Multi-feature, cross-level

A multi-feature model trained on global data, and a cross-level model.
All models (2)
The all themes model (allthemes)
A broad model informed by outcome-specific features from all sub-national models, capturing interactions between different features. 

The cross-level model (crosslevel)
A cross-level model that allows the country and sub-national levels of analysis to inform one another.

Codebase

ViEWS2020 model specification

Detailed information about the ViEWS2020 model, such as the model estimators used for each model and the complete model feature list, and the ViEWS2 data infrastructure under which it was built and run, can be found in the open-source GitHub repository OpenViEWS2.
  • Deprecated codebase
Please note that later model releases are built and run on a different data infrastructure, with new naming conventions. The descriptions presented here are valid only for the ViEWS2020 model under the ViEWS2 data infrastructure. 

Terminology

Terminology in the model repository

col_outcome

The col_outcome attribute specifies the outcome that a given model is trained to predict, i.e. whether a given threshold of fatalities from state-based (sb), non-state (ns), or one-sided (os) violence is reached or exceeded. The outcomes are variations of those recorded in the UCDP-GED dataset. The most common is a dummy encoding of whether 25 or more fatalities occur from a given form of violence, evaluated against the GED "best estimate" of fatalities. This outcome is indicated by the greq_25_ged_best prefix.

cols_features and colsets

The cols_features (column features) attribute specifies the sets of data, or more specifically the sets of data columns, that inform a given model. In the ViEWS database, all data entries relating to a given variable are collected in the same data column. Such columns contain either raw data on the given input variable, aggregated to the ViEWS levels of analysis, or data that have been processed by means of a specific modeling strategy. Data columns that share a common theme or data source are then grouped together into sets of columns (colsets), and each model uses a selection of these colsets. It is this selection of colsets that the cols_features attribute specifies.
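The relationship between data columns, colsets, and a model's cols_features can be sketched with plain Python structures. Everything named below (the colsets, their columns, and the model dictionary) is a hypothetical example following the naming conventions described in this section, not a copy of the OpenViEWS2 specifications.

```python
# Hypothetical colsets: each groups data columns that share a theme or source.
COLSETS = {
    "ged_history": ["greq_1_ged_best_sb", "time_since_greq_1_ged_best_sb"],
    "vdem_short": ["vdem_v2x_polyarchy", "vdem_v2x_rule"],
    "reign_short": ["reign_tenure_months", "reign_elected"],
}

# Hypothetical model specification: cols_features lists the colsets it uses.
MODEL = {
    "name": "example_model",
    "col_outcome": "greq_25_ged_best_sb",
    "cols_features": ["ged_history", "vdem_short"],
}

def resolve_columns(colset_names, colsets=COLSETS):
    """Expand a list of colset names into an ordered, de-duplicated column list."""
    columns = []
    for name in colset_names:
        if name not in colsets:
            raise KeyError(f"unknown colset: {name}")
        for col in colsets[name]:
            if col not in columns:
                columns.append(col)
    return columns

# Usage: the data columns that would be fed to the hypothetical model.
print(resolve_columns(MODEL["cols_features"]))
```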

How colsets and col_features are named

Colsets are named after their respective themes and/or data sources. Those fully derived from the REIGN, V-Dem, or UCDP-GED datasets carry the prefixes reign_, vdem_, or ged_, respectively. Thematically organised colsets that draw on several data sources are instead named after their common denominator, such as economy, gender, or regime change.

col_features follow a more structured naming scheme. They are named by combining abbreviations or acronyms for their respective components. As a general rule, column names are constructed as follows: [f]_[parameter_1]_[parameter_2]_[col], where:

	f: transformation function name 
	parameter_1: value (optional) 
	parameter_2: value (optional) 
	col: source column 
Transformation functions

The various transformation functions applied by ViEWS are listed below:

  • delta_col: Time delta, col - tlag_1(col).
  • greq_value_col: Greater or equal dummy encoder.
  • smeq_value_col: Smaller or equal dummy encoder.
  • in_range_low_high_col: Dummy encoder for whether col lies between low and high.
  • tlag_time_col: Time lag.
  • tlead: Time lead.
  • ma_time_col: Moving average over time
  • cweq_value_col: Count while col equals value.
  • time_since_col: Time since column != 0. Implemented as time-lag of 1 of count while col equals 0.
  • decay_halflife_col: Exponential decay function.
  • mean_col: Time-invariant mean of col.
  • ln_col: Natural log of col.
  • demean_col: De-meaned values of col. Is col - mean(col).
  • rollmax_window_col: Rolling max of time window.
  • onset_possible_col: Onset is possible if no event occurred in the preceding window time periods.
  • onset_window_col: 1 if onset is possible and an event occurs, i.e. 1 for the first event in the time window.
  • sum_cols: Sum of columns.
  • product: Product of columns.
  • spdist_col: Spatial distance to closest cell or country where col == 1.
  • stdist_k_tscale_col: Space-time distance to closest k cells or countries where col == 1.
  • splag_first_last_col: Spatial lag. Sum of col over all neighboring geographic units from first- to last-order neighbors. For example, splag_1_1_ged_dummy_sb is the sum of ged_dummy_sb in the immediate neighbors, and splag_1_2_ged_dummy_sb is the sum of ged_dummy_sb in the neighboring geographies and their neighbors. splag_2_2 gives a hollow ring of the neighbors' neighbors only, excluding direct neighbors.
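The most common temporal transformations in the list above can be sketched on one monthly series stored as a Python list. The exact semantics (trailing windows, None where a lag reaches before the start of the series, 2 ** (-x / halflife) for the decay function, and complete series without missing values) are assumptions made for illustration; the authoritative implementations are in OpenViEWS2.

```python
def tlag(col, time):
    """Time lag: the value from `time` months earlier (None where unavailable)."""
    return [None] * min(time, len(col)) + list(col[: max(len(col) - time, 0)])

def delta(col):
    """Time delta: col - tlag_1(col)."""
    lagged = tlag(col, 1)
    return [None if b is None else a - b for a, b in zip(col, lagged)]

def ma(col, time):
    """Trailing moving average over the current and previous time - 1 months."""
    out = []
    for t in range(len(col)):
        if t < time - 1:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(col[t - time + 1 : t + 1]) / time)
    return out

def greq(col, value):
    """Greater-or-equal dummy encoder."""
    return [int(x >= value) for x in col]

def cweq(col, value):
    """Count of consecutive months, up to and including t, in which col == value."""
    out, count = [], 0
    for x in col:
        count = count + 1 if x == value else 0
        out.append(count)
    return out

def time_since(col):
    """Time since col != 0, implemented as tlag_1 of cweq(col, 0)."""
    return tlag(cweq(col, 0), 1)

def decay(col, halflife):
    """Exponential decay 2 ** (-x / halflife); None stays None."""
    return [None if x is None else 2 ** (-x / halflife) for x in col]

# Usage: decay of the time since the last month with 1+ fatalities.
fatalities = [3, 0, 0, 12, 0, 0, 0]
print(decay(time_since(greq(fatalities, 1)), halflife=12))
```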

The most common transformation functions are the "greater than or equal to" dummy encoder (greq) and various time lags (tlag). By convention, the transform name and its parameters are prepended to the column name. For example, transform_a(transform_b(col, params_b), params_a) is named transform_a_params_a_transform_b_params_b_col.

Parameters

Parameters are added where the transformation functions require further specifications, such as numerical thresholds for the "greater than or equal to" dummy encoders. When needed, these values are added immediately after the transformation function acronyms.

Source columns

The source columns (col) are, with a few exceptions, copied from the original data source. For example, ged_best_sb is a UCDP variable referring to the best estimate (best) of the number of fatalities from state-based (sb) violence in a given time period, as recorded in the UCDP GED (ged) dataset.

A variable constructed by applying the "greater than or equal to" dummy encoder to the source variable ged_best_sb, with 1 fatality as the threshold value, would thus be named greq_1_ged_best_sb.
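The naming rule can be expressed as a small helper that builds a column name from a source column and a sequence of transformations, applied from the innermost outwards. The helper is hypothetical and not part of OpenViEWS2, but it reproduces the example names given in this section.

```python
def transformed_name(col, *transforms):
    """Name for col after applying transforms in order (innermost first).

    Each transform is a (name, params) pair. Because each new transform is
    prepended, the outermost transform appears first in the resulting name.
    """
    name = col
    for transform, params in transforms:
        name = "_".join([transform, *(str(p) for p in params), name])
    return name

# Usage: the greq example above, and a time lag applied on top of a greq encoding.
print(transformed_name("ged_best_sb", ("greq", [1])))
print(transformed_name("ged_best_sb", ("greq", [25]), ("tlag", [1])))
```

Building names this way is unambiguous, but parsing them back is not in general, since source columns and transform names themselves contain underscores; a parser would need the list of known transform names.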

Ensemble composition

Lists of the constituent models included in the two ViEWS2020 model ensembles.

Constituent model specifications

Specifications of all ViEWS2020 models. For a list of the active models in each ensemble, please see Ensemble composition above. 

Model features (variables)

The colsets or “column sets” informing one or more of the ViEWS2020 constituent models. For more information, see Terminology above. 

Data sources

Data informing the models

The input data used in ViEWS are loaded into tables in our database, where they are organised by theme and/or data source and prefixed accordingly. The individual sources are described below, with their corresponding prefixes in parentheses.

ACLED (acled_)

ACLED is the Armed Conflict Location & Event Data Project. ViEWS recodes ACLED into approximations of the UCDP GED categories of violence. There are thus 8 primary columns exposed by ACLED in ViEWS data:

acled_count_pr: Protest event count
acled_count_sb: State-based violence event count
acled_count_ns: Non-state violence event count
acled_count_os: One sided violence event count
acled_fat_pr: Protest fatality count
acled_fat_sb: State-based violence fatality count
acled_fat_ns: Non-state violence fatality count
acled_fat_os: One sided violence fatality count
acled_dummy_[pr, sb, ns, os] are dummy encodings of acled_count_[pr, sb, ns, os].

FVP (fvp_)

A country-year dataset compiled for another project, combining data from V-Dem, WDI, EPR, and SSP. Columns prefixed prop_ are from EPR, and columns prefixed ssp2 are from SSP. Auto, demo, electoral, and similar columns are from V-Dem.

GED (ged_)

The main outcome of ViEWS comes from UCDP-GED. Six main columns are exposed from GED:

ged_best_sb: Best estimate of fatalities for state-based violence.
ged_best_ns: Best estimate of fatalities for non-state violence.
ged_best_os: Best estimate of fatalities for one-sided violence.
ged_count_sb: Number of events for state-based violence.
ged_count_ns: Number of events for non-state violence.
ged_count_os: Number of events for one-sided violence.

In addition, the transform ged_dummy_[sb, ns, os] provides dummy encodings of ged_count_[sb, ns, os].

ICGCW (icgcw_)

The International Crisis Group publishes an online conflict tracker, CrisisWatch, at https://www.crisisgroup.org/crisiswatch. It is scraped, and updates are encoded in 5 columns:

icgcw_alerts: Appeared in an alert
icgcw_deteriorated: Situation deteriorated
icgcw_improved: Situation improved
icgcw_opportunities: Opportunity spotted
icgcw_unobserved: Country doesn't appear

PRIO-GRID (pgdata_)

PRIO-GRID data are fetched from the PRIO-GRID API at https://grid.prio.org/#/apidocs. For the full codebook, see https://grid.prio.org/#/codebook. Forty-one columns are exposed from PRIO-GRID, with their original names retained. Where a variable has both a yearly (_y) and a static (_s) version, the two are sometimes combined by taking their maximum (MAX()).

REIGN (reign_)

The Rulers, Elections, and Irregular Governance (REIGN) dataset. For details, see https://oefdatascience.github.io/REIGN.github.io/.

SPEI (spei_)

SPEI Global Drought Monitor. For details, see https://spei.csic.es/map/maps.html.

VDEM (vdem_)

Varieties of Democracy. Version 10 is currently loaded. For the codebook, see https://www.v-dem.net/en/data/data-version-10/. Columns are loaded from the Country-Year: V-Dem Full+Others file. Columns ending in the following suffixes are currently not included due to memory constraints:

_codehigh
_codelow
_ord
_sd
_mean
_nr
_osp
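Excluding the V-Dem columns with the suffixes listed above amounts to a simple suffix filter. The suffix tuple is taken from the list above; the column names in the usage line are illustrative V-Dem-style examples, and the actual loading code is in OpenViEWS2.

```python
EXCLUDED_SUFFIXES = ("_codehigh", "_codelow", "_ord", "_sd", "_mean", "_nr", "_osp")

def keep_vdem_column(name: str) -> bool:
    """True if a column should be loaded, i.e. it has none of the excluded suffixes."""
    # str.endswith accepts a tuple and returns True if any of the suffixes matches.
    return not name.endswith(EXCLUDED_SUFFIXES)

def select_vdem_columns(columns):
    """Filter a list of column names down to those that are loaded."""
    return [c for c in columns if keep_vdem_column(c)]

# Usage with illustrative column names.
print(select_vdem_columns(
    ["v2x_polyarchy", "v2x_polyarchy_codelow", "v2clkill", "v2clkill_ord", "v2clkill_sd"]
))
```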

WDI (wdi_)

World Bank World Development Indicators, updated as of May 2020. Downloaded from http://databank.worldbank.org/data/download/WDI_csv.zip. For details, see https://databank.worldbank.org/source/world-development-indicators.

Data infrastructure

The ViEWS2 data infrastructure supporting the ViEWS2020 model

The ViEWS2020 legacy model was built and run in the now deprecated ViEWS2 data infrastructure. Both the model and the infrastructure are documented in the OpenViEWS2 GitHub repository. Please note that the repository remains open for transparency, but is no longer supported or maintained.