Overview
localintel (Local Intelligence) is an R package and inequality mapping engine that provides a unified pipeline for subnational indicator analysis across two major data ecosystems:
- Eurostat — 150+ indicators across 14 thematic domains for 235 European NUTS-2 regions
- DHS (Demographic and Health Surveys) — 50+ indicators across 8 domains for 500+ Admin-1 regions in Sub-Saharan Africa
Any subnational dataset can be fetched, harmonized, gap-filled, cascaded, mapped, and exported through a single consistent workflow.
- 200+ Indicators: Curated registries for Eurostat (14 domains) and DHS (8 domains) with batch download and retry logic
- Universal Processing: Generic processors plus domain-specific functions for instant harmonization across both data sources
- Data Cascading: Intelligent propagation from country to regional level with source-level tracking (NUTS for Europe, Admin-1 for SSA)
- Gap-Filling: GAM-based temporal interpolation and forecasting for DHS time series, with provenance flags on every cell
- Inequality Mapping: Publication-ready maps with automatic best-level selection and DHS Admin-1 choropleth support
- Export: Tableau-ready GeoJSON, Excel, PDF maps, and RDS with enrichment and performance tags
Live Demo
See the inequality mapping engine in action: Where Inequality Lives and Where Data Matters are interactive dashboards built entirely with data processed through this package. They map regional disparities across 235 European NUTS-2 regions and 652 Admin-1 regions in Sub-Saharan Africa over multiple decades, with live indicator switching, animated timeline playback, and API-driven regional insights powered by an indicator-aware narrative engine.
The pipeline is fully parametrizable and can be adapted to any indicator domain or geography.
Installation
# Install from GitHub
# install.packages("devtools")
devtools::install_github("MohamedHtitich1/localintel")
Quick Start: Eurostat
Fetch one indicator, cascade it to every NUTS-2 region, and plot it in four function calls:
library(localintel)
gdp_raw <- get_nuts_level_robust("nama_10r_2gdp", level = 2, years = 2020:2024)
gdp <- process_gdp(gdp_raw)
cascaded <- cascade_to_nuts2(gdp, vars = "gdp", years = 2020:2024)
plot_best_by_country_level(cascaded, get_nuts_geopolys(), var = "gdp", years = 2024:2024)
Quick Start: DHS (Sub-Saharan Africa)
Fetch DHS indicators, gap-fill the time series, and assemble an Admin-1 panel:
library(localintel)
# 1. Fetch & process DHS indicators for SSA
raw <- fetch_dhs_batch(dhs_mortality_codes(), country_ids = ssa_codes())
proc <- process_dhs_batch(raw)
# 2. Gap-fill with GAM-based interpolation
gapfilled <- gapfill_all_dhs(proc)
# 3. Cascade to Admin 1 panel with national fallback
panel <- cascade_to_admin1(gapfilled)
# 4. Balance the panel (drop thin indicators/regions)
balanced <- balance_dhs_panel(panel, min_countries = 5)
# 5. Map a single indicator
plot_dhs_map(balanced$panel, var = "u5_mortality", year = 2020)
Or run the entire pipeline in one call:
result <- dhs_pipeline(
  country_ids = ssa_codes(),
  indicator_codes = c(dhs_mortality_codes(), dhs_nutrition_codes()),
  forecast_to = 2025
)
Full Multi-Domain Workflow (Eurostat)
Once you’re comfortable, scale up to multiple domains at once:
library(localintel)
# 1. Fetch data from multiple domains
econ <- fetch_eurostat_batch(economy_codes(), level = 2, years = 2015:2024)
hlth <- fetch_eurostat_batch(health_system_codes(), level = 2, years = 2015:2024)
lab <- fetch_eurostat_batch(labour_codes(), level = 2, years = 2015:2024)
# 2. Process with domain-specific or generic processors
gdp <- process_gdp(econ$gdp_nuts2)
beds <- process_beds(hlth$beds)
unemp <- process_unemployment_rate(lab$unemployment_rate)
# 3. Merge, cascade to NUTS2, and impute temporal gaps
all_data <- merge_datasets(gdp, beds, unemp)
cascaded <- cascade_to_nuts2(
  all_data,
  vars = c("gdp", "beds", "unemployment_rate"),
  years = 2015:2024,
  impute = TRUE,      # adaptive econometric imputation (PCHIP + ETS)
  forecast_to = 2025  # extend series with AIC-selected forecasts
)
# Check traceability
table(cascaded$src_gdp_level) # 2=NUTS2, 1=NUTS1, 0=NUTS0
table(cascaded$imp_gdp_flag) # 0=observed, 1=interpolated, 2=forecasted
# 4. Visualize
geopolys <- get_nuts_geopolys()
plot_best_by_country_level(cascaded, geopolys, var = "gdp", years = 2022:2024)
# 5. Export for Tableau
sf_all <- build_multi_var_sf(
  cascaded, geopolys,
  vars = c("gdp", "beds", "unemployment_rate"),
  years = 2015:2024,
  var_labels = regional_var_labels(),
  pillar_mapping = regional_domain_mapping()
)
export_to_geojson(sf_all, "output/multi_domain_nuts2.geojson")
Why localintel?
If you work with subnational data from Eurostat or DHS, you’ve likely used packages like eurostat or rdhs. They’re excellent for downloading individual datasets — but getting from raw downloads to a complete, analysis-ready panel is where most of the work begins. localintel picks up where they leave off:
| | Raw data packages | localintel |
|---|---|---|
| Scope | One dataset at a time | 200+ indicators across Eurostat and DHS in a single batch call |
| Processing | Raw data as-is | Domain-specific processors select units, filter dimensions, and standardize columns automatically |
| Gaps | Missing regions stay missing | Cascade fills every region from parent levels (100% geographic coverage) with source tracking |
| Time series | Gaps remain | GAM interpolation (DHS) and PCHIP + ETS forecasting (Eurostat), with provenance flags on every cell |
| Name harmonization | Manual | Automatic DHS-to-GADM name matching with 40+ country crosswalks, composite region dissolution, and fuzzy matching |
| Output | Data frame | Maps, GeoJSON, Excel, PDF map books — all from the same pipeline |
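The time-series row above mentions PCHIP interpolation with provenance flags. As a rough illustration of that idea only, the sketch below fills interior gaps in a toy annual series with base R's monotone Hermite spline (`splinefun(method = "monoH.FC")`, a Fritsch-Carlson interpolant closely related to PCHIP). The data, the `filled` and `imp_flag` column names, and the choice to leave trailing gaps unfilled are all assumptions made for this example; it is not the package's internal implementation and it omits the ETS forecasting step.

```r
# Hypothetical annual series for one region; NA marks a missing year.
series <- data.frame(
  year  = 2015:2024,
  value = c(100, 104, NA, NA, 118, 121, NA, 130, NA, NA)
)

observed <- !is.na(series$value)

# Monotone Hermite interpolant (Fritsch-Carlson) through the observed points.
f <- splinefun(series$year[observed], series$value[observed],
               method = "monoH.FC")

# Interior gaps only: years strictly between the first and last observation.
interior <- !observed &
  series$year > min(series$year[observed]) &
  series$year < max(series$year[observed])

series$filled <- series$value
series$filled[interior] <- f(series$year[interior])

# Flag coding borrowed from the README: 0 = observed, 1 = interpolated.
# Trailing gaps stay NA (flag NA) because this sketch does not forecast.
series$imp_flag <- ifelse(observed, 0L, ifelse(interior, 1L, NA_integer_))

print(series)
```

Because the interpolant preserves monotonicity, filled values for an increasing series stay between their observed neighbours, which avoids the overshoot an unconstrained cubic spline can produce.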
Domain Coverage
Eurostat (Europe — NUTS-2 Regions)
| Domain | Indicators | Key Datasets |
|---|---|---|
| Economy | 10 | GDP, GVA, gross fixed capital formation, household income |
| Demography | 13 | Population, life expectancy, fertility, mortality |
| Education | 11 | Attainment, students, training, early leavers, NEET |
| Labour Market | 13 | Employment, unemployment, activity rates, long-term unemployment |
| Health System | 6 | Hospital beds, physicians, discharges, length of stay |
| Causes of Death | 16 | Standardised death rates, PYLL, infant mortality |
| Tourism | 8 | Arrivals, nights spent, accommodation capacity |
| Transport | 10 | Road, rail, air, maritime infrastructure and traffic |
| Environment | 7 | Municipal waste, energy, contaminated sites |
| Science & Technology | 11 | R&D expenditure, patents, HRST, high-tech employment |
| Poverty & Exclusion | 4 | At-risk-of-poverty, material deprivation, low work intensity |
| Agriculture | 5 | Crops, livestock, land use, milk production |
| Business | 6 | SBS, business demography, local units |
| Information Society | 6 | Internet access, broadband, e-commerce, e-government |
| Crime | 1 | Crimes recorded by police |
DHS (Sub-Saharan Africa — Admin-1 Regions)
| Domain | Indicators | Key Measures |
|---|---|---|
| Mortality | 8 | Under-5 mortality, infant mortality, neonatal mortality, child mortality |
| Nutrition | 6 | Stunting, wasting, underweight, overweight, anemia, exclusive breastfeeding |
| Health | 8 | Vaccination coverage, ANC visits, skilled birth attendance, modern contraception |
| WASH | 6 | Improved water, improved sanitation, handwashing, open defecation |
| Education | 5 | Literacy, school attendance, educational attainment |
| HIV | 5 | HIV prevalence, knowledge, testing, condom use |
| Gender | 4 | Women’s decision-making, attitudes toward violence, early marriage |
| Wealth | 4 | Wealth index, asset ownership, poverty headcount |
Use indicator_count() for Eurostat totals and dhs_indicator_count() for DHS totals.
Key Features
Data Cascading
The package automatically fills missing regional data by cascading from parent levels:
Eurostat: NUTS 0 (Country) → NUTS 1 (Major Regions) → NUTS 2 (Regions)
DHS: National → Admin 1 (with gap-filling and name harmonization)
Every cascaded value is tracked via src_<variable>_level columns, enabling full transparency and sensitivity analysis.
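To make the cascade rule concrete, here is a minimal base R sketch, assuming the standard NUTS code structure in which the first two characters of a NUTS-2 code identify the country and the first three identify the NUTS-1 parent. The function `cascade_sketch()` and the toy values are hypothetical and written for illustration; the package's own `cascade_to_nuts2()` may work differently.

```r
# Toy inputs: values observed at each NUTS level (hypothetical numbers).
nuts0 <- c(DE = 100, FR = 90)   # country level
nuts1 <- c(DE1 = 110)           # major regions
nuts2 <- c(DE11 = 120)          # regions

# NUTS-2 regions we want a value for.
targets <- c("DE11", "DE12", "DE21", "FR10")

cascade_sketch <- function(targets, nuts2, nuts1, nuts0) {
  # Look up each level; names that are absent yield NA.
  v2 <- unname(nuts2[targets])
  v1 <- unname(nuts1[substr(targets, 1, 3)])
  v0 <- unname(nuts0[substr(targets, 1, 2)])

  # Take the finest level that has a value and record where it came from.
  value <- ifelse(!is.na(v2), v2, ifelse(!is.na(v1), v1, v0))
  level <- ifelse(!is.na(v2), 2L,
           ifelse(!is.na(v1), 1L,
           ifelse(!is.na(v0), 0L, NA_integer_)))

  data.frame(geo = targets, gdp = value, src_gdp_level = level)
}

out <- cascade_sketch(targets, nuts2, nuts1, nuts0)
print(out)
```

In this toy case DE11 keeps its own value (level 2), DE12 inherits from DE1 (level 1), and DE21 and FR10 fall back to their country values (level 0). Filtering on the source-level column is one simple way to run a sensitivity check that excludes inherited values.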
DHS Name Harmonization
Matching DHS region names to GADM administrative boundaries is notoriously difficult. localintel handles this automatically through four lookup tables covering 40+ SSA countries: manual crosswalks, composite region splitting, sub-national dissolves, and fuzzy string matching — with 100% coverage for all Tier 1 DHS countries.
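A layered matching strategy of this kind can be sketched in base R as follows. This is only an illustration under stated assumptions: `harmonize_names()` is a hypothetical helper, the region names and crosswalk entry are made-up examples, and the sketch covers just the manual crosswalk and fuzzy steps (plus an exact match after normalisation), not composite splitting or dissolves. Fuzzy matching uses `utils::adist()`, which computes edit distances.

```r
harmonize_names <- function(dhs_names, gadm_names, crosswalk, max_dist = 2) {
  # Normalise: lower-case and drop everything except letters and digits.
  norm <- function(x) tolower(gsub("[^[:alnum:]]", "", x))

  gadm   <- rep(NA_character_, length(dhs_names))
  method <- rep("unmatched", length(dhs_names))

  # 1. Manual crosswalk takes priority.
  hit <- dhs_names %in% names(crosswalk)
  gadm[hit]   <- unname(crosswalk[dhs_names[hit]])
  method[hit] <- "crosswalk"

  # 2. Exact match after normalisation.
  idx <- match(norm(dhs_names), norm(gadm_names))
  hit <- is.na(gadm) & !is.na(idx)
  gadm[hit]   <- gadm_names[idx[hit]]
  method[hit] <- "exact"

  # 3. Fuzzy match: closest name by edit distance, if close enough.
  todo <- which(is.na(gadm))
  if (length(todo) > 0) {
    d    <- utils::adist(norm(dhs_names[todo]), norm(gadm_names))
    best <- apply(d, 1, which.min)
    ok   <- d[cbind(seq_along(todo), best)] <= max_dist
    gadm[todo[ok]]   <- gadm_names[best[ok]]
    method[todo[ok]] <- "fuzzy"
  }

  data.frame(dhs = dhs_names, gadm = gadm, method = method,
             stringsAsFactors = FALSE)
}

# Illustrative names only.
gadm_names <- c("Nord-Ouest", "Centre", "Sud-Ouest", "Littoral", "Adamaoua")
dhs_names  <- c("North-West", "Sud Ouest", "Litoral", "Adamawa", "Unknownland")
crosswalk  <- c("North-West" = "Nord-Ouest")

res <- harmonize_names(dhs_names, gadm_names, crosswalk)
print(res)
```

Recording the matching method per row keeps the result auditable: fuzzy matches can be reviewed by hand, and unmatched names point to crosswalk entries that still need to be added.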
GAM-Based Gap-Filling (DHS)
DHS surveys are conducted irregularly (every 3–7 years). The gap-filling engine uses penalized GAM splines to interpolate between surveys and optionally forecast beyond the last observation, producing smooth continuous time series with uncertainty bounds and provenance flags on every value.
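The general technique can be sketched with the `mgcv` package, which ships with standard R installations. The example below is a minimal illustration, not the package's gap-filling engine: the survey values are invented, the basis size `k = 4` and the normal-approximation interval (estimate plus or minus 1.96 standard errors) are assumptions, and the 0/1/2 flag coding is borrowed from the Eurostat flags shown earlier rather than confirmed for DHS output.

```r
library(mgcv)  # recommended package, bundled with standard R installations

# Hypothetical survey estimates for one region (irregular survey years).
obs <- data.frame(
  year  = c(2000, 2005, 2008, 2013, 2016, 2019),
  value = c(160, 141, 128, 105, 96, 88)
)

# Penalised thin-plate regression spline; k is kept small because
# only a handful of surveys are available. Smoothness is chosen by REML.
fit <- gam(value ~ s(year, k = 4), data = obs, method = "REML")

# Predict on a continuous annual grid, three years past the last survey.
grid <- data.frame(year = 2000:2022)
pred <- predict(fit, newdata = grid, se.fit = TRUE)

grid$estimate <- as.numeric(pred$fit)
grid$lower    <- grid$estimate - 1.96 * as.numeric(pred$se.fit)
grid$upper    <- grid$estimate + 1.96 * as.numeric(pred$se.fit)

# Provenance flag (assumed coding): 0 = observed year, 1 = interpolated,
# 2 = forecasted. Note that `estimate` is the smoothed fit even in
# survey years; a real pipeline might keep the raw survey value there.
grid$flag <- ifelse(grid$year %in% obs$year, 0L,
             ifelse(grid$year > max(obs$year), 2L, 1L))

print(head(grid))
```

Standard errors typically widen beyond the last survey year, which is why forecasts from a spline like this are best limited to a short horizon.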
Generic Processing
The process_eurostat() function handles any Eurostat dataset with flexible dimension filtering:
# Custom indicator with arbitrary filters
custom <- process_eurostat(raw_data,
  filters = list(unit = "PC", sex = "T", age = "Y25-64"),
  out_col = "my_indicator"
)
Visualization
Automatic “best level” selection for maps — displays the finest available resolution per country:
# Eurostat
plot_best_by_country_level(cascaded, geopolys, var = "unemployment_rate",
                           years = 2022:2024, title = "Unemployment Rate (%)")
# DHS
plot_dhs_map(panel, var = "u5_mortality", year = 2020,
             title = "Under-5 Mortality Rate")
Tableau Integration
Full support for Tableau exports with country names, region labels, population-weighted aggregations, and performance tags:
# Eurostat
sf_enriched <- enrich_for_tableau(sf_all, pop_data, nuts2_names)
export_to_geojson(sf_enriched, "eurostat_dashboard.geojson")
# DHS
dhs_sf <- enrich_dhs_for_tableau(panel, geo_sf)
export_to_geojson(dhs_sf, "dhs_dashboard.geojson")
Related Work
This package was developed as part of research on subnational regional analysis. Related projects:
- Where Inequality Lives: Interactive platform for EU27
- Where Data Matters: Interactive platform for Sub-Saharan Africa