fix(CII): replace hard-cap HAPI fallback with log scale + finer displacement tiers #2577

Open
fuleinist wants to merge 4 commits into koala73:main from fuleinist:fix/issue-2457-cii-bias

Conversation

@fuleinist
Contributor

Summary

Fixes algorithmic bias in the Country Instability Index (CII) where China scores comparably to active conflict states due to Math.min(60, linear) compression in the HAPI fallback conflict score.

Root Cause

In calcConflictScore(), the HAPI fallback used:

hapiFallback = Math.min(60, h.eventsPoliticalViolence * 3 * multiplier);

With CN's multiplier of 2.5, China's 46 HAPI events hit: Math.min(60, 46 * 3 * 2.5) = 60 (capped).
Iran's 1549 events also hit: Math.min(60, 1549 * 3 * 2.0) = 60 (capped).
Result: CN=60, IR=60 — indistinguishable despite 33x raw event difference.
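
The compression can be reproduced in isolation. The sketch below is a minimal stand-in for the old fallback expression, not the actual body of calcConflictScore(); the helper name `linearHapiFallback` is hypothetical, and the event counts and multipliers are the ones quoted above.

```javascript
// Hypothetical helper mirroring the old expression quoted above.
function linearHapiFallback(events, multiplier) {
  return Math.min(60, events * 3 * multiplier);
}

const cnLinear = linearHapiFallback(46, 2.5);   // raw value 345, clipped by the cap
const irLinear = linearHapiFallback(1549, 2.0); // raw value 9294, clipped by the cap

// Both countries print the same score even though the raw event counts
// differ by a factor of roughly 33.
console.log({ cnLinear, irLinear, eventRatio: (1549 / 46).toFixed(1) });
```

Under this expression any weighted event count of 20 or more (events * multiplier >= 20) already saturates the cap, so the fallback carries almost no ordering information.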

Fix

1. HAPI fallback: log scale instead of linear

hapiFallback = Math.min(60, Math.log1p(events * multiplier) * 12);
  • China (46 events, ×2.5): log1p(115) * 12 ≈ 57
  • Iran (1549 events, ×2.0): log1p(3098) * 12 ≈ 96 → capped at 60
  • Ordering is preserved: China now lands about 3 points below Iran's capped 60 instead of tying it at the cap.
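
The figures above can be checked with a small sketch. This assumes only the expression shown in this description; the helper name `logHapiFallback` is hypothetical and is not the function in country-instability.ts.

```javascript
// Hypothetical helper mirroring the new expression quoted above.
function logHapiFallback(events, multiplier) {
  return Math.min(60, Math.log1p(events * multiplier) * 12);
}

const cnLog = logHapiFallback(46, 2.5);   // log1p(115) * 12, below the cap
const irLog = logHapiFallback(1549, 2.0); // log1p(3098) * 12, clipped to 60

console.log({ cnLog: cnLog.toFixed(1), irLog });

// The cap is still reached once log1p(x) * 12 >= 60, i.e. x >= e^5 - 1.
console.log('saturation point:', Math.expm1(5).toFixed(1));
```

One consequence worth noting: the log curve still saturates, just later. Weighted event counts above roughly 147 (e^5 − 1) all map to 60, so this change separates China from Iran but would not separate two countries that are both above that level.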

2. Finer displacement tiers (2 → 6 tiers)

| Outflow | Before | After |
| --- | --- | --- |
| ≥10M | - | +12 |
| ≥5M | - | +10 |
| ≥1M | +8 | +8 |
| ≥500K | - | +6 |
| ≥100K | +4 | +4 |
| ≥10K | - | +2 |

Syria's 5.64M outflow now scores +10 instead of +8, widening the gap vs China's 332K (+4).
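
A minimal sketch of the six-tier lookup, assuming the thresholds and boosts in the table above. The names `DISPLACEMENT_TIERS` and `displacementBoost` are hypothetical; the real implementation in country-instability.ts may be structured differently (for example as an if/else chain).

```javascript
// Thresholds and boosts taken from the table above, highest tier first.
const DISPLACEMENT_TIERS = [
  [10_000_000, 12],
  [5_000_000, 10],
  [1_000_000, 8],
  [500_000, 6],
  [100_000, 4],
  [10_000, 2],
];

// Hypothetical helper: returns the boost of the first (highest) tier reached.
function displacementBoost(outflow) {
  for (const [threshold, boost] of DISPLACEMENT_TIERS) {
    if (outflow >= threshold) return boost;
  }
  return 0; // below the lowest tier
}

console.log(displacementBoost(5_640_785)); // Syria
console.log(displacementBoost(332_007));   // China
console.log(displacementBoost(214_271));   // Iran
```

Keeping the tiers in a sorted table rather than in branching code makes it harder to introduce a gap or an out-of-order threshold when tiers are added later.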

Evidence

Reproduced by issue reporter (@zouyonghe):

  • CN: hapiPoliticalViolence = 46, displacementOutflow = 332,007 → score = 25
  • IR: hapiPoliticalViolence = 1,549, displacementOutflow = 214,271 → score = 31
  • SY: hapiPoliticalViolence = 21, displacementOutflow = 5,640,785 → score = 36

Post-fix, Iran's higher conflict signal should pull it further above China.

Not Addressed (per collaborator feedback)

  • Suggestion 2 (separate geopolitical aliases): Requires editorial decision on scope — out of scope for this PR.
  • Suggestion 4 (document multipliers): Can be addressed in a follow-up documentation PR.

Closes #2457.

Subagent added 3 commits March 31, 2026 05:55
- Create seed-climate-zone-normals.mjs to fetch 1991-2020 historical
  monthly means from Open-Meteo archive API per zone
- Update seed-climate-anomalies.mjs to use WMO normals as baseline
  instead of climatologically meaningless 30-day rolling window
- Add 7 new climate-specific zones: Arctic, Greenland, WestAntarctic,
  TibetanPlateau, CongoBasin, CoralTriangle, NorthAtlantic
- Register climateZoneNormals cache key in cache-keys.ts
- Add fallback to rolling baseline if normals not yet cached

Fixes: koala73#2467
- seed-climate-zone-normals.mjs: Now fetches normals for ALL 22 zones
  (15 original geopolitical + 7 new climate zones) instead of just
  the 7 new climate zones. The 15 original zones were falling through
  to the broken rolling fallback.

- seed-climate-anomalies.mjs: Fixed rolling fallback to fetch 30 days
  of data when WMO normals are not yet cached. Previously fetched only
  7 days, causing baselineTemps slice to be empty and returning null
  for all zones. Now properly falls back to 30-day rolling baseline
  (last 7 days vs. prior 23 days) when normals seeder hasn't run.

- cache-keys.ts: Removed climateZoneNormals from BOOTSTRAP_CACHE_KEYS.
  This is an internal seed-pipeline artifact (used by the anomaly
  seeder to read cached normals) and is not meant for the bootstrap
  endpoint. Only climate:anomalies:v1 (the final computed output)
  should be exposed to clients.

Fixes greptile-apps P1 comments on PR koala73#2504.
…acement tiers

Fixes algorithmic bias where China scores comparably to active conflict
states due to Math.min(60, linear) compression in HAPI fallback.

Changes:
- HAPI fallback: Math.min(60, events * 3 * mult) → Math.min(60, log1p(events * mult) * 12)
  Preserves ordering: Iran (1549 events, capped at 60) now scores above China (46 events, ≈57)
- Displacement tiers: 2 → 6 tiers (10K/100K/500K/1M/5M/10M thresholds)
  Adds signal for Syria's 5.65M outflow vs China's 332K

Addresses koala73#2457 (points 1 and 3 per collaborator feedback)
@vercel

vercel bot commented Mar 31, 2026

Someone is attempting to deploy a commit to the Elie Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added the trust:safe Brin: contributor trust score safe label Mar 31, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR fixes an algorithmic compression bug in the Country Instability Index where China and Iran both capped at the same HAPI fallback score (60) despite a 33× difference in raw events, and separately adds finer displacement tiers to better separate humanitarian crises by magnitude. It also introduces a companion climate-scoring improvement: a new monthly seeder (seed-climate-zone-normals.mjs) pre-computes WMO 1991–2020 climatological normals so seed-climate-anomalies.mjs can compare against a proper 30-year baseline instead of a rolling same-window mean.

Key changes:

  • country-instability.ts: hapiFallback formula changes from events * 3 * multiplier (linear) to log1p(events * multiplier) * 12 (logarithmic), preserving ordering while preventing cap compression. Both calculateCII() and getCountryScore() are updated identically.
  • country-instability.ts: Displacement boost tiers expand from 2 to 6 levels (10K/100K/500K/1M/5M/10M thresholds at +2/+4/+6/+8/+10/+12).
  • seed-climate-zone-normals.mjs (new): Monthly Railway cron that fetches 30 years of Open-Meteo archive data and caches per-zone per-month means under climate:zone-normals:v1 (30-day TTL).
  • seed-climate-anomalies.mjs: Reads the pre-computed normals from Redis; falls back gracefully to the old rolling-30d baseline when normals are absent.

One P1 finding: The new normals seeder's validate function accepts any non-zero zone count. Partial seeding (e.g., 3/22 zones) passes validation, gets cached for 30 days, and causes the anomalies seeder to throw its MIN_ZONES guard on every subsequent run — stale anomaly data could persist for weeks. Mirroring the ceil(22 * 2/3) threshold already used by the anomalies seeder fixes this.

Confidence Score: 4/5

Safe to merge after fixing the normals-seeder validate threshold; the CII scoring fix itself is correct and well-scoped.

The CII scoring changes in country-instability.ts are correct: math checks out, both call-sites updated identically, and the existing test suite covers floor/cap/ordering invariants. The climate-seeder work has one P1: a too-permissive validate function in the new normals seeder can cause a weeks-long degraded state for climate anomaly freshness. No data corruption occurs (old data is preserved), but the issue is a real operational defect on the changed path. Fixing the threshold to >= ceil(ALL_ZONES.length * 2/3) resolves it.

scripts/seed-climate-zone-normals.mjs — validate function and zone-list duplication need attention before deploying the monthly cron.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/services/country-instability.ts | HAPI fallback switches from linear to log1p scale (fixing China≈Iran compression), and displacement tiers expand from 2 to 6 levels; both calculateCII() and getCountryScore() are updated identically; no issues found. |
| scripts/seed-climate-anomalies.mjs | Switches from 30-day rolling baseline to WMO 30-year normals from Redis, adding 7 new climate zones and graceful fallback when normals are absent; currentMonth uses local time instead of UTC (P2). |
| scripts/seed-climate-zone-normals.mjs | New monthly seeder that pre-computes WMO 1991–2020 normals for all 22 zones; validate function accepts any non-zero zone count (P1), which can cause a degraded state in the anomalies seeder for up to 30 days; zone list is duplicated without enforcement (P2). |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Monthly cron<br/>seed-climate-zone-normals.mjs"] -->|"Fetches 1991-2020 archive<br/>for all 22 zones"| B[Open-Meteo Archive API]
    B --> C{"validate:<br/>zones.length > 0<br/>⚠️ too weak"}
    C -->|passes| D[("Redis<br/>climate:zone-normals:v1<br/>TTL 30 days")]
    C -->|fails| E["Abort: no write<br/>old data preserved"]
    F["3h cron<br/>seed-climate-anomalies.mjs"] --> G{fetchZoneNormalsFromRedis}
    G -->|normals found| H["daysToFetch = 7<br/>hasNormals = true"]
    G -->|not found| I["daysToFetch = 30<br/>hasNormals = false"]
    H --> J["fetchZone per zone<br/>7 days current data"]
    I --> J
    J --> K{"zoneNormal found<br/>for this zone?"}
    K -->|yes| L["Compare vs WMO monthly mean<br/>baselineSource: wmo-30y-normals"]
    K -->|"no + hasNormals=true"| M["Fallback: slice 0..-7<br/>⚠️ empty, returns null"]
    K -->|"no + hasNormals=false"| N["Fallback: 30-day rolling<br/>baselineSource: rolling-30d-fallback"]
    L --> O{"MIN_ZONES check<br/>ceil 22 × 2/3 = 15"}
    M --> O
    N --> O
    O -->|enough zones| P[("Redis<br/>climate:anomalies:v1<br/>TTL 3h")]
    O -->|too few| Q["Throw: skip write<br/>preserve stale data"]

Reviews (1): Last reviewed commit: "fix(CII): replace hard-cap HAPI fallback..."

Comment on lines +180 to +182
ttlSeconds: CACHE_TTL,
sourceVersion: 'open-meteo-archive-wmo-normals',
}).catch((err) => {
Contributor

P1 Validate threshold too weak — partial failure silently breaks anomalies seeder

The current validate function accepts any non-zero zone count. If the normals seeder succeeds for only a few zones (e.g., 3 out of 22 due to transient API failures), the weak predicate passes, and those incomplete normals are written to Redis with a 30-day TTL.

The anomalies seeder then reads this cache, sees normals.length > 0, and sets daysToFetch = 7. But the 19 zones without normals enter the fallback path where temps.slice(0, -7) is empty (only 7 days were fetched), triggering the baselineTemps.length < 7 guard and returning null. The anomalies seeder's MIN_ZONES = ceil(22 * 2/3) = 15 check then fails on every run until the normals seeder re-runs (up to 30 days later), so users receive stale climate anomaly data for the rest of the cache window.

The anomalies seeder already applies a 2/3 zone threshold — mirror that here:

Suggested change
ttlSeconds: CACHE_TTL,
sourceVersion: 'open-meteo-archive-wmo-normals',
}).catch((err) => {
const MIN_ZONES = Math.ceil(ALL_ZONES.length * 2 / 3);
function validate(data) {
return Array.isArray(data?.zones) && data.zones.length >= MIN_ZONES;
}
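
The failure chain described in this comment can be reproduced in isolation. The sketch below is a simplified model under stated assumptions: `rollingBaseline`, `weakValidate`, and `strictValidate` are hypothetical stand-ins built from the expressions quoted in this thread, and the temperature arrays are made-up placeholder values, not real seeder output.

```javascript
const TOTAL_ZONES = 22;
const MIN_ZONES = Math.ceil(TOTAL_ZONES * 2 / 3);

// Stand-in for the rolling fallback: the baseline is everything except the
// last 7 days, and fewer than 7 baseline days yields null.
function rollingBaseline(temps) {
  const baselineTemps = temps.slice(0, -7);
  return baselineTemps.length < 7 ? null : baselineTemps;
}

// Old predicate (any non-zero zone count) vs the suggested 2/3 threshold.
const weakValidate = (data) => Array.isArray(data?.zones) && data.zones.length > 0;
const strictValidate = (data) => Array.isArray(data?.zones) && data.zones.length >= MIN_ZONES;

// Placeholder inputs: 7 or 30 days of arbitrary temperatures, 3 seeded zones.
const sevenDays = Array.from({ length: 7 }, (_, i) => 20 + i);
const thirtyDays = Array.from({ length: 30 }, (_, i) => 20 + (i % 5));
const partial = { zones: Array.from({ length: 3 }, (_, i) => ({ zone: `zone-${i}` })) };

console.log('MIN_ZONES:', MIN_ZONES);
console.log('7-day fetch baseline:', rollingBaseline(sevenDays));
console.log('30-day fetch baseline length:', rollingBaseline(thirtyDays).length);
console.log('weak accepts 3/22:', weakValidate(partial));
console.log('strict accepts 3/22:', strictValidate(partial));
```

The key detail is that `slice(0, -7)` on a 7-element array is empty, so every zone without a cached normal produces null whenever the short fetch window is in effect.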

Contributor Author

Your comment has been addressed: The validate function now requires at least MIN_ZONES = ceil(22*2/3) = 15 zones (imported from _climate-zones.mjs). Previously it accepted any non-zero zone count, which could write an incomplete cache and cause the anomalies seeder to throw on every run for up to 30 days.

Comment on lines +20 to +46
// Geopolitical zones (original 15 — must be kept in sync with seed-climate-anomalies.mjs)
const ZONES = [
  { name: 'Ukraine', lat: 48.4, lon: 31.2 },
  { name: 'Middle East', lat: 33.0, lon: 44.0 },
  { name: 'Sahel', lat: 14.0, lon: 0.0 },
  { name: 'Horn of Africa', lat: 8.0, lon: 42.0 },
  { name: 'South Asia', lat: 25.0, lon: 78.0 },
  { name: 'California', lat: 36.8, lon: -119.4 },
  { name: 'Amazon', lat: -3.4, lon: -60.0 },
  { name: 'Australia', lat: -25.0, lon: 134.0 },
  { name: 'Mediterranean', lat: 38.0, lon: 20.0 },
  { name: 'Taiwan Strait', lat: 24.0, lon: 120.0 },
  { name: 'Myanmar', lat: 19.8, lon: 96.7 },
  { name: 'Central Africa', lat: 4.0, lon: 22.0 },
  { name: 'Southern Africa', lat: -25.0, lon: 28.0 },
  { name: 'Central Asia', lat: 42.0, lon: 65.0 },
  { name: 'Caribbean', lat: 19.0, lon: -72.0 },
];

// Climate-specific zones (7 new zones)
const CLIMATE_ZONES = [
  { name: 'Arctic', lat: 70.0, lon: 0.0 }, // sea ice proxy
  { name: 'Greenland', lat: 72.0, lon: -42.0 }, // ice sheet melt
  { name: 'WestAntarctic', lat: -78.0, lon: -100.0 }, // Antarctic Ice Sheet
  { name: 'TibetanPlateau', lat: 31.0, lon: 91.0 }, // third pole
  { name: 'CongoBasin', lat: -1.0, lon: 24.0 }, // largest tropical forest after Amazon
  { name: 'CoralTriangle', lat: -5.0, lon: 128.0 }, // reef bleaching proxy
Contributor

P2 Zone list duplicated — no enforcement of sync with seed-climate-anomalies.mjs

ZONES, CLIMATE_ZONES, and ALL_ZONES are defined identically in both seeder files, with only a comment instructing that they "must be kept in sync." If a zone is added or renamed in one file but not the other, the normals lookup in fetchZone() silently finds no match for that zone (normals?.find((n) => n.zone === zone.name) returns nothing). The zone then drops into the rolling fallback even though WMO normals are otherwise available; because only 7 days are fetched in that case, the fallback has no usable baseline and the zone yields no valid anomaly.

Consider extracting the zone definitions into a shared file (e.g., scripts/_climate-zones.mjs) and importing it in both seeders, so a single edit keeps everything consistent.
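
A minimal sketch of the mismatch and of the shared definitions, under stated assumptions. The zone list is abbreviated to three entries (coordinates copied from the excerpt above), `findNormal` is a hypothetical helper wrapping the quoted `find` expression, and the renamed zone 'Arctic Ocean' is invented purely for illustration. In a real shared module these constants would be exported and imported by both seeders rather than declared inline.

```javascript
// Abbreviated zone lists; a shared module would hold the full 15 + 7 entries.
const ZONES = [
  { name: 'Ukraine', lat: 48.4, lon: 31.2 },
  { name: 'Sahel', lat: 14.0, lon: 0.0 },
];
const CLIMATE_ZONES = [
  { name: 'Arctic', lat: 70.0, lon: 0.0 },
];
const ALL_ZONES = [...ZONES, ...CLIMATE_ZONES];
const MIN_ZONES = Math.ceil(ALL_ZONES.length * 2 / 3);

// Hypothetical helper wrapping the lookup quoted in the comment above.
function findNormal(normals, zone) {
  return normals?.find((n) => n.zone === zone.name) ?? null;
}

// Normals cached under the names the normals seeder knows about.
const cachedNormals = ALL_ZONES.map((z) => ({ zone: z.name }));

// A zone renamed in only one seeder no longer matches any cached entry.
const renamedZone = { name: 'Arctic Ocean', lat: 70.0, lon: 0.0 };

console.log(findNormal(cachedNormals, ALL_ZONES[2])); // matching entry
console.log(findNormal(cachedNormals, renamedZone));  // null: silent miss
```

Deriving `MIN_ZONES` from `ALL_ZONES.length` in the same place also keeps the validation threshold correct if zones are added or removed later.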

Contributor Author

Your comment has been addressed: Zone definitions (ZONES, CLIMATE_ZONES, ALL_ZONES, MIN_ZONES) have been extracted into scripts/_climate-zones.mjs as a single source of truth. Both seeders now import from it, so any zone add/rename/remove is always consistent across both files.

Comment on lines +155 to +157

const tempDelta = Math.round((currentTempMean - baselineTempMean) * 10) / 10;
const precipDelta = Math.round((currentPrecipMean - baselinePrecipMean) * 10) / 10;
Contributor

P2 currentMonth uses local system time — can be off by one at month boundaries

new Date().getMonth() + 1 resolves the month in the Railway container's local timezone. If the container timezone differs from UTC, the month lookup could be off by one during the first few hours of each month, causing monthNormal to not be found and silently returning null for those zones.

Consider using UTC explicitly:

Suggested change
const tempDelta = Math.round((currentTempMean - baselineTempMean) * 10) / 10;
const precipDelta = Math.round((currentPrecipMean - baselinePrecipMean) * 10) / 10;
const currentMonth = new Date().getUTCMonth() + 1; // 1-12, UTC
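
A short sketch of why the UTC accessor is the safer choice. The helper name `currentUtcMonth` is hypothetical and the two timestamps are arbitrary examples placed on either side of a month boundary.

```javascript
// Hypothetical helper: month number (1-12) resolved in UTC, independent of
// the timezone the process happens to run in.
function currentUtcMonth(now = new Date()) {
  return now.getUTCMonth() + 1;
}

const lateMarch = new Date('2026-03-31T23:30:00Z');
const earlyApril = new Date('2026-04-01T00:30:00Z');

console.log(currentUtcMonth(lateMarch));  // 3 on any host
console.log(currentUtcMonth(earlyApril)); // 4 on any host

// By contrast, getMonth() on the same instant depends on the host timezone,
// so its result near a month boundary can differ between machines.
console.log(earlyApril.getMonth() + 1);
```

Passing the clock in as a parameter also makes the month lookup straightforward to unit-test at boundary instants.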

Contributor Author

Your comment has been addressed: currentMonth now uses new Date().getUTCMonth() + 1 instead of getMonth() + 1, eliminating the risk of off-by-one errors at month boundaries when the Railway container's local timezone differs from UTC.

- P1: seed-climate-zone-normals validate now requires >= ceil(22*2/3)=15
  zones instead of >0. Partial seeding (e.g. 3/22) was passing validation
  and writing a 30-day TTL cache that would cause the anomalies seeder to
  throw on every run until cache expiry.

- P2: Extract shared zone definitions (ZONES, CLIMATE_ZONES, ALL_ZONES,
  MIN_ZONES) into scripts/_climate-zones.mjs. Both seeders now import from
  the same source, eliminating the risk of silent divergence.

- P2: seed-climate-anomalies currentMonth now uses getUTCMonth() instead
  of getMonth() to avoid off-by-one at month boundaries when the Railway
  container's local timezone differs from UTC.

Reviewed-by: greptile-apps
fuleinist pushed a commit to fuleinist/worldmonitor that referenced this pull request Apr 2, 2026
Development

Successfully merging this pull request may close these issues.

CII manual tuning can over-amplify China relative to active conflict states, creating a geopolitical bias problem