[Cosmos][WIP]: Normalize region names passed as preferred or exclude regions. #49090

Open
jeet1995 wants to merge 38 commits into Azure:main from jeet1995:squad/region-name-mapper

Conversation

@jeet1995
Member

@jeet1995 jeet1995 commented May 7, 2026

Problem

Customers passing region names in non-canonical forms (e.g., west us 3 or westus3 instead of West US 3) hit routing issues. The Java SDK stores region names in different representations (lowercased in maps, original case in lists), and some code paths use case-sensitive String.equals()/List.contains() — causing mismatches between user-provided and server-returned region names.

Definitions

  • Canonical = official CosmosDB display form with proper casing and spaces: "West US 3", "East US", "North Europe". Sourced from Settings.xml regionToIdMapping.
  • Normalized = lowercase, no spaces: "westus3", "eastus", "northeurope". Used as internal map keys and URL suffixes.
  • Server-returned = getName().toLowerCase(Locale.ROOT): "west us 3", "east us" (lowercase, spaces preserved). Used as keys in CaseInsensitiveMap.

How the SDK solves it

RegionUtils is the single source of truth for region name mappings:

getCanonicalRegionName("westus3")     → "West US 3"       (canonical)
getCanonicalRegionName("WEST US 3")   → "West US 3"       (canonical)
getCanonicalRegionName("Future Foo")  → "Future Foo"       (unknown → as-is)

getNormalizedRegionName("West US 3")  → "westus3"          (normalized)
getNormalizedRegionName("Future Foo") → "futurefoo"         (normalized)

Escape hatch for unknown regions: If a region is not in the static map, getCanonicalRegionName returns the input as-is. This works because LocationCache applies toLowerCase() and uses CaseInsensitiveMap, so an unknown region still matches as long as the customer's input differs from the server-returned name only in letter case. Spacing must match, because spaces are optional for known regions only.
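
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's RegionUtils source: the class name RegionNameSketch, the three-entry region list, and the method names canonical/normalize are assumptions made for the example. Only the lowercase-and-strip-spaces step and the NORMALIZED_TO_CANONICAL lookup mirror the fragment quoted in the review thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; the real RegionUtils derives its maps from the full region list.
final class RegionNameSketch {
    // Tiny stand-in for the canonical list sourced from Settings.xml.
    private static final String[] CANONICAL_NAMES = {"West US 3", "East US", "North Europe"};

    // normalized key ("westus3") -> canonical name ("West US 3")
    private static final Map<String, String> NORMALIZED_TO_CANONICAL = new HashMap<>();

    static {
        for (String canonical : CANONICAL_NAMES) {
            NORMALIZED_TO_CANONICAL.put(normalize(canonical), canonical);
        }
    }

    private RegionNameSketch() { }

    // Lowercase with Locale.ROOT and drop spaces: "West US 3" -> "westus3".
    static String normalize(String regionName) {
        return regionName.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // Known regions map to their canonical form; unknown, null, or empty input is returned as-is.
    static String canonical(String regionName) {
        if (regionName == null || regionName.isEmpty()) {
            return regionName;
        }
        String canonical = NORMALIZED_TO_CANONICAL.get(normalize(regionName));
        return canonical != null ? canonical : regionName;
    }
}
```

With this sketch, canonical("westus3") and canonical("WEST US 3") both yield "West US 3", while canonical("Future Foo") comes back unchanged and normalize("Future Foo") yields "futurefoo".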

Public API preserves customer input: ConnectionPolicy.getPreferredRegions() and CosmosDiagnostics reflect the customer-supplied values as-is. All canonicalization is scoped to LocationCache internals.

Where canonicalization happens

All canonicalization is scoped to LocationCache. ConnectionPolicy stores raw customer input only.

Preferred regions (once at LocationCache construction):

Customer: "westus3" → getCanonicalRegionName() → "West US 3" → toLowerCase() → "west us 3"
Server:   "West US 3" → toLowerCase() → "west us 3"
Match:    CaseInsensitiveMap.get("west us 3") → ✓
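
A rough sketch of the preferred-region flow above, under stated assumptions: a TreeMap with String.CASE_INSENSITIVE_ORDER stands in for the SDK's CaseInsensitiveMap, canonicalize is a hypothetical one-region stand-in for getCanonicalRegionName, and the class and method names are invented for illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; not LocationCache.
final class PreferredRegionMatchSketch {
    // Assumption: a case-insensitively ordered TreeMap plays the role of CaseInsensitiveMap.
    private final Map<String, String> endpointByRegion = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Server side: the returned name is lowercased, spaces preserved ("West US 3" -> "west us 3").
    void addServerRegion(String serverRegionName, String endpoint) {
        endpointByRegion.put(serverRegionName.toLowerCase(Locale.ROOT), endpoint);
    }

    // Customer side: canonicalize first, then lowercase, then look up.
    String resolve(String customerRegionName) {
        String key = canonicalize(customerRegionName).toLowerCase(Locale.ROOT);
        return endpointByRegion.get(key);
    }

    // Hypothetical canonicalizer that only knows "West US 3"; unknown names pass through as-is.
    static String canonicalize(String regionName) {
        String normalized = regionName.toLowerCase(Locale.ROOT).replace(" ", "");
        return normalized.equals("westus3") ? "West US 3" : regionName;
    }
}
```

In this sketch a known region resolves from any spelling ("westus3", "WEST US 3"), whereas an unknown region resolves only when its spacing matches the server-returned name.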

Exclude regions (per request, cached):

Customer: "westus3" → canonicalRegionNameCache.computeIfAbsent(getCanonicalRegionName) → "West US 3"
Server:   regionName = "west us 3"
Match:    "West US 3".equalsIgnoreCase("west us 3") → ✓
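
The per-request exclude check above might look roughly like this. It is a sketch under assumptions: the class name, the injected canonicalizer function, and the cachedEntries helper are hypothetical; only the computeIfAbsent caching and the equalsIgnoreCase comparison come from the description.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch only; not LocationCache.
final class ExcludeRegionMatchSketch {
    // Raw customer string -> canonical name, computed once per unique input.
    private final Map<String, String> canonicalRegionNameCache = new ConcurrentHashMap<>();
    private final Function<String, String> canonicalizer;

    // The canonicalizer is assumed to return a non-null value for non-null input.
    ExcludeRegionMatchSketch(Function<String, String> canonicalizer) {
        this.canonicalizer = canonicalizer;
    }

    // Canonicalizes each exclude entry inline while streaming, without building a new list.
    boolean isExcluded(List<String> userExcludeRegions, String serverRegionName) {
        return userExcludeRegions.stream()
            .map(raw -> canonicalRegionNameCache.computeIfAbsent(raw, canonicalizer))
            .anyMatch(canonical -> canonical.equalsIgnoreCase(serverRegionName));
    }

    // Exposed only so the caching behavior can be observed in a test.
    int cachedEntries() {
        return canonicalRegionNameCache.size();
    }
}
```

Note that ConcurrentHashMap rejects null keys, so a real implementation would need to filter null entries before the lookup.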

URL suffix (in LocationHelper, for fallback endpoint discovery):

Customer: "West US 3" → getNormalizedRegionName() → "westus3"
URL:      contoso-westus3.documents.azure.com
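
As a sketch of the suffix construction above, assuming the regional host is formed by appending a hyphen and the normalized region name to the account label of the default host. The class and method names are hypothetical and this is not the LocationHelper implementation.

```java
import java.util.Locale;

// Illustrative sketch only; not LocationHelper.
final class RegionalHostSketch {
    private RegionalHostSketch() { }

    // "West US 3" -> "westus3"
    static String normalize(String regionName) {
        return regionName.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // "contoso.documents.azure.com" + "West US 3" -> "contoso-westus3.documents.azure.com"
    static String regionalHost(String accountHost, String regionName) {
        int firstDot = accountHost.indexOf('.');
        String account = firstDot < 0 ? accountHost : accountHost.substring(0, firstDot);
        String domain = firstDot < 0 ? "" : accountHost.substring(firstDot);
        return account + "-" + normalize(regionName) + domain;
    }
}
```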

Bug fix: PPCB List.contains() case-sensitivity

In LocationCache.reevaluate() (line 506), the old code used List.contains() to check if a PPCB-excluded region was also user-excluded:

// Before (bug on main): case-sensitive — "West US 3".equals("west us 3") → false
!userConfiguredExcludeRegions.contains(internalExcludeRegion)

// After: canonicalizes both sides, case-insensitive
!RegionUtils.containsRegionIgnoreCase(userConfiguredExcludeRegions, internalExcludeRegion)

This caused user-excluded regions to be silently re-added as retry targets when PPCB was active.
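
The difference between the two checks can be reproduced in isolation. The sketch below is an assumption-laden illustration: containsRegionIgnoreCase here is a plausible shape for the helper named in the diff, not its actual source, and canonicalize is a hypothetical one-region stand-in.

```java
import java.util.List;
import java.util.Locale;

// Illustrative sketch only; not RegionUtils.
final class ContainsRegionSketch {
    private ContainsRegionSketch() { }

    // Canonicalizes both sides, then compares case-insensitively.
    static boolean containsRegionIgnoreCase(List<String> regions, String candidate) {
        if (regions == null || candidate == null) {
            return false;
        }
        String canonicalCandidate = canonicalize(candidate);
        for (String region : regions) {
            if (region != null && canonicalize(region).equalsIgnoreCase(canonicalCandidate)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical canonicalizer that only knows "West US 3"; unknown names pass through as-is.
    static String canonicalize(String regionName) {
        String normalized = regionName.toLowerCase(Locale.ROOT).replace(" ", "");
        return normalized.equals("westus3") ? "West US 3" : regionName;
    }
}
```

For a user list containing "West US 3" and an internal exclude of "west us 3", List.contains() reports no match, while the sketch's containsRegionIgnoreCase reports a match. The same holds when the user supplied "westus3".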

Changes

RegionUtils.java (renamed from RegionNameToRegionIdMap.java)

  • final class with private constructor. Single static block derives all maps from CANONICAL_REGION_NAME_TO_REGION_ID_MAPPINGS with fail-fast on duplicate IDs/names.
  • getCanonicalRegionName(String) — returns canonical form; unknown regions as-is.
  • getNormalizedRegionName(String) — returns lowercase, no-spaces form for URL construction.
  • getRegionId(String) — fast-path (raw key) + slow-path (normalize on miss).
  • canonicalizeRegionNames(List), containsRegionIgnoreCase(List, String) — batch canonicalization and case-insensitive membership check.
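
The static-block derivation and the two-step getRegionId lookup listed above could be sketched like this. Everything specific here is an assumption for illustration: the region IDs are placeholders rather than real Settings.xml values, returning -1 for an unknown region is a guess at the contract, and buildNormalizedIndex is a hypothetical helper name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not RegionUtils.
final class RegionIdSketch {
    private static final Map<String, Integer> CANONICAL_TO_ID = new LinkedHashMap<>();
    private static final Map<String, Integer> NORMALIZED_TO_ID = new HashMap<>();

    static {
        // Placeholder IDs, not the real region IDs.
        CANONICAL_TO_ID.put("West US 3", 9001);
        CANONICAL_TO_ID.put("East US", 9002);
        NORMALIZED_TO_ID.putAll(buildNormalizedIndex(CANONICAL_TO_ID));
    }

    private RegionIdSketch() { }

    static String normalize(String regionName) {
        return regionName.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // Derives the normalized index and fails fast on duplicate IDs or colliding names.
    static Map<String, Integer> buildNormalizedIndex(Map<String, Integer> canonicalToId) {
        Map<String, Integer> byNormalized = new HashMap<>();
        Set<Integer> seenIds = new HashSet<>();
        for (Map.Entry<String, Integer> entry : canonicalToId.entrySet()) {
            if (!seenIds.add(entry.getValue())) {
                throw new IllegalStateException("Duplicate region ID: " + entry.getValue());
            }
            if (byNormalized.put(normalize(entry.getKey()), entry.getValue()) != null) {
                throw new IllegalStateException("Duplicate region name: " + entry.getKey());
            }
        }
        return byNormalized;
    }

    // Fast path: raw key lookup. Slow path: normalize only on a miss. Assumed: -1 when unknown.
    static int getRegionId(String regionName) {
        if (regionName == null) {
            return -1;
        }
        Integer id = CANONICAL_TO_ID.get(regionName);
        if (id == null) {
            id = NORMALIZED_TO_ID.get(normalize(regionName));
        }
        return id != null ? id : -1;
    }
}
```

The fast path keeps the common case (callers already holding the canonical name) free of string allocation, which is presumably why the list above distinguishes the two paths.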

LocationCache.java

  • Constructor: canonicalizes preferred regions via getCanonicalRegionName().toLowerCase().
  • Exclude regions: inline canonicalization via canonicalRegionNameCache.computeIfAbsent() during stream comparison — zero list allocation, cached per unique string.
  • Bug fix: containsRegionIgnoreCase() in reevaluate().

LocationHelper.java

  • dataCenterToUriPostfix() uses RegionUtils.getNormalizedRegionName() for URL suffix.

ConnectionPolicy.java

  • setPreferredRegions() stores raw customer input via defensive copy. No canonicalization, no extra fields.

PartitionScopedRegionLevelProgress.java

  • getRegionId() now normalizes internally — callers no longer pre-normalize.

RxDocumentClientImpl.java

  • Removed manual toLowerCase().trim().replace(" ", "") before getRegionId().

Impact analysis

| Component | Impact |
|---|---|
| PPCB | Fixed: reevaluate line 506 |
| PPAF | None: URI-based, no region name strings |
| RegionScopedSessionContainer | None: own normalization, server names only |
| Availability Strategy | None: routing uses LocationCache |
| Exclude Regions | Fixed: canonicalized via cache |
| LocationCache | Core canonicalization point |
| RegionalRoutingContext | None: URI-based |

Tests

Unit tests

  • 12 RegionUtilsNormalizationTest — case variants, space removal, passthrough, null/empty, unknown regions, canonicalizeRegionNames, containsRegionIgnoreCase
  • 20 LocationCacheTest — 9 existing + 6 canonicalization tests + 5 unknown region tests (Pluto Central / Mars South)
  • 1 ApplicableRegionEvaluatorTest — PPCB reevaluate regression: non-canonical user exclude vs server-form PPCB exclude
  • 1 RegionUtilsTests — ID map consistency

E2E tests

  • ExcludeRegionTests — 4 new: non-canonical preferred + exclude + fault injection (7 op types each)
  • FaultInjectionWithAvailabilityStrategyTestsBase — 1 new: non-canonical + eager strategy + 503 → hedging
  • PerPartitionCircuitBreakerE2ETests — 1 new: non-canonical + PPCB + 503 → circuit breaker failover

Customers passing region names in non-canonical forms (e.g., 'west us 3'
instead of 'West US 3') hit routing issues because the Java SDK stores
region names in different forms and some comparisons use case-sensitive
String.equals()/List.contains().

Changes:
- Add RegionNameMapper: strips spaces + case-insensitive lookup against
  90+ known Azure regions to produce canonical names (e.g., 'westus3' or
  'west us 3' -> 'West US 3'). Unknown regions pass through as-is.
- ConnectionPolicy.setPreferredRegions(): normalize + order-preserving
  dedupe at entry point.
- LocationCache constructor: apply RegionNameMapper before toLowerCase
  for defense-in-depth.
- Fix case-sensitive List.contains() bug in reevaluate() (line 502):
  use containsRegionIgnoreCase() instead.
- Normalize user-configured exclude regions at point of use in
  getApplicableRegionRoutingContexts() to prevent mismatches with
  PPCB-derived lowercased region names.
- Add RegionNameMapperTest with 43 unit tests covering case variants,
  space removal, passthrough, null/empty handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the Cosmos label May 7, 2026
jeet1995 and others added 7 commits May 7, 2026 10:12
The static region list in RegionNameMapper goes stale when new Azure
regions are added. Fix: add a ConcurrentHashMap-backed dynamic tier
that learns canonical region names from server responses.

- RegionNameMapper.registerRegionName(): registers canonical names from
  DatabaseAccountLocation (called from LocationCache.addRoutingContexts).
  After the first account read, even new regions like 'West US 4' can
  normalize 'westus4' → 'West US 4'.
- getCosmosDBRegionName(): checks static map first, then dynamic map.
- Add 2 new tests for dynamic registration behavior.
- 45/45 RegionNameMapperTest pass, 32/32 LocationCacheTest pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit had stash conflict markers (<<<<<<< Updated upstream /
>>>>>>> Stashed changes) left in RegionNameMapper.java and
RegionNameMapperTest.java. Rewrote both files clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Merge the separate RegionNameMapper into RegionNameToRegionIdMap as the
single source of truth for region names. This eliminates maintaining two
parallel region lists that can drift out of sync.

Changes:
- Delete RegionNameMapper.java — normalization logic moved into
  RegionNameToRegionIdMap.
- RegionNameToRegionIdMap now provides region ID mapping (existing) AND
  region name normalization (new) from one canonical list.
- Sync REGION_NAME_TO_REGION_ID_MAPPINGS with backend RegionToIdMap.cs:
  add Bleu France Central/South (107/108), Delos Cloud Germany
  Central/North (109/110), Singapore Central/North (111/112), fix
  'easteurope' → 'East Europe' (54).
- Build NORMALIZED_REGION_NAME_TO_REGION_ID_MAPPINGS programmatically
  from REGION_NAME_TO_REGION_ID_MAPPINGS instead of manual duplication.
- Normalization static map seeded from ID map keys + additional regions
  without IDs yet (from .NET SDK Regions.cs).
- Rename test: RegionNameMapperTest → RegionNameToRegionIdMapNormalizationTest.
- Update ConnectionPolicy and LocationCache references.
- All 78 tests pass (45 normalization + 32 LocationCache + 1 consistency).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 7 tests to LocationCacheTest using real Azure region names to verify
that preferred regions and exclude regions work correctly with
non-canonical input:

- preferredRegions_lowercaseShouldMatchCanonical: 'west us 3' → West US 3
- preferredRegions_noSpacesShouldMatchCanonical: 'westus3' → West US 3
- preferredRegions_uppercaseShouldMatchCanonical: 'WEST US 3' → West US 3
- preferredRegions_duplicateAfterNormalizationShouldDedupe: 'westus3' +
  'West US 3' deduped to single entry
- excludeRegions_lowercaseNoSpacesShouldExclude: 'westus3' excludes
  West US 3
- excludeRegions_mixedCasingShouldExclude: 'EAST us' excludes East US
- excludeRegions_requestLevelNoSpacesShouldExclude: request-level
  'eastus' excludes East US

All 39 LocationCacheTest unit tests pass (32 existing + 7 new).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Create a second CosmosClient with space-stripped preferred regions
(e.g., 'westus3' instead of 'west us 3') and verify that routing and
region exclusion work identically to canonical names.

New tests:
- nonCanonicalPreferredRegions_shouldRouteCorrectly: client with
  space-stripped preferred regions routes to correct first region
  (7 operation types via DataProvider)
- nonCanonicalExcludeRegion_shouldSkipExcludedRegion: excluding with
  space-stripped name (e.g., 'westus3') correctly skips that region
  (7 operation types via DataProvider)
- uppercaseExcludeRegion_shouldSkipExcludedRegion: excluding with
  UPPERCASE name correctly skips that region

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add tests that create CosmosClients with space-stripped preferred regions
(e.g., 'westus3' instead of 'West US 3') and verify correct routing.

FaultInjectionWithAvailabilityStrategyTestsBase:
- Add nonCanonicalWriteableRegions field (space-stripped from server names)
- readAfterCreation_nonCanonicalPreferredRegions_shouldRouteCorrectly:
  creates client with space-stripped regions, reads with eager availability
  strategy, verifies first contacted region matches expected canonical name

PerPartitionCircuitBreakerE2ETests:
- nonCanonicalPreferredRegions_ppcbShouldStillRouteCorrectly: creates
  client with space-stripped regions, performs create+read, verifies
  diagnostics show routing to correct first preferred region

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Simplify RegionNameToRegionIdMap by removing the ConcurrentHashMap-backed
dynamic registration tier. Unknown regions are returned as-is, which is
sufficient because LocationCache's CaseInsensitiveMap + toLowerCase
handles the matching for any region the server returns.

- Remove DYNAMIC_NORMALIZED_TO_CANONICAL and registerRegionName()
- Remove registerRegionName() call from LocationCache.addRoutingContexts()
- Replace dynamic registration tests with passthrough assertion tests
- 84/84 tests pass (44 normalization + 39 LocationCache + 1 consistency)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995 jeet1995 changed the title Port RegionNameMapper from .NET SDK to normalize region names [Cosmos][WIP]: Normalize region names passed as preferred or exclude regions. May 7, 2026
jeet1995 and others added 5 commits May 7, 2026 12:21
…sible

Duplicate preferred regions after normalization (e.g., ['westus3', 'West US 3']
both becoming 'West US 3') are an obvious customer misconfiguration. The SDK
should not silently mask this — let the duplicates pass through so the customer
can see and fix their config.

Also clarify code comments for the escape hatch behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 10 regions from the authoritative LocationNames.cs that were missing
from the normalization map: East US SLV, Southeast US, Southwest US,
South Central US 2, Southeast US 3, Southeast US 5, Northeast US 5,
India South Central, Southeast Asia 3, West Central US FRE.

Region ID mappings remain a subset (only regions with assigned IDs from
RegionToIdMap.cs). The normalization map is the superset sourced from
LocationNames.cs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the previous RegionToIdMap.cs-based ID map with the complete
regionToIdMapping from Settings.xml (IDs 1-124). This is the
authoritative source for region name ↔ ID mappings used for session
token region-level progress tracking.

- Add 44 new region IDs (74-124): Brazil Southeast, West US 3, Qatar
  Central, Italy North, East US 3, Saudi Arabia East, etc.
- Remove separate 'additional canonical names' block — all canonical
  names now derive from the ID map since Settings.xml is the superset.
- Remove 'Greece Central' which was not in any authoritative source.
- Update javadoc and code comments to reference Settings.xml as the
  authoritative source instead of RegionToIdMap.cs.
- 83/83 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ationCache

- Rename class to RegionUtils — better reflects its dual role (ID mapping
  + region name normalization).
- Move normalizeRegionNames() and containsRegionIgnoreCase() from
  LocationCache private helpers into RegionUtils as public static methods.
- Rename all test files to match: RegionUtilsNormalizationTest,
  RegionUtilsTests.
- Update all references across ConnectionPolicy, LocationCache,
  PartitionScopedRegionLevelProgress, RxDocumentClientImpl.
- 83/83 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jeet1995
Member Author

jeet1995 commented May 7, 2026

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995 jeet1995 marked this pull request as ready for review May 7, 2026 17:37
@jeet1995 jeet1995 requested a review from kirankumarkolli as a code owner May 7, 2026 17:37
Copilot AI review requested due to automatic review settings May 7, 2026 17:37
@jeet1995 jeet1995 requested review from a team as code owners May 7, 2026 17:37
@jeet1995
Member Author

jeet1995 commented May 7, 2026

@sdkReviewAgent

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses Cosmos DB routing mismatches caused by non-canonical Azure region name inputs by introducing centralized region normalization and applying it to preferred/excluded region handling across the Cosmos Java SDK routing stack.

Changes:

  • Added RegionUtils as the single source of truth for region ID mappings and canonical region name normalization, and updated call sites to use it.
  • Normalized preferred/excluded regions in ConnectionPolicy and LocationCache, including a fix for a case-sensitive exclude-region check in PPCB reevaluation logic.
  • Added/updated unit and E2E tests to validate routing behavior with non-canonical region inputs and updated the Cosmos changelog entry.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/RxDocumentClientImpl.java | Switches region ID lookup to RegionUtils. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/RegionUtils.java | Introduces region normalization + region ID mapping utilities. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/RegionNameToRegionIdMap.java | Removes the old region mapping class in favor of RegionUtils. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/routing/LocationCache.java | Normalizes excluded regions and fixes PPCB exclude-region comparison behavior. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/PartitionScopedRegionLevelProgress.java | Updates region ID/name lookups to RegionUtils. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/ConnectionPolicy.java | Normalizes preferred regions at configuration time. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Documents the normalization + PPCB exclude-region fix. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/PerPartitionCircuitBreakerE2ETests.java | Adds E2E coverage for PPCB routing with non-canonical preferred regions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/routing/RegionUtilsNormalizationTest.java | Adds unit coverage for normalization behavior. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/routing/LocationCacheTest.java | Adds integration-style unit tests for preferred/exclude region normalization with real region names. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/RegionUtilsTests.java | Updates the existing mapping-consistency test to the new RegionUtils. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/FaultInjectionWithAvailabilityStrategyTestsBase.java | Adds E2E validation for availability strategy routing with non-canonical preferred regions. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/ExcludeRegionTests.java | Adds E2E coverage for non-canonical preferred/exclude region inputs. |

Comment on lines +5331 to +5332
assertThat(diagnosticsContext.getContactedRegionNames().iterator().next())
.isEqualTo(expectedFirstRegion);
@xinlian12
Member

@sdkReviewAgent

return canonical;
}

return regionName;
Member

I am wondering whether we should fall back to the normalized version when the region is not found in the map. Even in GlobalEndpointManager, we just use the normalized version for finding the regional endpoint.

Member Author

I feel we should keep behavior for unknown regions as is? Backward compat mindset - GlobalEndpointManager today just relies on case insensitivity (no space trimming).

* returned as-is.</li>
* </ol>
*/
public class RegionUtils {
Member

Should we also change GlobalEndpointManager to use RegionUtils?


String normalized = regionName.toLowerCase(Locale.ROOT).replace(" ", "");

String canonical = NORMALIZED_TO_CANONICAL.get(normalized);
Member

🟢 Suggestion — Forward Compatibility: Unknown regions with space variants won't match

For unknown regions (not in the static map), getCosmosDBRegionName returns the input as-is. This means two variant spellings of the same unknown region won't match if they differ by spaces:

  • Customer passes preferredRegions = ["futureregion"]
  • Server returns "Future Region" → stored as "future region" in addRoutingContexts
  • Preferred location "futureregion" ≠ "future region" → region not matched

The PR description acknowledges this: "spaces are optional for known regions only." The defense-in-depth via CaseInsensitiveMap + toLowerCase() handles case differences for unknown regions, but not space differences.

If forward compatibility for space-stripped unknown regions is desired, the fallback could return the normalized form instead of the original:

// Instead of: return regionName;
return normalized; // lowercase + no-spaces, matches server after toLowerCase

This would make "futureregion" and "Future Region" both collapse to "futureregion", matching in the lowercased endpoint map. The tradeoff: diagnostic logs would show the normalized form instead of the user's original input.
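
To make the two fallback options concrete, here is a small sketch comparing them for an unknown region. The class and method names are hypothetical. The second method assumes the server-returned name is normalized the same way before comparison. If the endpoint map key keeps its spaces ("future region"), a normalized customer value of "futureregion" would still not equal it.

```java
import java.util.Locale;

// Illustrative sketch only; compares two fallback choices for regions missing from the static map.
final class UnknownRegionFallbackSketch {
    private UnknownRegionFallbackSketch() { }

    // Lowercase and drop spaces: "Future Region" -> "futureregion".
    static String normalize(String regionName) {
        return regionName.toLowerCase(Locale.ROOT).replace(" ", "");
    }

    // As-is fallback: only letter case is forgiven, so spacing must match the server name.
    static boolean matchesWithAsIsFallback(String customerInput, String serverRegionName) {
        return customerInput.equalsIgnoreCase(serverRegionName);
    }

    // Normalized fallback, assuming BOTH sides are normalized before comparison.
    static boolean matchesWithNormalizedFallback(String customerInput, String serverRegionName) {
        return normalize(customerInput).equals(normalize(serverRegionName));
    }
}
```

Under the as-is fallback, "future region" matches "Future Region" but "futureregion" does not. Under the normalized fallback, with both sides normalized, both spellings match.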

⚠️ AI-generated review — may be incorrect. Agree? → resolve the conversation. Disagree? → reply with your reasoning.

@jeet1995 jeet1995 force-pushed the squad/region-name-mapper branch from 9b813e3 to 99a1066 Compare May 11, 2026 17:06
@jeet1995 jeet1995 force-pushed the squad/region-name-mapper branch from 99a1066 to c3b4052 Compare May 11, 2026 17:09
jeet1995 added 8 commits May 11, 2026 13:25
…ionPolicy entry, expose via getNormalizedPreferredRegions()
…RegionName, canonicalPreferredRegions, canonicalRegionNameCache, canonicalizeRegionNames
…lized in all comments across ConnectionPolicy, LocationCache, LocationHelper
…ionPolicy to raw-only, inline exclude region canonicalization (zero list allocation)
@jeet1995
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@jeet1995
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

4 participants