Skip to content

CI Failure Doctor🏥 CI Failure Investigation - Run #1231: ABI-Breaking Change in Z3_rcf_interval Causes Windows x64 Build Failure #8049

@github-actions

Description

@github-actions

🏥 CI Failure Investigation - Run #1231

Summary

The Windows x64 CI build is failing because all regression tests fail with "Failed to start Z3: build\z3.exe". The root cause is an ABI-breaking change in PR #8046 that modified the Z3_rcf_interval function signature by changing parameter types from int* to bool*. On Windows (MSVC), bool (1 byte) and int (4 bytes) have different sizes and calling conventions, causing the z3.exe executable to fail at runtime despite compiling successfully.

Failure Details

Root Cause Analysis

ABI-Breaking Change in API

The commit changed Z3_rcf_interval function signature from:

int Z3_API Z3_rcf_interval(Z3_context c, Z3_rcf_num a,
    int * lower_is_inf, int * lower_is_open, Z3_rcf_num * lower,
    int * upper_is_inf, int * upper_is_open, Z3_rcf_num * upper);

To:

int Z3_API Z3_rcf_interval(Z3_context c, Z3_rcf_num a,
    bool * lower_is_inf, bool * lower_is_open, Z3_rcf_num * lower,
    bool * upper_is_inf, bool * upper_is_open, Z3_rcf_num * upper);

Why This Breaks Windows Builds

On Windows with MSVC:

  • sizeof(bool) = 1 byte
  • sizeof(int) = 4 bytes

When parameters change from int* to bool*, the calling convention and ABI change. If any compiled binaries, DLLs, or cached object files have the old signature baked in, they will:

  1. Pass wrong-sized parameters leading to stack corruption
  2. Access memory incorrectly when writing to bool* as if it were int*
  3. Crash during initialization when setting up function pointers or dynamic linking

Error Pattern

Testing z3test\\regressions\\smt2\\*.smt2
Failed
Failed to start Z3: build\\z3.exe
Exception: Found errors testing benchmarks at z3test\\regressions\\smt2 using build\\z3.exe
##[error]Process completed with exit code 1.

100% of tests failed with the same error, indicating z3.exe crashes immediately on startup before any test logic runs.

Failed Jobs and Errors

  • build (x64) - FAILURE (all 1000+ regression tests fail)
  • ⚠️ build (x86) - CANCELLED (due to x64 failure)
  • build (amd64_arm64) - SUCCESS

Platform Impact Analysis

Platform Result Analysis
Windows x64 ❌ FAILURE Affected by ABI break
Windows x86 ⚠️ CANCELLED Would likely fail
Windows amd64_arm64 ✅ SUCCESS Cross-compile, may skip affected code

Investigation Findings

Build vs Runtime Failure

  • Compilation: Succeeds (~47 minutes)
  • Linking: No errors
  • Runtime: z3.exe crashes immediately on startup
  • 📊 Test Impact: 100% failure rate (ALL tests fail identically)

Files Modified in PR #8046

  1. src/api/api_rcf.cpp (1 addition, 1 deletion)
  2. src/api/z3_rcf.h (2 additions, 2 deletions) - API signature change here
  3. src/math/realclosure/realclosure.cpp (2 additions, 2 deletions)
  4. src/math/realclosure/realclosure.h (1 addition, 1 deletion)

The PR correctly updated the internal implementation to use bool consistently, but this creates an ABI incompatibility on Windows when parameter types change in public API functions.

PR Author's Note

From PR #8046 description:

"No worries if you don't want to take API-breaking changes like this, it's not hard to work around."

The author acknowledged this was an API-breaking change, suggesting they expected potential compatibility issues.

Recommended Actions

Immediate Fix (P0 - Critical)

  • Option 1 - Revert the API change: Keep int* in the public API signature, convert internally to bool

    int Z3_API Z3_rcf_interval(Z3_context c, Z3_rcf_num a,
        int * lower_is_inf, int * lower_is_open, Z3_rcf_num * lower,
        int * upper_is_inf, int * upper_is_open, Z3_rcf_num * upper) {
        bool b_lower_inf, b_lower_open, b_upper_inf, b_upper_open;
        int result = internal_implementation(&b_lower_inf, &b_lower_open, ..., &b_upper_inf, &b_upper_open);
        *lower_is_inf = b_lower_inf ? 1 : 0;
        *lower_is_open = b_lower_open ? 1 : 0;
        // ... same for upper_is_inf, upper_is_open
        return result;
    }
  • Option 2 - Force clean rebuild: Ensure ALL object files are rebuilt with new signature

    • Clear all build caches
    • Delete all intermediate build artifacts
    • Rebuild from scratch
  • Option 3 - Major version bump: Accept this as an intentional ABI break and document it

    • Bump major version (e.g., 4.x → 5.0)
    • Add migration guide for users
    • Ensure all downstream packages are rebuilt

Investigation Steps (P1 - High)

  • Check for cached artifacts: Determine if incremental builds are causing old symbols to persist
  • Verify DLL exports: Use dumpbin /EXPORTS z3.dll to check exported function signatures
  • Test with clean build: Perform completely clean Windows x64 build locally

Long-term Prevention (P2 - Medium)

  • ABI stability policy: Document rules for changing public API signatures
  • API versioning: Use versioned function names for ABI-breaking changes (Z3_rcf_interval_v2)
  • CI improvements: Add smoke test (z3.exe --version) before running full regression suite

Prevention Strategies

API Change Guidelines

  1. Never change parameter types in public APIs without major version bump
  2. Especially avoid intbool conversions (different sizes on different platforms)
  3. Use wrapper functions to maintain ABI compatibility:
    • Keep old signature as wrapper
    • Call new internal implementation
    • Convert types at the boundary

CI/CD Enhancements

  1. Early failure detection: Run z3.exe --version smoke test after build
  2. Clean builds: Ensure Windows CI doesn't use cached artifacts for API changes
  3. ABI compatibility checking: Add tools to detect ABI-breaking changes

AI Team Self-Improvement

Add these guidelines to prevent similar failures:

### Public API Modifications - CRITICAL

When modifying functions in header files under `src/api/`:

#### Never Do This
- ❌ Change parameter types (especially `int``bool`, different sizes!)
- ❌ Add/remove parameters from existing functions
- ❌ Change return types
- ❌ Modify struct layouts

#### Always Do This
- ✅ Create new function with `_v2` suffix for incompatible changes
- ✅ Keep old function as wrapper calling new implementation
- ✅ Convert types at API boundary to maintain compatibility
- ✅ Test on Windows x64 (ABI differences matter here)

#### Type Size Compatibility on Windows (MSVC)
- `bool`: 1 byte
- `int`: 4 bytes  
- `long`: 4 bytes
- `long long`: 8 bytes
- Changing pointer types (`int*``bool*`) **breaks ABI**

#### Example - Safe API Evolution
```c
// Old function - keep as wrapper
int Z3_API Z3_rcf_interval(Z3_context c, Z3_rcf_num a,
    int * lower_is_inf, int * lower_is_open, Z3_rcf_num * lower,
    int * upper_is_inf, int * upper_is_open, Z3_rcf_num * upper) {
    bool b_lower_inf, b_lower_open, b_upper_inf, b_upper_open;
    int result = Z3_rcf_interval_v2(c, a, &b_lower_inf, &b_lower_open, lower,
                                     &b_upper_inf, &b_upper_open, upper);
    *lower_is_inf = b_lower_inf;
    *lower_is_open = b_lower_open;
    *upper_is_inf = b_upper_inf;
    *upper_is_open = b_upper_open;
    return result;
}

// New function with better types
int Z3_API Z3_rcf_interval_v2(Z3_context c, Z3_rcf_num a,
    bool * lower_is_inf, bool * lower_is_open, Z3_rcf_num * lower,
    bool * upper_is_inf, bool * upper_is_open, Z3_rcf_num * upper);

## Historical Context

### Similar Past Failures
This is a **recurring pattern** in Z3:
- **Issue #8014**: Windows x64 "Failed to start Z3" due to initialization issues
- **Issue #7992**: Windows x64 crashes from in-class initializers breaking MSVC ABI
- **Issue #7940**: Windows x64 z3.exe cannot start after CMakeLists.txt changes

### Pattern Recognition
1. Change compiles successfully
2. Windows x64 build links without errors  
3. z3.exe crashes immediately at runtime (before any user code runs)
4. ALL tests fail with "Failed to start Z3"
5. Root cause: ABI incompatibility or initialization order issue

## Additional Context

### Why ABI Matters
ABI (Application Binary Interface) compatibility is critical for:
- **Dynamic linking**: DLLs must match calling conventions
- **Incremental builds**: New code must interface correctly with existing object files
- **Downstream users**: Pre-compiled binaries must work with new library versions

### Windows-Specific Considerations
Windows MSVC has **stricter ABI requirements** than Linux GCC/Clang:
- Different calling conventions (cdecl, stdcall, fastcall)
- Different type sizes and alignments
- Name mangling differences for C++ vs C APIs

---

> AI-generated content by [CI Failure Doctor](https://github.com/Z3Prover/z3/actions/runs/19690977660) may contain mistakes.


> Generated by Agentic Workflow [Run](https://github.com/Z3Prover/z3/actions/runs/19690977660)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions