Fix LSI dimension mismatch with native Ruby SVD #78

Merged
cardmagic merged 1 commit into master from fix/lsi-dimension-mismatch on Dec 27, 2025

@cardmagic

Owner

Summary

  • Fix ExceptionForMatrix::ErrDimensionMismatch when classifying with 10+ similar documents using native Ruby SVD
  • Transpose reduced matrix when native SVD returns swapped dimensions
  • Iterate over columns (documents) instead of rows (terms) in build_index

Root Cause

Native Ruby SVD returns transposed dimensions when row_size < column_size (common case: few unique terms, many documents). This caused query vectors (5 elements) to be compared against document vectors (20 elements).
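
The mismatch can be illustrated with Ruby's standard `matrix` library alone. This is a minimal sketch, not the classifier's code: the 5 x 20 shape mirrors the sizes mentioned above, the matrix contents are made up, and a plain `transpose` stands in for the swapped SVD result.

```ruby
require 'matrix'

# Sizes taken from the description above: 5 unique terms, 20 documents.
terms = 5
docs = 20

# Term-document matrix: rows are terms, columns are documents.
# The cell values are arbitrary illustration data.
tdm = Matrix.build(terms, docs) { |t, d| (t + d) % 3 }

# A query lives in term space, so it has one entry per term.
query = Vector.elements(Array.new(terms, 1))

# Expected orientation: each column is a document vector of length `terms`.
doc_ok = tdm.column(0)
ok_score = query.inner_product(doc_ok)

# Swapped orientation (stand-in for the transposed SVD result): a column
# now has one entry per document, so it cannot be compared with the query.
doc_bad = tdm.transpose.column(0)

mismatch =
  begin
    query.inner_product(doc_bad)
    false
  rescue ExceptionForMatrix::ErrDimensionMismatch
    true
  end

puts "query size: #{query.size}, document size: #{doc_bad.size}"
puts "dimension mismatch raised: #{mismatch}"
```

With the expected orientation the inner product succeeds; with the swapped one, `Vector#inner_product` raises the same `ExceptionForMatrix::ErrDimensionMismatch` reported in the issue.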

Test Plan

Fixes #72

@cardmagic cardmagic requested a review from Copilot December 27, 2025 09:47
@cardmagic cardmagic self-assigned this Dec 27, 2025

Copilot AI left a comment

Pull request overview

Fixes a dimension mismatch error (ExceptionForMatrix::ErrDimensionMismatch) that occurs when classifying documents using native Ruby SVD with 10+ similar documents. The root cause is that Ruby's native SVD implementation returns transposed dimensions when there are fewer unique terms than documents, causing query vectors to have incompatible dimensions with document vectors during comparison.

Key Changes:

  • Fixed iteration to use column_size instead of row_size when building the document index
  • Added dimension validation and transposition logic to handle native Ruby SVD's behavior
  • Ensured consistent matrix dimensions between GSL and native Ruby implementations
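
The two changes can be sketched roughly as follows. This is an illustration under stated assumptions rather than the merged code: `align_to_input` and `document_vectors` are hypothetical helper names, and a plain `transpose` again stands in for a native SVD result that came back with swapped dimensions.

```ruby
require 'matrix'

# Hypothetical helper: return the reduced matrix in the same orientation
# as the input, transposing it if rows and columns came back swapped.
def align_to_input(reduced, input)
  same_shape = reduced.row_size == input.row_size &&
               reduced.column_size == input.column_size
  return reduced if same_shape

  swapped = reduced.row_size == input.column_size &&
            reduced.column_size == input.row_size
  raise ArgumentError, 'reduced matrix does not match input shape' unless swapped

  reduced.transpose
end

# Hypothetical helper: build one vector per document by walking columns
# (documents) rather than rows (terms).
def document_vectors(matrix)
  Array.new(matrix.column_size) { |i| matrix.column(i) }
end

input = Matrix.build(5, 20) { |t, d| t + d } # 5 terms x 20 documents
reduced = input.transpose                    # stand-in for a swapped result

aligned = align_to_input(reduced, input)
vectors = document_vectors(aligned)

puts "aligned shape: #{aligned.row_size} x #{aligned.column_size}"
puts "document vectors: #{vectors.size}, each of size #{vectors.first.size}"
```

A shape check like this cannot detect a swap when the matrix is square (terms equal documents), so it only covers the rectangular case described in this PR.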

Native Ruby SVD returns transposed matrix dimensions when
row_size < column_size (common case: few terms, many documents).
This caused ExceptionForMatrix::ErrDimensionMismatch during
classification with 10+ similar documents.

Two changes:
- Transpose reduced matrix when dimensions don't match input
- Iterate over column_size (documents) not row_size (terms)

Fixes #72
@cardmagic cardmagic force-pushed the fix/lsi-dimension-mismatch branch from 37ae2e3 to 4b67850 Compare December 27, 2025 09:49
@cardmagic cardmagic merged commit 32fffc5 into master Dec 27, 2025
5 checks passed
@cardmagic cardmagic deleted the fix/lsi-dimension-mismatch branch December 27, 2025 10:00
Development

Successfully merging this pull request may close these issues.

LSI dimension mismatch with large similar document sets
