Skip to content

fix(queries): check rows.Err() after Next() loop in bind()#1472

Open
marcoabreu wants to merge 2 commits intoaarondl:masterfrom
marcoabreu:fix-next-swallow-errors
Open

fix(queries): check rows.Err() after Next() loop in bind()#1472
marcoabreu wants to merge 2 commits intoaarondl:masterfrom
marcoabreu:fix-next-swallow-errors

Conversation

@marcoabreu
Copy link
Copy Markdown

@marcoabreu marcoabreu commented Apr 2, 2026

Summary

bind() in queries/reflect.go and schema introspection functions across all four database drivers do not check rows.Err() after rows.Next() loops. When rows.Next() returns false due to any error — connection resets, context cancellation, query timeouts, TLS failures, etc. — these functions either return sql.ErrNoRows or silently return partial/empty results instead of the real error.

Ref: #1324 (comment)

The bug

Per the database/sql documentation:

Next prepares the next result row for reading with the Scan method. It returns true on success, or false if there is no next result row or an error happened while preparing it. Err should be consulted to distinguish between the two cases.

Runtime query path (bind()): After the rows.Next() loop, bind() checks !foundOne and returns sql.ErrNoRows without first consulting rows.Err(). The calling Bind() method only checks rows.Err() on the success path — on the error path, the real error is permanently lost. This affects all bind kinds: struct (returns wrong sql.ErrNoRows), slice and pointer slice (silently returns partial results).

Code generation path (all drivers): Schema introspection functions (TableNames, ViewNames, Columns, etc.) iterate rows.Next() but never check rows.Err(), silently returning incomplete schema information on connection errors.

The fix

Added rows.Err() checks after every rows.Next() loop that was missing one:

queries/reflect.gobind():

if err := rows.Err(); err != nil {
    return errors.Wrap(err, "failed to iterate rows")
}

All four drivers — same pattern in each affected function:

if err := rows.Err(); err != nil {
    return nil, err
}
Driver Fixed functions
PostgreSQL TableNames, ViewNames, Columns, loadUniqueColumns
MySQL TableNames, ViewNames, Columns
MSSQL TableNames, ViewNames, Columns
SQLite3 TableNames, ViewNames, tableInfo, indexes (nested loops — both inner and outer)

The fix is backwards-compatible: callers using errors.Is(err, sql.ErrNoRows) will no longer match on real errors, which is the correct behavior.

Production impact

We discovered this bug in a production identity management platform running on GCP Cloud Run + Cloud SQL (PostgreSQL). Any error occurring between query dispatch and row iteration — Cloud SQL proxy restarts, context cancellations, TCP resets, query timeouts — was silently converted to sql.ErrNoRows. Applications treating "not found" as a permanent/non-retryable condition permanently dropped millions of messages per day and assumed query results to be empty when they actually had an underlying error, causing correctness failures across the system.

Test plan

18 new tests using go-sqlmock with RowError() to simulate mid-iteration connection failures:

  • queries/: TestBindStructRowsError, TestBindSliceRowsError, TestBindPtrSliceRowsError
  • psql driver: TestTableNames_RowsErr, TestViewNames_RowsErr, TestLoadUniqueColumns_RowsErr, TestColumns_RowsErr
  • mysql driver: TestTableNames_RowsErr, TestViewNames_RowsErr, TestColumns_RowsErr
  • mssql driver: TestTableNames_RowsErr, TestViewNames_RowsErr, TestColumns_RowsErr
  • sqlite3 driver: TestTableNames_RowsErr, TestViewNames_RowsErr, TestTableInfo_RowsErr, TestIndexes_OuterRowsErr, TestIndexes_InnerRowsErr
  • All existing tests pass with zero regressions
  • Final audit confirms zero remaining unchecked .Next() loops in the codebase

bind() did not check rows.Err() after the rows.Next() loop. When
rows.Next() returned false due to a connection error (not an empty
result set), bind() returned sql.ErrNoRows — silently swallowing
the real error.

This caused callers that distinguish "not found" from "transient
failure" to make wrong decisions. In production, intermittent
connection errors were misclassified as "row not found", causing
permanent data loss in retry-based message processing systems.

The fix adds a rows.Err() check after the loop, before the
foundOne/ErrNoRows check. This covers all bind kinds (struct,
slice, ptr slice) uniformly and is backwards-compatible.
Same bug pattern as the bind() fix: rows.Next() loops in driver
schema introspection functions did not check rows.Err() afterward.
Per the database/sql contract, Next() returns false both when the
result set is exhausted and when an error occurs — Err() must be
consulted to distinguish the two cases.

Without this check, a connection error during code generation would
silently produce incomplete schema information (missing tables,
columns, or indexes), leading to incorrect generated code.

Affected functions across all four drivers:
- PostgreSQL: TableNames, ViewNames, Columns, loadUniqueColumns
- MySQL: TableNames, ViewNames, Columns
- MSSQL: TableNames, ViewNames, Columns
- SQLite3: TableNames, ViewNames, tableInfo, indexes (nested loops)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant