
fix: archive WAL file once, not twice #77

Merged
mnencia merged 2 commits into cloudnative-pg:main from Erouan50:fix/wal-upload-twice
Feb 26, 2025

Conversation

@Erouan50
Contributor

This commit fixes an issue in the `shouldSkipWal` method of the `GatherReadyWALFilesConfig` struct: the WAL file currently being archived was not excluded from the gathered list of "ready" WAL files, so it was queued for upload a second time.

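We noticed the issue because the GCP service account used for WAL archiving doesn't have permission to delete objects in our object storage. Overwriting an existing object in Google Cloud Storage requires that permission, so the duplicate upload most likely fails for that reason instead of silently replacing the object.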

You can see in the logs that the WAL archiver attempts to upload the same WAL file twice, once with an absolute path and once with a relative path:

```
{"level":"info","ts":"2025-02-13T20:40:32.321415133Z","logger":"wal-archive","msg":"Executing barman-cloud-wal-archive","logging_pod":"backup-test-1","walName":"/var/lib/postgresql/data/pgdata/pg_wal/00000001000000040000001D","options":["--snappy","--cloud-provider","google-cloud-storage","gs://my-backup-storage","backup-test","/var/lib/postgresql/data/pgdata/pg_wal/00000001000000040000001D"]}
{"level":"info","ts":"2025-02-13T20:40:32.321448373Z","logger":"wal-archive","msg":"Executing barman-cloud-wal-archive","logging_pod":"backup-test-1","walName":"pg_wal/00000001000000040000001D","options":["--snappy","--cloud-provider","google-cloud-storage","gs://my-backup-storage","backup-test","pg_wal/00000001000000040000001D"]}
{"level":"info","ts":"2025-02-13T20:40:33.046761842Z","logger":"wal-archive","msg":"Archived WAL file","logging_pod":"backup-test-1","walName":"pg_wal/00000001000000040000001D","startTime":"2025-02-13T20:40:32.321422342Z","endTime":"2025-02-13T20:40:33.046732582Z","elapsedWalTime":0.725310229}
{"level":"error","ts":"2025-02-13T20:40:33.052585332Z","logger":"wal-archive","msg":"Error invoking barman-cloud-wal-archive","logging_pod":"backup-test-1","walName":"/var/lib/postgresql/data/pgdata/pg_wal/00000001000000040000001D","options":["--snappy","--cloud-provider","google-cloud-storage","gs://my-backup-storage","backup-test","/var/lib/postgresql/data/pgdata/pg_wal/00000001000000040000001D"],"exitCode":-1,"error":"exit status 4","stacktrace":"github.com/cloudnative-pg/machinery/pkg/log.(*logger).Error\n\tpkg/mod/github.com/cloudnative-pg/machinery@v0.0.0-20241219102532-2807bc88310d/pkg/log/log.go:125\ngithub.com/cloudnative-pg/barman-cloud/pkg/walarchive.(*BarmanArchiver).Archive\n\tpkg/mod/github.com/cloudnative-pg/barman-cloud@v0.0.0-20241218093921-134c7de4954a/pkg/walarchive/cmd.go:83\ngithub.com/cloudnative-pg/barman-cloud/pkg/walarchive.(*BarmanArchiver).ArchiveList.func1\n\tpkg/mod/github.com/cloudnative-pg/barman-cloud@v0.0.0-20241218093921-134c7de4954a/pkg/walarchive/cmd.go:115"}
{"level":"info","ts":"2025-02-13T20:40:33.052700252Z","logger":"wal-archive","msg":"Failed archiving WAL: PostgreSQL will retry","logging_pod":"backup-test-1","walName":"/var/lib/postgresql/data/pgdata/pg_wal/00000001000000040000001D","startTime":"2025-02-13T20:40:32.321394422Z","endTime":"2025-02-13T20:40:33.052693952Z","elapsedWalTime":0.7312995,"error":"unexpected failure invoking barman-cloud-wal-archive: exit status 4"}
{"level":"info","ts":"2025-02-13T20:40:33.052745862Z","logger":"wal-archive","msg":"Completed archive command (parallel)","logging_pod":"backup-test-1","walsCount":2,"startTime":"2025-02-13T20:40:32.230839745Z","uploadStartTime":"2025-02-13T20:40:32.321364142Z","uploadTotalTime":0.73136885,"totalTime":0.821893557}
```

Feel free to update the code if you think the ".ready" concatenation should be placed elsewhere.

@Erouan50 requested a review from a team as a code owner on February 13, 2025 23:08
@Erouan50 force-pushed the fix/wal-upload-twice branch from 131962f to 67e6221 on February 13, 2025 23:09
@mnencia force-pushed the fix/wal-upload-twice branch 2 times, most recently from bc59cbb to 6a2fc48, on February 26, 2025 11:41
@mnencia changed the title from "fix: Archive WAL file once, not twice" to "fix: archive WAL file once, not twice" on Feb 26, 2025
Erouan50 and others added 2 commits February 26, 2025 13:31
Signed-off-by: Antoine Rouaze <arouaze@mirakl.com>
Signed-off-by: Antoine Rouaze <antoine.rouaze@gmail.com>
@mnencia force-pushed the fix/wal-upload-twice branch from 6a2fc48 to b317e78 on February 26, 2025 12:31
@mnencia merged commit ef857fb into cloudnative-pg:main on Feb 26, 2025
@Erouan50 deleted the fix/wal-upload-twice branch on March 6, 2025 14:21