fix: Bulk export fails due to S3 upload #10775
ryotaro-nagahara wants to merge 1 commit into master from
Conversation
Pull request overview
Fixes bulk export failures in AWS environments by making the tar.gz upload pipeline compatible with AWS SDK stream handling, improving upload robustness for unknown-length streams, and ensuring the bulk export job flow properly awaits async steps and handles errors.
Changes:
- Wrap the archiver stream with a Node.js `PassThrough` before uploading to avoid AWS SDK stream type checks failing.
- Switch the AWS S3 upload implementation from `PutObjectCommand` to `@aws-sdk/lib-storage`'s `Upload` for safer streaming/multipart uploads.
- Fix missing `await`/async error handling in the bulk export cron flow to prevent post-processing after failures.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Adds @aws-sdk/lib-storage (and transitive deps) to support streaming uploads. |
| apps/app/package.json | Adds @aws-sdk/lib-storage dependency for AWS streaming upload support. |
| apps/app/src/server/service/file-uploader/aws/index.ts | Uses Upload class for S3 streaming uploads and adds timeout-based abort. |
| apps/app/src/features/page-bulk-export/server/service/page-bulk-export-job-cron/steps/compress-and-upload.ts | Wraps archiver output in PassThrough and awaits error handling to stop post-processing on failure. |
| apps/app/src/features/page-bulk-export/server/service/page-bulk-export-job-cron/index.ts | Makes handleError async in the interface and awaits compressAndUpload during the cron step. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
```diff
  handleError(
    err: Error | null,
    pageBulkExportJob: PageBulkExportJobDocument,
- ): void;
+ ): Promise<void>;
```
handleError was changed to return Promise<void>, but there are existing call sites that invoke it from non-async callbacks without awaiting or catching the returned promise (e.g. in steps/create-page-snapshots-async.ts and steps/export-pages-to-fs-async.ts pipeline callbacks). This can lead to unhandled promise rejections and missed cleanup/notification. Update those call sites to explicitly handle the promise (e.g., void this.handleError(...).catch(logger.error) or similar).
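A minimal sketch of the suggested pattern (the callback and logger shapes here are assumptions, not the PR's actual code): invoking a `Promise`-returning `handleError` from a non-async callback without leaking an unhandled rejection.

```typescript
// Sketch (assumed shapes): `logger` stands in for the app's logger,
// HandleError for the interface method changed to return Promise<void>.
const logger = { error: (e: unknown) => console.error(e) };

type HandleError = (err: Error | null) => Promise<void>;

function onPipelineError(handleError: HandleError, err: Error): void {
  // `void` documents that the promise is intentionally not awaited;
  // `.catch` guarantees a rejection inside handleError is still logged.
  void handleError(err).catch((e) => logger.error(e));
}
```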
```diff
    PutObjectCommand,
    S3Client,
  } from '@aws-sdk/client-s3';
+ import { Upload } from '@aws-sdk/lib-storage';
  import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
```
PutObjectCommand is still imported but no longer used after switching to @aws-sdk/lib-storage's Upload. This will typically fail CI/build when unused imports are checked, so please remove it from the import list.
```diff
  pageArchiver.finalize();
  this.setStreamsInExecution(pageBulkExportJob._id, pageArchiver);

+ // Wrap archiver with PassThrough (Node.js built-in) to ensure instanceof Readable check passes in AWS SDK
+ const uploadStream = new PassThrough();
+ pageArchiver.pipe(uploadStream);

  try {
-   await fileUploadService.uploadAttachment(pageArchiver, attachment);
+   await fileUploadService.uploadAttachment(uploadStream, attachment);
```
setStreamsInExecution currently tracks only pageArchiver, but the actual upload now consumes uploadStream (PassThrough). If a job is cleaned up/aborted, cleanUpExportJobResources will destroy only the tracked streams, so the S3 upload may keep waiting on uploadStream. Include uploadStream in setStreamsInExecution (and consider destroying both streams on upload failure) so cleanup reliably terminates the upload pipeline.
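A sketch of the fix this comment suggests: track both streams so cleanup destroys the whole pipeline. The method names come from the review comment; the `Map`-based registry is an assumption about the internals.

```typescript
import { PassThrough, type Readable } from 'node:stream';

// Sketch (registry shape assumed): register every stream in the upload
// pipeline so that aborting a job can destroy all of them.
const streamsInExecution = new Map<string, Readable[]>();

function setStreamsInExecution(jobId: string, ...streams: Readable[]): void {
  const tracked = streamsInExecution.get(jobId) ?? [];
  tracked.push(...streams);
  streamsInExecution.set(jobId, tracked);
}

function cleanUpExportJobResources(jobId: string): void {
  for (const stream of streamsInExecution.get(jobId) ?? []) {
    stream.destroy(); // terminates a pending S3 upload still reading this stream
  }
  streamsInExecution.delete(jobId);
}
```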
```diff
  this.setStreamsInExecution(pageBulkExportJob._id, pageArchiver);

+ // Wrap archiver with PassThrough (Node.js built-in) to ensure instanceof Readable check passes in AWS SDK
+ const uploadStream = new PassThrough();
```
When piping pageArchiver into uploadStream, errors from the source stream are not automatically forwarded to the destination. Add explicit 'error' handling (e.g., destroy uploadStream when pageArchiver errors, and handle uploadStream errors) to avoid hangs and unhandled error events during compression/upload failures.
```diff
  const uploadStream = new PassThrough();
+ // Forward errors from pageArchiver to uploadStream and avoid unhandled error events
+ pageArchiver.on('error', (err) => {
+   logger.error(err);
+   uploadStream.destroy(err);
+ });
+ uploadStream.on('error', (err) => {
+   logger.error(err);
+ });
```
Task
https://redmine.weseek.co.jp/issues/178419
[Bulk Export] Fix the bug that occurs when pdf-converter is combined with AWS
Symptoms
In AWS environments, bulk export (both Markdown and PDF) failed, and downloading the result produced a NoSuchKey error. The same feature worked correctly in GCS environments. In addition, no logs were emitted, so the cause could not be determined from the output.
Root cause
In the "upload PDF to storage (as tar.gz)" step that follows step 14 of the PDF export flow overview diagram, the type check failed the moment the archiver stream producing the tar.gz data was handed to the AWS SDK, and the upload errored out immediately without any data flowing.
This happens because archiver and the AWS SDK reference different Readable classes, so the AWS SDK could not recognize the archiver stream as a stream. The GCS implementation is unaffected by this, which is why bulk export works in GCS environments.
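The mismatch and the fix can be illustrated with a small sketch (the `instanceof` check shown is an assumption about the SDK's internals, based on the description above; the archiver instance itself is omitted):

```typescript
import { PassThrough, Readable } from 'node:stream';

// archiver's output stream comes from its own bundled stream implementation,
// so `archiveStream instanceof Readable` (against node:stream's class) can be
// false, and the AWS SDK then rejects it before any data flows.
// Relaying through a node:stream PassThrough yields a stream that passes:
const uploadStream = new PassThrough();
// pageArchiver.pipe(uploadStream); // (archiver instance omitted in this sketch)

console.log(uploadStream instanceof Readable); // true
```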
Changes

### Bug fix
1. compress-and-upload.ts: wrap the archiver stream in a PassThrough. Relaying the archiver stream through a stream the AWS SDK recognizes (PassThrough) lets it pass the `instanceof Readable` check.

### Response to the warning
2. aws/index.ts: switch from PutObjectCommand to the Upload class. Changed to @aws-sdk/lib-storage's Upload class so that streams of unknown size can be safely uploaded via multipart upload.

### Error handling fixes
3. index.ts: add await to compressAndUpload. The other two async steps (createPageSnapshotsAsync, exportPagesToFsAsync) were awaited, but compressAndUpload alone was missing its await.
4. compress-and-upload.ts / index.ts: fix error handling. Added await to handleError and changed its return type to Promise; added a return on error so postProcess is not executed after a failure.