
fix: Bulk export fails due to S3 upload #10775

Closed

ryotaro-nagahara wants to merge 1 commit into master from fix/178290-bulk-export-aws-problem

Conversation


@ryotaro-nagahara ryotaro-nagahara commented Feb 10, 2026

Task

https://redmine.weseek.co.jp/issues/178419
[Bulk Export] Fix the bug that occurs when pdf-converter is combined with AWS

Symptoms

In AWS environments, bulk export (both Markdown and PDF) failed, and a NoSuchKey error occurred on download; GCS environments worked normally. In addition, no logs were emitted, so the cause could not be identified.

Root cause

In the "Upload PDF to storage (tar.gz compression)" step that follows step 14 of the PDF export flow diagram, the type check failed at the moment the archiver stream producing the tar.gz data was handed to the AWS SDK, and the upload errored out immediately without any data flowing.

This happened because archiver and the AWS SDK reference different Readable classes, so the AWS SDK did not recognize the archiver stream as a stream. The GCS implementation is unaffected by this, which is why exports succeed in GCS environments.

Changes

### Bug fix

1. compress-and-upload.ts: wrap the archiver stream in a PassThrough
   The archiver stream is relayed through a stream the AWS SDK does recognize (a PassThrough), so the instanceof Readable check now passes.

### Fix for a warning

2. aws/index.ts: replace PutObjectCommand with the Upload class
   Switched to the Upload class from @aws-sdk/lib-storage so that streams of unknown size can be multipart-uploaded safely.

### Error-handling fixes

3. index.ts: add await to compressAndUpload
   The other two async steps (createPageSnapshotsAsync, exportPagesToFsAsync) were awaited, but the await was missing for compressAndUpload alone.

4. compress-and-upload.ts / index.ts: error-handling fixes
   Added await on handleError calls and changed its return type to Promise.
   Added a return on error so that postProcess is not executed after a failure.
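The corrected control flow in points 3 and 4 can be sketched as follows; the function bodies are stand-ins for the real steps, and only the await/return shape mirrors the PR:

```typescript
type Job = { failed?: boolean };

// Stand-in for the real handleError: now async, so callers must await it
async function handleError(err: Error, job: Job): Promise<void> {
  job.failed = true; // e.g. update job status, notify the user
}

// Stand-in for the real compressAndUpload step; simulates the S3 failure
async function compressAndUpload(job: Job): Promise<void> {
  throw new Error('upload failed');
}

async function runExportStep(job: Job, postProcess: () => void): Promise<void> {
  try {
    await compressAndUpload(job); // previously not awaited, so failures escaped unnoticed
  } catch (err) {
    await handleError(err as Error, job);
    return; // without this return, postProcess still ran after an error
  }
  postProcess();
}
```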

@ryotaro-nagahara ryotaro-nagahara marked this pull request as ready for review February 10, 2026 10:02
@miya miya requested a review from Copilot February 12, 2026 05:09

Copilot AI left a comment


Pull request overview

Fixes bulk export failures in AWS environments by making the tar.gz upload pipeline compatible with AWS SDK stream handling, improving upload robustness for unknown-length streams, and ensuring the bulk export job flow properly awaits async steps and handles errors.

Changes:

  • Wrap the archiver stream with a Node.js PassThrough before uploading to avoid AWS SDK stream type checks failing.
  • Switch AWS S3 upload implementation from PutObjectCommand to @aws-sdk/lib-storage Upload for safer streaming/multipart uploads.
  • Fix missing await/async error handling in the bulk export cron flow to prevent post-processing after failures.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Adds @aws-sdk/lib-storage (and transitive deps) to support streaming uploads. |
| apps/app/package.json | Adds @aws-sdk/lib-storage dependency for AWS streaming upload support. |
| apps/app/src/server/service/file-uploader/aws/index.ts | Uses Upload class for S3 streaming uploads and adds timeout-based abort. |
| apps/app/src/features/page-bulk-export/server/service/page-bulk-export-job-cron/steps/compress-and-upload.ts | Wraps archiver output in PassThrough and awaits error handling to stop post-processing on failure. |
| apps/app/src/features/page-bulk-export/server/service/page-bulk-export-job-cron/index.ts | Makes handleError async in the interface and awaits compressAndUpload during the cron step. |
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported


Comment on lines 47 to +50
 handleError(
   err: Error | null,
   pageBulkExportJob: PageBulkExportJobDocument,
-): void;
+): Promise<void>;

Copilot AI Feb 12, 2026


handleError was changed to return Promise<void>, but there are existing call sites that invoke it from non-async callbacks without awaiting or catching the returned promise (e.g. in steps/create-page-snapshots-async.ts and steps/export-pages-to-fs-async.ts pipeline callbacks). This can lead to unhandled promise rejections and missed cleanup/notification. Update those call sites to explicitly handle the promise (e.g., void this.handleError(...).catch(logger.error) or similar).
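The call-site pattern this comment suggests can be sketched as follows; the logger and handler are self-contained stand-ins, not the real GROWI objects:

```typescript
const logged: unknown[] = [];
const logger = { error: (e: unknown) => { logged.push(e); } };

// Stand-in for the now-async handleError, which may itself reject
async function handleError(err: Error): Promise<void> {
  throw new Error(`cleanup failed for: ${err.message}`);
}

// From a non-async pipeline callback, fire and forget while still
// observing the rejection, so no unhandled-rejection warning is emitted
function onPipelineError(err: Error): void {
  void handleError(err).catch((e) => logger.error(e));
}
```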

Comment on lines 13 to 17
PutObjectCommand,
S3Client,
} from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

Copilot AI Feb 12, 2026


PutObjectCommand is still imported but no longer used after switching to @aws-sdk/lib-storage's Upload. This will typically fail CI/build when unused imports are checked, so please remove it from the import list.

Comment on lines 80 to +88
 pageArchiver.finalize();
 this.setStreamsInExecution(pageBulkExportJob._id, pageArchiver);

+// Wrap archiver with PassThrough (Node.js built-in) to ensure instanceof Readable check passes in AWS SDK
+const uploadStream = new PassThrough();
+pageArchiver.pipe(uploadStream);

 try {
-  await fileUploadService.uploadAttachment(pageArchiver, attachment);
+  await fileUploadService.uploadAttachment(uploadStream, attachment);

Copilot AI Feb 12, 2026


setStreamsInExecution currently tracks only pageArchiver, but the actual upload now consumes uploadStream (PassThrough). If a job is cleaned up/aborted, cleanUpExportJobResources will destroy only the tracked streams, so the S3 upload may keep waiting on uploadStream. Include uploadStream in setStreamsInExecution (and consider destroying both streams on upload failure) so cleanup reliably terminates the upload pipeline.

this.setStreamsInExecution(pageBulkExportJob._id, pageArchiver);

// Wrap archiver with PassThrough (Node.js built-in) to ensure instanceof Readable check passes in AWS SDK
const uploadStream = new PassThrough();

Copilot AI Feb 12, 2026


When piping pageArchiver into uploadStream, errors from the source stream are not automatically forwarded to the destination. Add explicit 'error' handling (e.g., destroy uploadStream when pageArchiver errors, and handle uploadStream errors) to avoid hangs and unhandled error events during compression/upload failures.

Suggested change:

 const uploadStream = new PassThrough();
+// Forward errors from pageArchiver to uploadStream and avoid unhandled error events
+pageArchiver.on('error', (err) => {
+  logger.error(err);
+  uploadStream.destroy(err);
+});
+uploadStream.on('error', (err) => {
+  logger.error(err);
+});
