
Aws Sdk V3#206

Open
czirker wants to merge 81 commits into development from
feature/aws-sdk-v3-again

Conversation


@czirker czirker commented Jan 31, 2024

Note

High Risk
High risk because it changes the Lambda invocation client from AWS SDK v2 to v3 in common/checksum, and the current call pattern (lambdaInvoker.invoke with callback) may not be compatible with the v3 client at runtime.

Overview
Updates common/checksum to use AWS SDK v3 by replacing aws-sdk’s Lambda client with @aws-sdk/client-lambda and wiring it into lambdaConnector.

Also refactors the end-of-run saveProgress(...).then(...) flow for clearer promise handling, plus a minor formatting tweak in fileConnector’s default function.

Written by Cursor Bugbot for commit f2be291.

czirker and others added 30 commits January 31, 2024 13:52
* Remove old S3 files after the events are flushed to the queue
* Allow a transform to return nothing to be written, 1 event, or multiple events.
jgrantr added 2 commits May 6, 2025 16:12
* added new auto-configure Fast S3 read options to entity table read
* fixed bug in ES connector where NOT saving the results would result in the bot not checkpointing.
ch-snyk-sa commented May 7, 2025

Snyk checks have failed. 20 issues have been found so far.

| Scan Engine | Critical | High | Medium | Low | Total (20) |
| --- | --- | --- | --- | --- | --- |
| Open Source Security | 0 | 10 | 10 | 0 | 20 issues |
| Licenses | 0 | 0 | 0 | 0 | 0 issues |
| Code Security | 0 | 0 | 0 | 0 | 0 issues |


mariagr-chub and others added 15 commits May 12, 2025 14:46
chore: upgrade js-beautify version in leo-connector-entity-table to fix vulnerability
For some reason, the OpenSearch project client does NOT add the correct `Accept-Encoding` gzip headers by default. You have to tell it to do that by setting `suggestCompression` to `true`. Without this, large ES response bodies will take 5x - 6x longer to download and can really mess up response times and concurrency because of the added time to download the full response.
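The compression setting described above can be sketched as a client options object. This is a minimal sketch, not the PR's actual configuration; the endpoint is a placeholder, and the options would be passed to `new Client(...)` from `@opensearch-project/opensearch`.

```javascript
// Hedged sketch: OpenSearch JS client options enabling response compression.
// `suggestCompression: true` makes the client send Accept-Encoding: gzip so
// large response bodies come back compressed instead of as plain text.
const openSearchOptions = {
	node: 'https://search.example.com:9200', // placeholder endpoint
	suggestCompression: true,                // request gzipped response bodies
};
```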
- Use a proxy to preserve OpenSearch client methods
It will only accept `scroll_id` now.
- added the ability to filter the stream before loading to DynamoDB (for fanout)
- Added support for FastJSON parsing (pass along the whole event if it was parsed initially by FastJSON).
…d-fast-json-support

ES-2352 - improvements to entity table connector
});
} else {
done(err);
done(err, meta);

Bug: Inconsistent Upload Payload Breaks Backward Compatibility

The S3 upload logic in stream and streamParallel now returns an inconsistent payload. The new Promise-based Upload.done() includes file only on success and uploadError only on failure, unlike the previous s3.upload callback that always provided both. This breaks backward compatibility for downstream consumers.

Additional Locations (1)

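One way to address this would be a thin wrapper that restores the old callback-style payload shape. This is a hedged sketch only: `uploadWithLegacyShape` is a hypothetical name, and `upload` is assumed to be a promise-based `Upload` instance (as from `@aws-sdk/lib-storage`) whose `done()` resolves on success and rejects on failure.

```javascript
// Hedged sketch: normalize the promise-based Upload.done() result so callers
// always receive both `file` and `uploadError`, matching the old s3.upload
// callback payload. All names here are hypothetical.
async function uploadWithLegacyShape(upload, file) {
	try {
		await upload.done(); // resolves on success, rejects on failure
		return { file, uploadError: null };
	} catch (err) {
		return { file, uploadError: err };
	}
}
```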

- updated `leo-logger` and `leo-sdk` in connectors
if (needsComparison) {
if (JSON.stringify(data.payload.old || {}) === JSON.stringify(data.payload.new || {})) {
return resolve(null);
}

Bug: Data Consistency Varies by Storage Location

The needsComparison flag only triggers when data is fetched from S3, but the comparison skips events when old and new are identical. This means non-S3 records with identical old/new values still emit events, while S3-backed identical records are filtered out, creating inconsistent behavior based on storage location rather than data content.

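The fix would be to apply the same comparison to every record, not only S3-backed ones. A minimal sketch of the shared predicate, using the same `JSON.stringify` comparison the diff shows (which assumes stable key order); `isUnchanged` is a hypothetical helper name:

```javascript
// Hedged sketch: storage-independent old/new comparison so identical payloads
// are filtered whether or not they were fetched from S3.
function isUnchanged(payload) {
	return JSON.stringify(payload.old || {}) === JSON.stringify(payload.new || {});
}
```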


if (s3Updates.length > 0) {
logger.info(`finished writing ${s3Updates.length} records to DynamoDB`);
}

Bug: Log Message Falsely Claims Completion

The log message says "finished writing records to DynamoDB" but appears before the batchWrite call executes. The message should say "finished writing records to S3" or be moved after the DynamoDB write completes to accurately reflect what operation finished.


oldS3Files.push(image._s3);
} catch (e) {
return reject(e);
}

Bug: S3 Deletes Referenced Files, Causing Data Loss

When an old S3 file is fetched and added to oldS3Files at line 341, but the comparison at lines 375-378 determines old and new are identical and returns null, the S3 file still gets deleted at the end. This causes data loss because the file is deleted even though it's still referenced in DynamoDB (since the event was filtered and no update occurred).

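One shape the fix could take: queue an old S3 file for deletion only after its replacement event was actually emitted, so filtered (unchanged) records keep their referenced file. This is a sketch under assumed record shapes; `collectDeletableS3Files`, `s3Key`, and `emitted` are hypothetical names, not the actual code's fields.

```javascript
// Hedged sketch: only delete old S3 files whose records were actually
// rewritten; records filtered out as unchanged keep their file intact.
function collectDeletableS3Files(records) {
	// each record (hypothetical shape): { s3Key: string|null, emitted: boolean }
	return records
		.filter((r) => r.emitted && r.s3Key)
		.map((r) => r.s3Key);
}
```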

- Elasticsearch 7.x uses a read/merge/update internally when doing a `doc_as_upsert` and uses the version numbers as an optimistic locking mechanism.
- This code tells ES to retry the operation again when there is a conflict, up to 3 times.
- also updated the package version
- streamParallel is not currently used anywhere, but the retry logic was added to it anyway.
…-retries

ES-2352 - add retries for conflicting updates
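The retry behavior described above maps to the `retry_on_conflict` field in a bulk update action. A minimal sketch of one such action pair, with placeholder index and id values (not the PR's actual values):

```javascript
// Hedged sketch: a bulk doc_as_upsert action asking Elasticsearch to retry
// the internal read/merge/write cycle up to 3 times on a version conflict.
const bulkBody = [
	{ update: { _index: 'my-index', _id: 'doc-1', retry_on_conflict: 3 } },
	{ doc: { field: 'value' }, doc_as_upsert: true },
];
```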

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.

lambdaConnector: function(id, lambdaName, settings) {
let region = (lambdaName.match(/arn:aws:lambda:(.*?):/) || [])[1];
const lambdaInvoker = new aws.Lambda({
const lambdaInvoker = new Lambda({

Lambda invoke uses v2 callback pattern, never resolves

High Severity

The lambdaInvoker.invoke() call passes a callback as the second argument, but AWS SDK v3's Lambda client does not support callbacks — it returns a Promise instead. The callback will be silently ignored, and since it's wrapped in a new Promise() that only resolves/rejects inside that callback, every Lambda connector operation (init, range, nibble, getChecksum, etc.) will hang forever without resolving or rejecting.

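The promise-based replacement could look roughly like this. It is a sketch, not the PR's code: `client` is assumed to be a v3-style client whose `send()` returns a Promise, and the command is shown as a plain parameter object (rather than a real `InvokeCommand`) so the sketch stays self-contained.

```javascript
// Hedged sketch: replace the v2 callback-in-new-Promise pattern with v3's
// promise-returning send(), and decode the Uint8Array Payload before parsing.
async function invokeLambda(client, functionName, payload) {
	const response = await client.send({
		FunctionName: functionName,
		Payload: JSON.stringify(payload),
	});
	const text = response.Payload
		? Buffer.from(response.Payload).toString()
		: null;
	return text && text !== 'null' ? JSON.parse(text) : null;
}
```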

lambdaConnector: function(id, lambdaName, settings) {
let region = (lambdaName.match(/arn:aws:lambda:(.*?):/) || [])[1];
const lambdaInvoker = new aws.Lambda({
const lambdaInvoker = new Lambda({

Lambda response Payload is Uint8Array in SDK v3

High Severity

In AWS SDK v3, the Payload field in a Lambda invoke response is a Uint8Array, not a string. The comparisons data.Payload != 'null' and JSON.parse(data.Payload) will not behave correctly — the string comparison against a Uint8Array will never match, and JSON.parse may not properly decode the binary payload. The payload needs conversion via Buffer.from(data.Payload).toString() or equivalent before parsing.

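The conversion step could be isolated in a small helper. A hedged sketch, with `decodePayload` as a hypothetical name; it handles the `'null'` string comparison from the original code after decoding:

```javascript
// Hedged sketch: decode the v3 Uint8Array Payload to a UTF-8 string before
// comparing or JSON-parsing it, restoring the v2 string behavior.
function decodePayload(payload) {
	if (payload == null) return null;
	const text = Buffer.from(payload).toString('utf8');
	return text === 'null' || text === '' ? null : JSON.parse(text);
}
```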


6 participants