Conversation
* Remove old S3 files after the events are flushed to the queue
* Allow a transform to return nothing (nothing is written), a single event, or multiple events.
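The transform contract above can be sketched as a small normalizer; this is an illustrative helper, not the SDK's actual API, assuming a transform whose return value may be empty, a single event, or an array:

```javascript
// Sketch of the transform contract: a transform may return nothing
// (skip the write), one event object, or an array of events.
// `normalizeTransformResult` is a hypothetical helper name.
function normalizeTransformResult(result) {
	if (result === undefined || result === null || result === false) {
		return []; // nothing to write
	}
	return Array.isArray(result) ? result : [result];
}
```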
…entity-table-loading
* Added new auto-configured Fast S3 read options to entity table read
* Fixed a bug in the ES connector where NOT saving the results would prevent the bot from checkpointing.
⛔ Snyk checks have failed. 20 issues have been found so far.
chore: upgrade js-beautify version in leo-connector-entity-table to fix vulnerability
For some reason, the OpenSearch project client does NOT add the correct `Accept-Encoding` gzip headers by default. You have to opt in by setting `suggestCompression` to `true`. Without this, large ES response bodies take 5x to 6x longer to download, which hurts response times and concurrency because of the added time spent downloading the full response.
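A minimal sketch of the opt-in, assuming the `@opensearch-project/opensearch` client; the endpoint is hypothetical:

```javascript
// `suggestCompression: true` makes the client send `Accept-Encoding: gzip`
// so large response bodies come back compressed.
const clientOptions = {
	node: 'https://search.example.com', // hypothetical cluster endpoint
	suggestCompression: true,           // without this, big bodies download 5x-6x slower
};
// const { Client } = require('@opensearch-project/opensearch');
// const client = new Client(clientOptions);
```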
- Use a proxy to preserve OpenSearch client methods
…r-connector ES-2352 - use a proxy
It will only accept `scroll_id` now.
- Added the ability to filter the stream before loading to DynamoDB (for fanout)
- Added support for FastJSON parsing (pass along the whole event if it was initially parsed by FastJSON)
…d-fast-json-support ES-2352 - improvements to entity table connector
```diff
 	});
 } else {
-	done(err);
+	done(err, meta);
```
Bug: Inconsistent Upload Payload Breaks Backward Compatibility
The S3 upload logic in stream and streamParallel now returns an inconsistent payload. The new Promise-based Upload.done() includes file only on success and uploadError only on failure, unlike the previous s3.upload callback that always provided both. This breaks backward compatibility for downstream consumers.
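One hedged way to restore the old shape is to wrap the Promise-based `Upload.done()` so the payload always carries both fields; `upload` and `file` are illustrative names, not the connector's actual variables:

```javascript
// Sketch: normalize the Upload.done() result so downstream consumers
// always see both `file` and `uploadError`, matching the old
// s3.upload callback payload.
async function uploadWithStablePayload(upload, file) {
	try {
		await upload.done();
		return { file, uploadError: null };
	} catch (err) {
		return { file, uploadError: err };
	}
}
```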
Additional Locations (1)
- updated `leo-logger` and `leo-sdk` in connectors
```js
if (needsComparison) {
	if (JSON.stringify(data.payload.old || {}) === JSON.stringify(data.payload.new || {})) {
		return resolve(null);
	}
```
Bug: Data Consistency Varies by Storage Location
The needsComparison flag only triggers when data is fetched from S3, but the comparison skips events when old and new are identical. This means non-S3 records with identical old/new values still emit events, while S3-backed identical records are filtered out, creating inconsistent behavior based on storage location rather than data content.
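One possible fix is to hoist the comparison out of the S3-only branch so every record is filtered the same way; this is a sketch mirroring the snippet above, with an illustrative helper name:

```javascript
// Sketch: apply the old/new comparison to every record, regardless of
// whether its payload was inlined or fetched from S3, so identical
// payloads are filtered consistently.
function isUnchanged(payload) {
	return JSON.stringify(payload.old || {}) === JSON.stringify(payload.new || {});
}
```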
```js
if (s3Updates.length > 0) {
	logger.info(`finished writing ${s3Updates.length} records to DynamoDB`);
}
```
Bug: Log Message Falsely Claims Completion
The log message says "finished writing records to DynamoDB" but appears before the batchWrite call executes. The message should say "finished writing records to S3" or be moved after the DynamoDB write completes to accurately reflect what operation finished.
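One way to fix this is to await the write and only then log; a minimal sketch, where `batchWrite` stands in for the surrounding code's actual write call:

```javascript
// Sketch: log after the DynamoDB write resolves so the message
// accurately reflects a completed operation.
async function writeThenLog(batchWrite, logger, s3Updates) {
	await batchWrite(s3Updates);
	if (s3Updates.length > 0) {
		logger.info(`finished writing ${s3Updates.length} records to DynamoDB`);
	}
}
```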
```js
	oldS3Files.push(image._s3);
} catch (e) {
	return reject(e);
}
```
Bug: S3 Deletes Referenced Files, Causing Data Loss
When an old S3 file is fetched and added to oldS3Files at line 341, but the comparison at lines 375-378 determines old and new are identical and returns null, the S3 file still gets deleted at the end. This causes data loss because the file is deleted even though it's still referenced in DynamoDB (since the event was filtered and no update occurred).
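A hedged sketch of one fix: defer the delete decision until a record has survived the identical-old/new filter, so a file still referenced by DynamoDB is never removed. The record shape and helper name are illustrative:

```javascript
// Sketch: only collect S3 files for deletion from records that
// actually changed (and will therefore emit an update event).
function filesSafeToDelete(records) {
	return records
		.filter((r) => JSON.stringify(r.old || {}) !== JSON.stringify(r.new || {}))
		.map((r) => r._s3)
		.filter(Boolean);
}
```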
- Elasticsearch 7.x uses a read/merge/update cycle internally when doing a `doc_as_upsert` and uses the version numbers as an optimistic locking mechanism.
- This code tells ES to retry the operation when there is a conflict, up to 3 times.
- Also updated the package version.
- streamParallel is not currently used anywhere, but added it there anyway.
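The retry setting described above maps to the bulk API's `retry_on_conflict` parameter; this is a sketch of such a bulk action pair, with illustrative index/id/doc values:

```javascript
// Sketch: `retry_on_conflict: 3` on the update action tells
// Elasticsearch to re-run the read/merge/update cycle up to 3 times
// when a version conflict occurs during the doc_as_upsert.
function buildUpsertAction(index, id, doc) {
	return [
		{ update: { _index: index, _id: id, retry_on_conflict: 3 } },
		{ doc: doc, doc_as_upsert: true },
	];
}
```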
…-retries ES-2352 - add retries for conflicting updates
```diff
 lambdaConnector: function(id, lambdaName, settings) {
 	let region = (lambdaName.match(/arn:aws:lambda:(.*?):/) || [])[1];
-	const lambdaInvoker = new aws.Lambda({
+	const lambdaInvoker = new Lambda({
```
Lambda invoke uses v2 callback pattern, never resolves
High Severity
The lambdaInvoker.invoke() call passes a callback as the second argument, but AWS SDK v3's Lambda client does not support callbacks — it returns a Promise instead. The callback will be silently ignored, and since it's wrapped in a new Promise() that only resolves/rejects inside that callback, every Lambda connector operation (init, range, nibble, getChecksum, etc.) will hang forever without resolving or rejecting.
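A sketch of the v3-compatible pattern: await the Promise returned by `invoke()` instead of passing a callback. The wrapper name is illustrative and the client construction is commented out as an assumption:

```javascript
// const { Lambda } = require('@aws-sdk/client-lambda');

// Sketch: in SDK v3, invoke() returns a Promise; a callback argument
// would be silently dropped, so the result must be awaited.
async function invokeV3(lambdaInvoker, params) {
	const data = await lambdaInvoker.invoke(params);
	return data;
}
```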
```diff
 lambdaConnector: function(id, lambdaName, settings) {
 	let region = (lambdaName.match(/arn:aws:lambda:(.*?):/) || [])[1];
-	const lambdaInvoker = new aws.Lambda({
+	const lambdaInvoker = new Lambda({
```
Lambda response Payload is Uint8Array in SDK v3
High Severity
In AWS SDK v3, the Payload field in a Lambda invoke response is a Uint8Array, not a string. The comparisons data.Payload != 'null' and JSON.parse(data.Payload) will not behave correctly — the string comparison against a Uint8Array will never match, and JSON.parse may not properly decode the binary payload. The payload needs conversion via Buffer.from(data.Payload).toString() or equivalent before parsing.
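A sketch of the decode step, with a hypothetical helper name: convert the `Uint8Array` payload to a string before the existing `'null'` check and `JSON.parse`, restoring the v2-era behavior:

```javascript
// Sketch: decode the SDK v3 Uint8Array Payload before parsing it,
// handling the empty and 'null' cases the old string comparison covered.
function parsePayload(payload) {
	if (payload == null) return null;
	const text = Buffer.from(payload).toString('utf-8');
	if (text === '' || text === 'null') return null;
	return JSON.parse(text);
}
```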
ES-2352 - retry 429 errors


Note

High Risk

High risk because it changes the Lambda invocation client from AWS SDK v2 to v3 in common/checksum, and the current call pattern (`lambdaInvoker.invoke` with a callback) may not be compatible with the v3 client at runtime.

Overview

Updates common/checksum to use AWS SDK v3 by replacing `aws-sdk`'s `Lambda` client with `@aws-sdk/client-lambda` and wiring it into `lambdaConnector`. Also refactors the end-of-run `saveProgress(...).then(...)` flow for clearer promise handling, plus a minor formatting tweak in `fileConnector`'s default function.

Written by Cursor Bugbot for commit f2be291. This will update automatically on new commits.