[backend/pycti] Prevent file upload looping (#14585)#14749
[backend/pycti] Prevent file upload looping (#14585)#14749richard-julien wants to merge 1 commit intomasterfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14749 +/- ##
=======================================
Coverage 32.37% 32.37%
=======================================
Files 3098 3098
Lines 211027 211029 +2
Branches 38244 38245 +1
=======================================
+ Hits 68312 68317 +5
+ Misses 142715 142712 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #14585 where playbook-driven “knowledge manipulation” updates on containers inadvertently delete/re-add existing attached files, causing noisy history entries and unnecessary file operations.
Changes:
- Backend: when the Container Wrapper component is configured to copy files, it now attempts to embed file content (
data) into the STIX file extensions instead of copying onlyurireferences. - PyCTI: removes the “fetch file by
uri” fallback during STIX import, so files are only uploaded whendatais explicitly embedded.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| opencti-platform/opencti-graphql/src/modules/playbook/playbook-components.ts | Updates container wrapping to resolve and embed file contents when copyFiles is enabled. |
| client-python/pycti/utils/opencti_stix2.py | Prevents implicit file re-download/re-upload by requiring embedded data for file import. |
| const currentFileContent = toBase64(await getFileContent(fileId)); | ||
| copiedFiles.push({ ...currentFile, data: currentFileContent }); |
There was a problem hiding this comment.
getFileContent(fileId) defaults to UTF-8 decoding, then toBase64() re-encodes the resulting string. For binary files (e.g., PDFs) this will corrupt the content. Use getFileContent(fileId, 'base64') (as done in database/middleware.ts:631-634) and assign that directly to data (handling undefined).
| const currentFileContent = toBase64(await getFileContent(fileId)); | |
| copiedFiles.push({ ...currentFile, data: currentFileContent }); | |
| const currentFileContent = await getFileContent(fileId, 'base64'); | |
| if (currentFileContent) { | |
| copiedFiles.push({ ...currentFile, data: currentFileContent }); | |
| } else { | |
| logApp.error('Cant copy file from main element to the container: empty content', { name: currentFile.name }); | |
| } |
| const stixFileExtensions = baseData.extensions[STIX_EXT_OCTI].files; | ||
| if (copyFiles && stixFileExtensions && stixFileExtensions.length > 0) { | ||
| // We need to get the files and add the data inside | ||
| const copiedFiles = []; | ||
| for (let index = 0; index < stixFileExtensions.length; index += 1) { | ||
| const currentFile = stixFileExtensions[index]; | ||
| try { | ||
| const currentFileUri = currentFile.uri; | ||
| const fileId = currentFileUri.replace('/storage/get/', ''); | ||
| const currentFileContent = toBase64(await getFileContent(fileId)); | ||
| copiedFiles.push({ ...currentFile, data: currentFileContent }); | ||
| } catch (e) { | ||
| logApp.error('Cant copy file from main element to the container', { cause: e, name: currentFile.name }); | ||
| } | ||
| } | ||
| container.extensions[STIX_EXT_OCTI].files = copiedFiles; |
There was a problem hiding this comment.
The current loop drops a file entirely if it already has data (since its uri can be 'unknown') or if getFileContent fails, because only successfully fetched files are pushed to copiedFiles. To avoid unintended file removals, preserve existing currentFile.data when present and/or push the original currentFile into copiedFiles on failure (optionally marking it no_trigger_import).
| for file_obj in ext_ref_files: | ||
| data = None | ||
| if "data" in file_obj: | ||
| data = base64.b64decode(file_obj["data"]) | ||
| elif "uri" in file_obj: | ||
| file_url = self.opencti.api_url.replace( | ||
| "/graphql", file_obj["uri"] | ||
| ) | ||
| data = self.opencti.fetch_opencti_file( | ||
| fetch_uri=file_url, binary=True, serialize=False | ||
| ) | ||
| if data is not None: |
There was a problem hiding this comment.
This change removes the fallback that fetched file content when only uri is provided. As a result, STIX file entries with uri but no data will now be silently ignored (no file uploaded/attached). If this is intended (to prevent re-upload loops), consider at least logging a warning when uri is present but data is missing, or gating the behavior behind an explicit option so callers can still opt into fetching.
| for file_obj in ext_ref_files: | ||
| data = None | ||
| if "data" in file_obj: | ||
| data = base64.b64decode(file_obj["data"]) | ||
| elif "uri" in file_obj: | ||
| file_url = self.opencti.api_url.replace( | ||
| "/graphql", file_obj["uri"] | ||
| ) | ||
| data = self.opencti.fetch_opencti_file( | ||
| fetch_uri=file_url, binary=True, serialize=False | ||
| ) | ||
| if data is not None: |
There was a problem hiding this comment.
This behavioral change (ignoring files with uri but no data) isn’t covered by the existing unit tests for OpenCTIStix2. Please add tests that assert the intended outcome for: (1) files with data (uploaded), and (2) files with only uri (skipped / warned), so regressions don’t reintroduce the upload loop or unexpectedly drop attachments.
| for file_obj in all_files: | ||
| data = None | ||
| if "data" in file_obj: | ||
| data = base64.b64decode(file_obj["data"]) | ||
| elif "uri" in file_obj: | ||
| file_url = self.opencti.api_url.replace( | ||
| "/graphql", file_obj["uri"] | ||
| ) | ||
| data = self.opencti.fetch_opencti_file( | ||
| fetch_uri=file_url, binary=True, serialize=False | ||
| ) | ||
| if data is not None: |
There was a problem hiding this comment.
Same as above: in this branch, files that only contain uri (no data) will now be ignored with no visibility. If the intent is to stop re-uploading existing attachments, consider emitting a warning when uri exists but data is absent so users understand why attachments weren’t imported.
| for file_obj in x_opencti_files: | ||
| data = None | ||
| if "data" in file_obj: | ||
| data = base64.b64decode(file_obj["data"]) | ||
| elif "uri" in file_obj: | ||
| url = self.opencti.api_url.replace("/graphql", file_obj["uri"]) | ||
| data = self.opencti.fetch_opencti_file( | ||
| fetch_uri=url, binary=True, serialize=False | ||
| ) | ||
| if data is not None: |
There was a problem hiding this comment.
import_object() now ignores file entries that have uri but no data, which may drop attachments when importing STIX exported by OpenCTI (where file extensions commonly include uri). If this behavior is intentional, consider documenting it and/or logging when skipping these files; otherwise, reintroduce an opt-in fetch-from-URI path.
| for file_obj in x_opencti_files: | ||
| data = None | ||
| if "data" in file_obj: | ||
| data = base64.b64decode(file_obj["data"]) | ||
| elif "uri" in file_obj: | ||
| url = self.opencti.api_url.replace("/graphql", file_obj["uri"]) | ||
| data = self.opencti.fetch_opencti_file( | ||
| fetch_uri=url, binary=True, serialize=False | ||
| ) | ||
| if data is not None: |
There was a problem hiding this comment.
import_observable() has the same change: file objects with uri but no data are silently skipped. Consider warning/logging (or documenting) this behavior so callers understand why attachments weren’t imported, especially when processing bundles produced by OpenCTI that may carry only uri references.
| const currentFileUri = currentFile.uri; | ||
| const fileId = currentFileUri.replace('/storage/get/', ''); | ||
| const currentFileContent = toBase64(await getFileContent(fileId)); | ||
| copiedFiles.push({ ...currentFile, data: currentFileContent }); |
There was a problem hiding this comment.
Don't we need the file markings in addition of the file content ?
See #14585