Skip to content

[backend/pycti] Prevent file upload looping (#14585)#14749

Open
richard-julien wants to merge 1 commit intomasterfrom
issue/14585_loop
Open

[backend/pycti] Prevent file upload looping (#14585)#14749
richard-julien wants to merge 1 commit intomasterfrom
issue/14585_loop

Conversation

@richard-julien
Copy link
Member

See #14585

@richard-julien richard-julien linked an issue Mar 3, 2026 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 23.52941% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 32.37%. Comparing base (f7e084c) to head (236d0da).

Files with missing lines Patch % Lines
...raphql/src/modules/playbook/playbook-components.ts 23.52% 13 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #14749   +/-   ##
=======================================
  Coverage   32.37%   32.37%           
=======================================
  Files        3098     3098           
  Lines      211027   211029    +2     
  Branches    38244    38245    +1     
=======================================
+ Hits        68312    68317    +5     
+ Misses     142715   142712    -3     
Flag Coverage Δ
opencti-client-python 45.53% <ø> (+0.04%) ⬆️
opencti-front 2.83% <ø> (ø)
opencti-graphql 67.74% <23.52%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@richard-julien richard-julien added the filigran team use to identify PR from the Filigran team label Mar 4, 2026
@richard-julien richard-julien requested a review from Copilot March 4, 2026 08:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #14585 where playbook-driven “knowledge manipulation” updates on containers inadvertently delete/re-add existing attached files, causing noisy history entries and unnecessary file operations.

Changes:

  • Backend: when the Container Wrapper component is configured to copy files, it now attempts to embed file content (data) into the STIX file extensions instead of copying only uri references.
  • PyCTI: removes the “fetch file by uri” fallback during STIX import, so files are only uploaded when data is explicitly embedded.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
opencti-platform/opencti-graphql/src/modules/playbook/playbook-components.ts Updates container wrapping to resolve and embed file contents when copyFiles is enabled.
client-python/pycti/utils/opencti_stix2.py Prevents implicit file re-download/re-upload by requiring embedded data for file import.

Comment on lines +638 to +639
const currentFileContent = toBase64(await getFileContent(fileId));
copiedFiles.push({ ...currentFile, data: currentFileContent });
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFileContent(fileId) defaults to UTF-8 decoding, then toBase64() re-encodes the resulting string. For binary files (e.g., PDFs) this will corrupt the content. Use getFileContent(fileId, 'base64') (as done in database/middleware.ts:631-634) and assign that directly to data (handling undefined).

Suggested change
const currentFileContent = toBase64(await getFileContent(fileId));
copiedFiles.push({ ...currentFile, data: currentFileContent });
const currentFileContent = await getFileContent(fileId, 'base64');
if (currentFileContent) {
copiedFiles.push({ ...currentFile, data: currentFileContent });
} else {
logApp.error('Cant copy file from main element to the container: empty content', { name: currentFile.name });
}

Copilot uses AI. Check for mistakes.
Comment on lines +629 to +644
const stixFileExtensions = baseData.extensions[STIX_EXT_OCTI].files;
if (copyFiles && stixFileExtensions && stixFileExtensions.length > 0) {
// We need to get the files and add the data inside
const copiedFiles = [];
for (let index = 0; index < stixFileExtensions.length; index += 1) {
const currentFile = stixFileExtensions[index];
try {
const currentFileUri = currentFile.uri;
const fileId = currentFileUri.replace('/storage/get/', '');
const currentFileContent = toBase64(await getFileContent(fileId));
copiedFiles.push({ ...currentFile, data: currentFileContent });
} catch (e) {
logApp.error('Cant copy file from main element to the container', { cause: e, name: currentFile.name });
}
}
container.extensions[STIX_EXT_OCTI].files = copiedFiles;
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current loop drops a file entirely if it already has data (since its uri can be 'unknown') or if getFileContent fails, because only successfully fetched files are pushed to copiedFiles. To avoid unintended file removals, preserve existing currentFile.data when present and/or push the original currentFile into copiedFiles on failure (optionally marking it no_trigger_import).

Copilot uses AI. Check for mistakes.
Comment on lines 652 to 656
for file_obj in ext_ref_files:
data = None
if "data" in file_obj:
data = base64.b64decode(file_obj["data"])
elif "uri" in file_obj:
file_url = self.opencti.api_url.replace(
"/graphql", file_obj["uri"]
)
data = self.opencti.fetch_opencti_file(
fetch_uri=file_url, binary=True, serialize=False
)
if data is not None:
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change removes the fallback that fetched file content when only uri is provided. As a result, STIX file entries with uri but no data will now be silently ignored (no file uploaded/attached). If this is intended (to prevent re-upload loops), consider at least logging a warning when uri is present but data is missing, or gating the behavior behind an explicit option so callers can still opt into fetching.

Copilot uses AI. Check for mistakes.
Comment on lines 652 to 656
for file_obj in ext_ref_files:
data = None
if "data" in file_obj:
data = base64.b64decode(file_obj["data"])
elif "uri" in file_obj:
file_url = self.opencti.api_url.replace(
"/graphql", file_obj["uri"]
)
data = self.opencti.fetch_opencti_file(
fetch_uri=file_url, binary=True, serialize=False
)
if data is not None:
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavioral change (ignoring files with uri but no data) isn’t covered by the existing unit tests for OpenCTIStix2. Please add tests that assert the intended outcome for: (1) files with data (uploaded), and (2) files with only uri (skipped / warned), so regressions don’t reintroduce the upload loop or unexpectedly drop attachments.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 846 to 850
for file_obj in all_files:
data = None
if "data" in file_obj:
data = base64.b64decode(file_obj["data"])
elif "uri" in file_obj:
file_url = self.opencti.api_url.replace(
"/graphql", file_obj["uri"]
)
data = self.opencti.fetch_opencti_file(
fetch_uri=file_url, binary=True, serialize=False
)
if data is not None:
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: in this branch, files that only contain uri (no data) will now be ignored with no visibility. If the intent is to stop re-uploading existing attachments, consider emitting a warning when uri exists but data is absent so users understand why attachments weren’t imported.

Copilot uses AI. Check for mistakes.
Comment on lines 1137 to 1141
for file_obj in x_opencti_files:
data = None
if "data" in file_obj:
data = base64.b64decode(file_obj["data"])
elif "uri" in file_obj:
url = self.opencti.api_url.replace("/graphql", file_obj["uri"])
data = self.opencti.fetch_opencti_file(
fetch_uri=url, binary=True, serialize=False
)
if data is not None:
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import_object() now ignores file entries that have uri but no data, which may drop attachments when importing STIX exported by OpenCTI (where file extensions commonly include uri). If this behavior is intentional, consider documenting it and/or logging when skipping these files; otherwise, reintroduce an opt-in fetch-from-URI path.

Copilot uses AI. Check for mistakes.
Comment on lines 1260 to 1264
for file_obj in x_opencti_files:
data = None
if "data" in file_obj:
data = base64.b64decode(file_obj["data"])
elif "uri" in file_obj:
url = self.opencti.api_url.replace("/graphql", file_obj["uri"])
data = self.opencti.fetch_opencti_file(
fetch_uri=url, binary=True, serialize=False
)
if data is not None:
Copy link

Copilot AI Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import_observable() has the same change: file objects with uri but no data are silently skipped. Consider warning/logging (or documenting) this behavior so callers understand why attachments weren’t imported, especially when processing bundles produced by OpenCTI that may carry only uri references.

Copilot uses AI. Check for mistakes.
const currentFileUri = currentFile.uri;
const fileId = currentFileUri.replace('/storage/get/', '');
const currentFileContent = toBase64(await getFileContent(fileId));
copiedFiles.push({ ...currentFile, data: currentFileContent });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need the file markings in addition of the file content ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

filigran team use to identify PR from the Filigran team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Playbook: manipulating container files even if not needed

3 participants