Add security violations filter processor, and plug it into the default OTLP logs pipeline when TcplogReceiver is configured #1576
internal/config/config.go
Outdated
}
}

func enforceSafeLogBatchProcessors(col *Collector) {
Not sure if this function should be here. Hardcoding the batch processor values in the code means that we can't change them in the future. It also means that if a user wants to send logs to a different exporter, they can't use their own log batch processor, as it would be overridden by this.
Hmm. Trying to see what we can do to ensure the customer does not set huge values for the batch processor when talking to N1C.
In our testing, we found that larger values lead to "rpc error: code = ResourceExhausted desc = grpc: received message larger than max" being returned, which is a permanent error for the OTel collector (it does not retry the export request).
Any suggestions to handle it safely?
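To make the trade-off in this thread concrete, here is a minimal sketch of capping oversized batch values instead of overwriting them outright. Everything in it is an assumption for illustration: `BatchSettings`, `clampBatch`, and the ceiling constants are hypothetical names and numbers, not the agent's real config types or tested limits.

```go
package main

import "fmt"

// BatchSettings is a hypothetical stand-in for the batch processor fields
// discussed in this thread; it is not the agent's real config type.
type BatchSettings struct {
	SendBatchSize    uint32
	SendBatchMaxSize uint32
}

// Hypothetical ceilings. Real values would have to come from testing against
// the management plane's gRPC message size limit.
const (
	maxSafeBatchSize    uint32 = 1000
	maxSafeBatchMaxSize uint32 = 1000
)

// clampBatch lowers only the values that exceed the ceilings and reports
// whether anything was changed, so the caller can log a warning.
// Values already under the ceilings are returned untouched.
func clampBatch(b BatchSettings) (BatchSettings, bool) {
	changed := false
	if b.SendBatchSize > maxSafeBatchSize {
		b.SendBatchSize = maxSafeBatchSize
		changed = true
	}
	if b.SendBatchMaxSize > maxSafeBatchMaxSize {
		b.SendBatchMaxSize = maxSafeBatchMaxSize
		changed = true
	}
	return b, changed
}

func main() {
	user := BatchSettings{SendBatchSize: 50000, SendBatchMaxSize: 200}
	safe, changed := clampBatch(user)
	fmt.Printf("settings=%+v changed=%v\n", safe, changed)
}
```

A cap like this still hardcodes limits in agent code, so it does not address the concern that the values cannot be changed later without an agent release.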
In our documentation, we do warn users that if they are customising their OTel configuration, overriding default processors can cause issues with sending telemetry to their management plane. https://docs.nginx.com/nginx-one-console/agent/configure-otel-metrics/#example-usage
In the future, we would like to implement a mechanism where the management plane can configure the OTel collector remotely, so that even if a user overrides a default processor, the management plane can undo it by sending its desired configuration when the agent connects.
This would allow management planes to change these settings in the future without requiring updates to agent code.
So do you suggest we remove this safety check, or is it fine as is?
I would suggest removing this check
Okay, removed the commit that introduced that check.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1576      +/-   ##
==========================================
+ Coverage   85.02%   85.07%   +0.04%
==========================================
  Files         103      105       +2
  Lines       13589    13719     +130
==========================================
+ Hits        11554    11671     +117
- Misses       1518     1529      +11
- Partials      517      519       +2
... and 2 files with indirect coverage changes
…st message as a one-time gate: on the very first log record, it checks:
- The body is a string (not int, bytes, etc.).
- The body has exactly 27 pipe-separated fields matching the secops_dashboard-log profile format.
If either check fails, the gate closes permanently: all subsequent messages are dropped with zero overhead (early return before any iteration) until the OTel collector is restarted. If it passes, the gate opens permanently and all future string-bodied records flow through.
…nfigured against docker0 interface IP as well (along with 127.0.0.1)
…ne, replacing the:
- logsgzipprocessor: This processor was introduced for NGINX One for a deprecated project, which is not in scope anymore.
- securityviolationsprocessor: This processor converted the syslog security violation into a JSON body log record format and then forwarded it. This deterministic JSON log body parsing is overhead that the agent incurs for every single log record.
Proposed changes
COMMIT 1:
Add a new security violations filter processor that validates the first message as a one-time gate. On the very first log record, it checks:
- The body is a string (not int, bytes, etc.).
- The body has exactly 28 pipe-separated fields matching the secops-dashboard-log profile format.
COMMIT 2:
Add support for NAP V5 (containerized mode) by also verifying the configured syslog IP against the docker0 interface IP, in addition to 127.0.0.1.
COMMIT 3:
Plug security violation filter processor into the default logs pipeline, replacing the:
- logsgzipprocessor: This processor was introduced for NGINX One for a deprecated project, which is not in scope anymore.
- securityviolationsprocessor: This processor converted the syslog security violation into a JSON body log record format and then forwarded it. This deterministic JSON log body conversion is overhead that the agent incurs for every single log record.
Verification
(app_protect_security_log directive has syslog:server=127.0.0.1:1514 configured)
If the first log the processor encounters has the valid number of CSV-separated fields for the configured log profile, the violation is forwarded to the management plane (as per the batch settings). Any subsequent violation is also forwarded.
If the first log the processor encounters does not have the expected CSV-separated fields for the configured log profile, the violation is filtered out. Any subsequent violation is also filtered out and not forwarded by the agent.
Here is the error log that can be verified in the /var/log/nginx-agent/opentelemetry-collector-agent.log file for that error case:
Checklist
Before creating a PR, run through this checklist and mark each as complete.
- CONTRIBUTING document
- make install-tools and have attached any dependency changes to this pull request
- (README.md)