Skip to content

Conversation

@Molter73
Copy link
Collaborator

@Molter73 Molter73 commented Dec 9, 2025

Description

With this patch, we move away from using the tonic provided TLS implementation to injecting a manually built native-tls connector, then using that to create a hyper HttpsConnector and finally telling tonic to use that connector for handling the underlying HTTPs connections needed for gRPC. In case no TLS certificates are provided, plain HTTP is used.

With native-tls TLS will be handled by the OS implementation, which in linux will default to openssl. This is important for FIPS compliance, since having a FIPS compliant openssl implementation will automatically allow us to be FIPS compliant by proxy.

Checklist

  • Investigated and inspected CI test results
  • Updated documentation accordingly

Automated testing

  • Added unit tests
  • Added integration tests
  • Added regression tests

If any of these don't apply, please comment below.

Testing Performed

Run fact on a stackrox deployment and validated it connected to sensor correctly.

@neverpanic
Copy link

I would recommend using native_tls and leaving FIPS compliance to the host operating system.

@Molter73 Molter73 marked this pull request as draft December 16, 2025 08:57
@Molter73
Copy link
Collaborator Author

I would recommend using native_tls and leaving FIPS compliance to the host operating system.

I've changed the approach but haven't pushed out the commit yet, I'm getting a weird error when connecting to sensor with native_tls. I will update the PR once I get everything working correctly. Thanks for the suggestion @neverpanic!

@Molter73 Molter73 force-pushed the mauro/ROX-31430/fips-compliance branch 5 times, most recently from 9e479d3 to 9152bb4 Compare December 17, 2025 14:20
@Molter73 Molter73 changed the title ROX-31430: use FIPS mode for gRPC communication ROX-31430: delegate TLS to host implementation Dec 17, 2025
@Molter73 Molter73 force-pushed the mauro/ROX-31430/fips-compliance branch from 9152bb4 to 0d718b7 Compare December 17, 2025 15:06
@Molter73 Molter73 marked this pull request as ready for review December 17, 2025 15:47
@Molter73 Molter73 force-pushed the mauro/ROX-31430/fips-compliance branch from 0d718b7 to 9f1b0d8 Compare December 18, 2025 11:05
let certs = {
let config = self.config.borrow();
let Some(certs) = config.certs() else {
return Ok(None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part I don't follow, certs could just be None and passed further?

Copy link
Collaborator Author

@Molter73 Molter73 Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the let-else clause is making it so if config.certs() is None then we go through the else branch and return Ok(None), the rest of the method is skipped.

https://doc.rust-lang.org/rust-by-example/flow_control/let_else.html

Copy link
Contributor

@erthalion erthalion Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the whole call chain is just short cut with None.

};
certs.to_owned()
};
let (ca, cert, key) = tokio::try_join!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I see it correctly, that try_join! macro joins concurrent activity? If yes, why is it needed here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed per se, we could very easily do something like:

let ca = fs::read(certs.join("ca.pem")).await?;
let cert = fs::read(certs.join("cert.pem")).await?;
let key = fs::read(certs.join("key.pem")).await?;

Do note that doing it this way, if we were to fail reading key.pem we still need to finish reading ca.pem and cert.pem first, the try_join! approach will try all of them concurrently and exit as soon as one of the tasks fails. Since this is not in a hot path though, there's not that much to be gained, I just felt like the macro approach was easier to read and a bit neater honestly.

https://docs.rs/tokio/latest/tokio/macro.try_join.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but my question is mostly why any kind of async is even required here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way tokio works is that it uses a thread pool to run tasks concurrently, so any operation that would block a thread could affect how the tasks are run. get_tls_connector is called from an async context (the grpc::Client::run method), so it is better to use async I/O operations here for it to not interfere with the other tasks that are running concurrently.

We could make the entire get_tls_connector method sync by using std::fs::read instead of tokio::fs::read, but the one from std will cause the calling thread to block waiting for the file to be read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the question is why we weren't doing that already, the answer would be that we were doing it wrong (my bad).

Copy link
Contributor

@erthalion erthalion Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the one from std will cause the calling thread to block waiting for the file to be read.

And what's the problem with that? I guess at the bootstrap the concurrency is not yet needed, and if it would lead to a more straightforward implementation, that sounds like win for me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is we have hot-realoading implemented, so the grpc client could block a thread at any point in time, not just when it starts up. We can do either, I think the async approach is more correct while being only slightly more complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you're saying the hot-reloading implementation doesn't stop the grpc service for the time of reloading? I need to look at it again, but I was assuming when reloading the configuration, only the bpf receiving service has to be active to not loose events.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a change in configuration that requires the grpc client to restart it will attempt to read the certificates again, this is done sequentially in a single task in a loop. If reading the certificates blocks the thread the task it is running from, the grpc client will work correctly but it is potentially stealing away compute time from other tasks (like the BPF worker, the configuration reloader, etc...). This is because the threads used by all tasks are shared and there is a userspace scheduler in charge of handling what tasks get assigned to what threads in a collaborative manner, if we block a thread forcefully, we will mess with this in ways that may be detrimental to the program.

In the big picture, it is probably not that big a deal because we are only reading a few small files once in a full moon, but why risk it?

Ok(Some(connector.into()))
}

fn get_https_connector(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a very lightweight function, can't it be a part of get_tls_connector?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can probably merge get_tls_connector and get_https_connector into a get_connector method, I'll get on it.

let channel = Channel::from_shared(url)?;
let channel = match connector {
Some(connector) => channel.connect_with_connector(connector).await?,
None => channel.connect().await?,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this would be an attempt to talk over an unencrypted channel, right? If that's the case, worth adding a warning log message.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we use the unencrypted channel for our tests explicitly, I can add a warning message, but do note collector does the same thing.

With this patch, we move away from using the tonic provided TLS
implementation to injecting a manually built native-tls configuration,
then using that to create a hyper HttpsConnector and finally telling
tonic to use that connector for handling the underlying HTTPs
connections needed for gRPC. In case no TLS certificates are provided,
plain HTTP is used.
@Molter73 Molter73 force-pushed the mauro/ROX-31430/fips-compliance branch from be6abf8 to 54c6a60 Compare January 7, 2026 16:52
@Molter73 Molter73 requested a review from erthalion January 8, 2026 09:05
@Molter73 Molter73 merged commit 0079b97 into main Jan 9, 2026
23 checks passed
@Molter73 Molter73 deleted the mauro/ROX-31430/fips-compliance branch January 9, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants