-
Notifications
You must be signed in to change notification settings - Fork 1
ROX-31430: delegate TLS to host implementation #168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I would recommend using native_tls and leaving FIPS compliance to the host operating system. |
I've changed the approach but haven't pushed out the commit yet, I'm getting a weird error when connecting to sensor with native_tls. I will update the PR once I get everything working correctly. Thanks for the suggestion @neverpanic! |
9e479d3 to
9152bb4
Compare
9152bb4 to
0d718b7
Compare
0d718b7 to
9f1b0d8
Compare
| let certs = { | ||
| let config = self.config.borrow(); | ||
| let Some(certs) = config.certs() else { | ||
| return Ok(None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part I don't follow, certs could just be None and passed further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the let-else clause is making it so if config.certs() is None then we go through the else branch and return Ok(None), the rest of the method is skipped.
https://doc.rust-lang.org/rust-by-example/flow_control/let_else.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the whole call chain is just short cut with None.
| }; | ||
| certs.to_owned() | ||
| }; | ||
| let (ca, cert, key) = tokio::try_join!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I see it correctly, that try_join! macro joins concurrent activity? If yes, why is it needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not needed per se, we could very easily do something like:
let ca = fs::read(certs.join("ca.pem")).await?;
let cert = fs::read(certs.join("cert.pem")).await?;
let key = fs::read(certs.join("key.pem")).await?;Do note that doing it this way, if we were to fail reading key.pem we still need to finish reading ca.pem and cert.pem first, the try_join! approach will try all of them concurrently and exit as soon as one of the tasks fails. Since this is not in a hot path though, there's not that much to be gained, I just felt like the macro approach was easier to read and a bit neater honestly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but my question is mostly why any kind of async is even required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way tokio works is that it uses a thread pool to run tasks concurrently, so any operation that would block a thread could affect how the tasks are run. get_tls_connector is called from an async context (the grpc::Client::run method), so it is better to use async I/O operations here for it to not interfere with the other tasks that are running concurrently.
We could make the entire get_tls_connector method sync by using std::fs::read instead of tokio::fs::read, but the one from std will cause the calling thread to block waiting for the file to be read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the question is why we weren't doing that already, the answer would be that we were doing it wrong (my bad).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the one from std will cause the calling thread to block waiting for the file to be read.
And what's the problem with that? I guess at the bootstrap the concurrency is not yet needed, and if it would lead to a more straightforward implementation, that sounds like win for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we have hot-realoading implemented, so the grpc client could block a thread at any point in time, not just when it starts up. We can do either, I think the async approach is more correct while being only slightly more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you're saying the hot-reloading implementation doesn't stop the grpc service for the time of reloading? I need to look at it again, but I was assuming when reloading the configuration, only the bpf receiving service has to be active to not loose events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a change in configuration that requires the grpc client to restart it will attempt to read the certificates again, this is done sequentially in a single task in a loop. If reading the certificates blocks the thread the task it is running from, the grpc client will work correctly but it is potentially stealing away compute time from other tasks (like the BPF worker, the configuration reloader, etc...). This is because the threads used by all tasks are shared and there is a userspace scheduler in charge of handling what tasks get assigned to what threads in a collaborative manner, if we block a thread forcefully, we will mess with this in ways that may be detrimental to the program.
In the big picture, it is probably not that big a deal because we are only reading a few small files once in a full moon, but why risk it?
fact/src/output/grpc.rs
Outdated
| Ok(Some(connector.into())) | ||
| } | ||
|
|
||
| fn get_https_connector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a very lightweight function, can't it be a part of get_tls_connector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can probably merge get_tls_connector and get_https_connector into a get_connector method, I'll get on it.
fact/src/output/grpc.rs
Outdated
| let channel = Channel::from_shared(url)?; | ||
| let channel = match connector { | ||
| Some(connector) => channel.connect_with_connector(connector).await?, | ||
| None => channel.connect().await?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this would be an attempt to talk over an unencrypted channel, right? If that's the case, worth adding a warning log message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we use the unencrypted channel for our tests explicitly, I can add a warning message, but do note collector does the same thing.
With this patch, we move away from using the tonic provided TLS implementation to injecting a manually built native-tls configuration, then using that to create a hyper HttpsConnector and finally telling tonic to use that connector for handling the underlying HTTPs connections needed for gRPC. In case no TLS certificates are provided, plain HTTP is used.
be6abf8 to
54c6a60
Compare
Description
With this patch, we move away from using the tonic provided TLS implementation to injecting a manually built native-tls connector, then using that to create a hyper HttpsConnector and finally telling tonic to use that connector for handling the underlying HTTPs connections needed for gRPC. In case no TLS certificates are provided, plain HTTP is used.
With native-tls TLS will be handled by the OS implementation, which in linux will default to openssl. This is important for FIPS compliance, since having a FIPS compliant openssl implementation will automatically allow us to be FIPS compliant by proxy.
Checklist
Automated testing
If any of these don't apply, please comment below.
Testing Performed
Run fact on a stackrox deployment and validated it connected to sensor correctly.