-
Notifications
You must be signed in to change notification settings - Fork 171
Add embedding engine #3280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: jerm/2026-01-13-optimizer-in-vmcp
Are you sure you want to change the base?
Add embedding engine #3280
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformationAlternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## jerm/2026-01-13-optimizer-in-vmcp #3280 +/- ##
=====================================================================
- Coverage 64.82% 61.91% -2.92%
=====================================================================
Files 375 401 +26
Lines 36626 39935 +3309
=====================================================================
+ Hits 23744 24725 +981
- Misses 11011 13250 +2239
- Partials 1871 1960 +89 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
111676e to
3ba5cc0
Compare
3ba5cc0 to
bca112f
Compare
therealnb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but mostly looks good.
bca112f to
e290799
Compare
e290799 to
2165ce8
Compare
jerm-dro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's on the right track, but I think we want an e2e test demonstrating:
- defining embedding servers with the CRD results in the creation of a deployment, service, etc
- we can call that service and get the expected results
- when we delete the CRD, the other objects are also deleted.
Also, please modify the pull request so it's against main. With an e2e test, I'd prefer to merge this right into main. It'll be one fully functional and validated change, so no need to wait on merging into main.
| if controllerutil.ContainsFinalizer(embedding, embeddingFinalizerName) { | ||
| return ctrl.Result{}, false, nil | ||
| } | ||
|
|
||
| controllerutil.AddFinalizer(embedding, embeddingFinalizerName) | ||
| err := r.Update(ctx, embedding) | ||
| if err != nil { | ||
| return ctrl.Result{}, true, err | ||
| } | ||
| return ctrl.Result{}, false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this only returns true if there's also an err and ctrl.Result is always ctrl.Result{}. I'd recommend removing the two and updating the caller:
err := r.ensureFinalizer(...)
if err != nil {
return ctrl.Result{}
}
You can crib off existing tests here: https://github.com/stacklok/toolhive/tree/main/test/e2e/thv-operator/virtualmcp Putting these changes in the |
c88d4e5 to
4dc64c0
Compare
|
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review. |
Large PR justification has been provided. Thank you!
2377edf to
9f438b8
Compare
8982c8d to
a465f3d
Compare
a465f3d to
47f3623
Compare
* Add header forward middleware for remote MCP servers Implement middleware that injects configured headers into requests before they are forwarded to remote MCP servers. This enables operators to configure headers server-side, removing the burden from clients. This commit adds the core middleware; subsequent commits add RunConfig types, CLI flags, runner wiring, and CRD support. Key design decisions: - RestrictedHeaders blocklist prevents misconfiguration of hop-by-hop, request smuggling, and identity spoofing headers - Authorization is allowed with a warning (valid for static tokens) - Header names are pre-canonicalized at creation time - Supports both factory pattern (thv run) and direct creation (thv proxy) - Header values are never logged, only names at DEBUG level Related: #3316 * Add Forwarded and Http2-Settings to restricted headers Add two headers to the RestrictedHeaders blocklist per review feedback: - Forwarded (RFC 7239): The standardized equivalent of X-Forwarded-* headers, which are already blocked. Omitting it left an identity spoofing gap. - Http2-Settings (RFC 7540 Section 3.2.1): A hop-by-hop header used in HTTP/1.1 to HTTP/2 upgrades. Forwarding it can cause protocol confusion and request smuggling. It is the companion to the already blocked Upgrade header.
Also add scenario for stop/restart/status endpoints for workloads.
#3425) authserver DCR hardening: Add grant_types and response_types allowlist validation Reject unsupported grant types and response types in DCR validation to prevent clients from requesting capabilities the server cannot fulfill (e.g. client_credentials, implicit token).
* refactor rbac Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> * Refactor RBAC management to eliminate code duplication Introduce a common EnsureRBACResources method in pkg/kubernetes/rbac that consolidates the pattern of creating ServiceAccount, Role, and RoleBinding resources. This eliminates ~170 lines of duplicated code across controllers. Co-authored-by: Yolanda Robla <yolanda@stacklok.com> * add license headers * increase test coverage * changes from review * fix ci * changes from review * bump chart version --------- Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Co-authored-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com> Co-authored-by: taskbot <taskbot@users.noreply.github.com>
* Add token endpoint handler Implement POST /oauth/token handler that exchanges authorization codes for access tokens using fosite's access request/response flow. The handler validates the incoming token request, retrieves the stored authorization session, generates JWT access tokens, and supports RFC 8707 resource parameter for audience-restricted tokens targeting specific MCP servers. This handler completes the OAuth 2.0 authorization code flow started by the authorize and callback handlers. When a client presents an authorization code, fosite retrieves the session that was stored during the callback phase - this session contains the user's subject, the upstream token session ID (tsid), and the client ID binding. The token endpoint uses these stored claims to generate the access token, maintaining the link between issued tokens and upstream IDP tokens for later token injection by the proxy middleware. The test infrastructure is extended to properly track authorization code and PKCE sessions across the full authorize→callback→token flow. * Add RFC 8707 audience validation for token endpoint Implement proper validation of the resource parameter in the token endpoint per RFC 8707. Previously, any client-provided resource was blindly granted as the token audience, which was a security risk. Changes: - Add ErrInvalidTarget error for RFC 8707 invalid_target responses - Add ValidateAudienceURI to validate URI format (absolute, no fragment, http/https only) - Add ValidateAudienceAllowed to check resources against an allowlist - Add AllowedAudiences config field to AuthorizationServerParams and AuthorizationServerConfig - Update TokenHandler to validate before granting audience - Secure default: empty AllowedAudiences rejects all resource requests Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
c2f99b1 to
ff03438
Compare
Add MCPEmbedding CRD for embedding model deployment in operator
Introduces a new MCPEmbedding custom resource to deploy HuggingFace
embedding models as MCP servers in Kubernetes. This enables semantic
search and similarity features for MCP tools and resources.
Key additions:
The embedding server uses huggingface-embedding-inference with persistent
volume caching for downloaded models and integrates with MCPGroups for
organizing backend workloads.
Large PR Justification
Multiple related changes that would break if separated