
docs: Add MLflow integration guide #2344

Merged: sfc-gh-jreini merged 2 commits into truera:main from debu-sinha:docs/mlflow-integration on Feb 2, 2026.


@debu-sinha commented Feb 1, 2026 (Contributor)

Summary

Adds documentation for using TruLens feedback functions as MLflow GenAI scorers.

Resolves #2343

Changes

  • New folder: docs/component_guides/integrations/ - For third-party integrations
  • New file: docs/component_guides/integrations/index.md - Integrations index
  • New file: docs/component_guides/integrations/mlflow.md - MLflow integration guide

Documentation includes:

  • Installation instructions (MLflow >= 3.10.0)
  • Available scorers table:
    • RAG scorers: Groundedness, ContextRelevance, AnswerRelevance, Coherence
    • Agent trace scorers: LogicalConsistency, ExecutionEfficiency, PlanAdherence, PlanQuality, ToolSelection, ToolCalling
  • Usage examples (direct calls and batch evaluation with mlflow.genai.evaluate)
  • Model configuration for multiple providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI)
  • Threshold configuration
  • MLflow tracing integration
  • Best practices and troubleshooting
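The guide covers threshold configuration. Here is a minimal, library-agnostic sketch of the idea in Python. It assumes feedback scores are normalized to [0, 1], that higher is better, and that a score equal to the threshold passes. `ThresholdedResult`, `apply_threshold`, and the 0.5 default are hypothetical illustrations, not part of the MLflow or TruLens API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdedResult:
    """Hypothetical container: a raw feedback score plus its pass/fail verdict."""

    name: str
    score: float
    threshold: float
    passed: bool


def apply_threshold(name: str, score: float, threshold: float = 0.5) -> ThresholdedResult:
    """Convert a normalized feedback score into a pass/fail verdict.

    Assumptions (not taken from the MLflow or TruLens docs): scores lie in
    [0, 1], higher is better, and a score equal to the threshold passes.
    The 0.5 default is illustrative only.
    """
    for label, value in (("score", score), ("threshold", threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name}: {label} must be in [0, 1], got {value}")
    return ThresholdedResult(name, score, threshold, score >= threshold)


if __name__ == "__main__":
    # Made-up scores for illustration; a real run would get these from the scorers.
    raw_scores = {"Groundedness": 0.9, "ContextRelevance": 0.4, "AnswerRelevance": 0.75}
    for scorer_name, value in raw_scores.items():
        result = apply_threshold(scorer_name, value, threshold=0.7)
        print(f"{result.name}: {result.score:.2f} -> {'pass' if result.passed else 'fail'}")
```

Keeping the raw score alongside the verdict lets you tighten or loosen the threshold later without re-running the (LLM-backed) scorers.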

Context

The TruLens integration was merged into MLflow in mlflow/mlflow PR #19492 and ships in MLflow 3.10.0. This documentation helps TruLens users discover and use the integration.

Navigation

This creates a new "Integrations" section under Component Guides. The navigation may need to be updated in the site config if not auto-discovered.


Important

Adds documentation for integrating TruLens feedback functions with MLflow as GenAI scorers, including installation, usage, and troubleshooting.

  • Documentation:
    • Adds mlflow.md, the MLflow integration guide, in docs/component_guides/integrations/.
    • Includes installation instructions for MLflow >= 3.10.0 and TruLens.
    • Details available scorers: RAG and Agent Trace scorers.
    • Provides usage examples for direct calls and batch evaluation.
    • Covers model configuration for OpenAI, Anthropic, Azure, Bedrock, and Vertex AI.
    • Describes threshold configuration and dynamic scorer creation.
    • Explains MLflow tracing integration and viewing results.
    • Lists best practices and troubleshooting tips.
  • Structure:
    • Creates docs/component_guides/integrations/ for third-party integrations.
    • Adds index.md to list available integrations.

This description was created by Ellipsis for 47dc3ea.

Add documentation for using TruLens feedback functions as MLflow GenAI scorers.

Includes:
- Installation instructions
- Available scorers (RAG and Agent trace)
- Usage examples (direct calls and batch evaluation)
- Model configuration for multiple providers
- Threshold configuration
- MLflow tracing integration
- Best practices and troubleshooting

Resolves truera#2343

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Feb 1, 2026

dosubot (bot) commented Feb 1, 2026:

Related Documentation

No published documentation to review for changes on this repository.

dosubot (bot) added the documentation label (Improvements or additions to documentation) on Feb 1, 2026
@debu-sinha commented (Contributor, Author):

Note: This is a docs-only PR adding MLflow integration documentation. The sf-e2e check failure appears to be an internal Snowflake infrastructure test unrelated to documentation changes.

@@ -0,0 +1,238 @@
# MLflow Integration

Review comment (Contributor):

Instead of creating a new /integrations/ folder, place this in docs/component_guides/evaluations.

@sfc-gh-jreini left a comment (Contributor)

Thanks again for contributing! A few small things to address, then I can approve.

Install MLflow with TruLens support:

```bash
pip install 'mlflow>=3.10.0' trulens
```

Review comment (Contributor):

Will also need trulens-providers-litellm, I believe.

@@ -0,0 +1,238 @@
# MLflow Integration

TruLens feedback functions are available as first-class scorers in MLflow's GenAI evaluation framework starting with MLflow 3.10.0. This integration was contributed by [Debu Sinha](https://github.com/debu-sinha) in [MLflow PR #19492](https://github.com/mlflow/mlflow/pull/19492).

Review comment (Contributor):

Mentioning the PR here seems nonstandard. Can we call this out in the contribution guide instead, noting that integrating TruLens into other libraries is a new category of contribution?

| Scorer | Description |
|---|---|
| `Groundedness` | Evaluates whether the response is grounded in the provided context |
| `ContextRelevance` | Evaluates whether the retrieved context is relevant to the query |
| `AnswerRelevance` | Evaluates whether the response is relevant to the input query |
| `Coherence` | Evaluates the coherence and logical flow of the response |

Review comment (Contributor):

Coherence should be in a separate category, since it is not limited to RAG. You could call it an Output Scorer.


## Dynamic Scorer Creation

Use `get_scorer` to create scorers dynamically:

Review comment (Contributor):

Add a sentence or two on why you would want to create the scorers dynamically
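One plausible reason, sketched here in Python under stated assumptions: creating scorers by name lets an evaluation pipeline choose which scorers to run from configuration (for example, RAG scorers for a retrieval app and agent trace scorers for an agent) instead of hard-coding imports. The scorer names and the RAG / Output / Agent trace grouping come from this PR; `SCORER_CATEGORIES`, `make_placeholder_scorer`, `build_scorers`, the config keys, and the model string are hypothetical and do not reflect the actual `get_scorer` signature.

```python
from typing import Callable, Dict, List

# Scorer names as listed in this PR, grouped per the review feedback
# (RAG / Output / Agent trace). The grouping keys are hypothetical.
SCORER_CATEGORIES: Dict[str, List[str]] = {
    "rag": ["Groundedness", "ContextRelevance", "AnswerRelevance"],
    "output": ["Coherence"],
    "agent_trace": [
        "LogicalConsistency", "ExecutionEfficiency", "PlanAdherence",
        "PlanQuality", "ToolSelection", "ToolCalling",
    ],
}

Scorer = Callable[..., dict]


def make_placeholder_scorer(name: str, model: str) -> Scorer:
    """Hypothetical stand-in for a real scorer factory; not the MLflow API."""

    def scorer(**fields: object) -> dict:
        # A real scorer would call an LLM judge; this only echoes its configuration.
        return {"scorer": name, "model": model, "fields": sorted(fields)}

    return scorer


def build_scorers(categories: List[str], model: str) -> Dict[str, Scorer]:
    """Create one scorer per name in the requested categories, in config order."""
    unknown = [c for c in categories if c not in SCORER_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown scorer categories: {unknown}")
    return {
        name: make_placeholder_scorer(name, model)
        for category in categories
        for name in SCORER_CATEGORIES[category]
    }


if __name__ == "__main__":
    # Hypothetical config, e.g. loaded from YAML; the model string is a placeholder.
    config = {"categories": ["rag", "output"], "model": "provider:/model-name"}
    scorers = build_scorers(config["categories"], config["model"])
    print(list(scorers))
    print(scorers["Groundedness"](inputs="q", outputs="a"))
```

Because selection happens at runtime, switching an evaluation from RAG to agent traces becomes a config change rather than a code change.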

Changes:
- Move docs from integrations/ to evaluation/ folder per reviewer request
- Add trulens-providers-litellm to installation instructions
- Remove PR reference from intro (nonstandard)
- Recategorize Coherence as "Output Scorer" (not RAG-specific)
- Add explanation for dynamic scorer creation use case
- Update related resources link

Signed-off-by: debu-sinha <debusinha2009@gmail.com>
@debu-sinha commented Feb 2, 2026 (Contributor, Author)

@sfc-gh-jreini All review feedback addressed:

  1. Moved file from integrations/ to evaluation/ folder
  2. Added trulens-providers-litellm to installation instructions
  3. Removed PR reference from intro
  4. Recategorized Coherence as "Output Scorer" (separate from RAG scorers)
  5. Added explanation for why you'd use dynamic scorer creation (get_scorer)

Ready for re-review!

@sfc-gh-jreini left a comment (Contributor)

LGTM!

Would love to share broadly about this integration. @debu-sinha Interested in co-authoring a blog about this?

@sfc-gh-jreini sfc-gh-jreini enabled auto-merge (squash) February 2, 2026 16:19
@debu-sinha commented (Contributor, Author):

Thanks for the review and glad the docs look good!

Absolutely interested in co-authoring a blog. The TruLens + MLflow integration opens up some interesting possibilities - especially the agent trace scorers covering the TRAIL evaluation framework.

Happy to contribute wherever helpful. Do you have a preferred format or platform in mind? I can draft an outline covering the key use cases (RAG evaluation, agent traces, etc.) if that would be a good starting point.

Let me know how you'd like to proceed.

@sfc-gh-jreini commented Feb 2, 2026 (Contributor)


I've got a draft started. Can I add the Gmail address listed on your GitHub, or would you prefer a different email?

@sfc-gh-jreini sfc-gh-jreini merged commit e2c5d8c into truera:main Feb 2, 2026
2 of 3 checks passed
@debu-sinha commented (Contributor, Author):

The Gmail address on my GitHub works. Looking forward to seeing the draft!

@sfc-gh-jreini sfc-gh-jreini mentioned this pull request Feb 2, 2026
@debu-sinha commented Feb 10, 2026 (Contributor, Author)

Thanks for the blog draft and review. I finished my edits on the blog last week -- let me know if everything looks good or if anything needs adjusting.

@sfc-gh-jreini commented (Contributor):

Thanks, Debu, I appreciate your contribution to the blog. I will reach out if anything is needed; otherwise, I expect to publish it alongside the MLflow release.


Labels: documentation (Improvements or additions to documentation), size:L (This PR changes 100-499 lines, ignoring generated files.)

Linked issue: Documentation: Add MLflow Integration Guide

2 participants