AIP-99: Add LLMFileAnalysisOperator and @task.llm_file_analysis #64077
Merged
gopidesupavan merged 7 commits into apache:main on Mar 26, 2026
@codex review
180b698 to ddde209
74a77e9 to fce35be
Pull request overview
Adds a new “read-only file analysis” capability to the common-ai provider by introducing LLMFileAnalysisOperator and the @task.llm_file_analysis TaskFlow decorator. The feature resolves local/object-storage paths via ObjectStoragePath, normalizes supported text formats into prompt context, and supports multimodal attachments (PNG/JPG/PDF) plus optional Avro/Parquet readers.
Changes:
- Introduces LLMFileAnalysisOperator and @task.llm_file_analysis, including HITL approval support and structured output (output_type).
- Adds file discovery and format rendering utilities with limits (max files/bytes/text) and optional Avro/Parquet support.
- Adds docs, example DAGs, dependency extras, and unit tests covering core behaviors and dependency-gated paths.
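
The limits mentioned above (max files, max bytes) can be illustrated with a minimal sketch. This is not the provider's code: `DiscoveryLimits`, `discover_files`, `FileLimitExceededError`, and the default values are hypothetical names chosen for illustration, and plain `pathlib.Path` stands in for ObjectStoragePath.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


class FileLimitExceededError(ValueError):
    """Hypothetical error raised when a discovery limit is exceeded."""


@dataclass(frozen=True)
class DiscoveryLimits:
    # Hypothetical defaults, chosen only for this illustration.
    max_files: int = 20
    max_total_bytes: int = 5 * 1024 * 1024


def discover_files(root: Path, limits: DiscoveryLimits) -> list[Path]:
    """Resolve a single file or a directory into a sorted file list, enforcing limits."""
    if root.is_file():
        candidates = [root]
    else:
        # Sort so that the result is deterministic across runs.
        candidates = sorted(p for p in root.rglob("*") if p.is_file())

    if len(candidates) > limits.max_files:
        raise FileLimitExceededError(
            f"Found {len(candidates)} files, limit is {limits.max_files}"
        )

    total_bytes = sum(p.stat().st_size for p in candidates)
    if total_bytes > limits.max_total_bytes:
        raise FileLimitExceededError(
            f"Files total {total_bytes} bytes, limit is {limits.max_total_bytes}"
        )
    return candidates
```

Checking both limits before any file is read keeps an oversized prefix from reaching the model call at all, which matches the "enforce limits first" behavior the overview describes.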
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| uv.lock | Adds avro/parquet extras and locked deps for fastavro/pyarrow. |
| providers/common/ai/pyproject.toml | Declares avro and parquet optional extras with version markers. |
| providers/common/ai/provider.yaml | Registers new operator docs/module and the new task decorator. |
| providers/common/ai/src/airflow/providers/common/ai/get_provider_info.py | Exposes the new operator and decorator via provider metadata. |
| providers/common/ai/src/airflow/providers/common/ai/exceptions.py | Adds file-analysis-specific exception types. |
| providers/common/ai/src/airflow/providers/common/ai/utils/file_analysis.py | Implements file resolution, format detection, sampling/truncation, and prompt construction (incl. multimodal attachments). |
| providers/common/ai/src/airflow/providers/common/ai/operators/llm_file_analysis.py | New operator that builds file-analysis request content and performs a single LLM call (with optional approval + structured output). |
| providers/common/ai/src/airflow/providers/common/ai/decorators/llm_file_analysis.py | Adds TaskFlow decorator wrapper around the new operator. |
| providers/common/ai/src/airflow/providers/common/ai/example_dags/example_llm_file_analysis.py | Example DAGs for basic, prefix, multimodal, structured, and decorator usage. |
| providers/common/ai/docs/operators/llm_file_analysis.rst | New user-facing docs page for the operator and decorator (usage + parameters + formats). |
| providers/common/ai/docs/operators/index.rst | Updates operator overview table and adds a short description of the new operator. |
| docs/spelling_wordlist.txt | Adds terms used in the new docs (“codec”, “PDFs”). |
| providers/common/ai/tests/unit/common/ai/utils/test_file_analysis.py | Unit tests for file discovery, limits, gzip handling, multimodal behavior, and Avro/Parquet gating. |
| providers/common/ai/tests/unit/common/ai/operators/test_llm_file_analysis.py | Unit tests for operator execution, structured output serialization, and approval deferral/complete flows. |
| providers/common/ai/tests/unit/common/ai/decorators/test_llm_file_analysis.py | Unit tests for decorator execution and prompt validation/templating behavior. |
| providers/common/ai/tests/unit/common/ai/assets/__init__.py | Marks test assets directory as a package. |
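
The format detection, gzip handling, and truncation that the summary attributes to utils/file_analysis.py could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: `detect_format`, `render_text`, the `TEXT_FORMATS` set, and the `[truncated]` marker are all hypothetical.

```python
from __future__ import annotations

import gzip
from pathlib import PurePosixPath

# Hypothetical set of text formats; the real provider's list may differ.
TEXT_FORMATS = {".csv", ".json", ".log", ".txt"}


def detect_format(path: str) -> tuple[str, bool]:
    """Return (format suffix, is_gzipped), looking through a trailing ``.gz``."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]
    if not suffixes:
        raise ValueError(f"Cannot determine the format of {path!r}")
    return suffixes[-1], gzipped


def render_text(path: str, raw: bytes, max_chars: int = 2000) -> str:
    """Decode a supported text file into prompt context, truncating long content."""
    fmt, gzipped = detect_format(path)
    if fmt not in TEXT_FORMATS:
        raise ValueError(f"Unsupported text format {fmt!r} for {path!r}")
    data = gzip.decompress(raw) if gzipped else raw
    text = data.decode("utf-8", errors="replace")
    if len(text) > max_chars:
        # Mark the cut so the model knows the content is incomplete.
        text = text[:max_chars] + "\n[truncated]"
    return text
```

For example, `detect_format("logs/app.log.gz")` yields `(".log", True)`, so a compressed log is decompressed and then treated like any other `.log` file.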
7446f13 to 06b92ea
…-ai provider # Conflicts: # uv.lock
06b92ea to 3f642ef
kaxil approved these changes on Mar 25, 2026
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request on Mar 30, 2026
…he#64077) * Add LLMFileAnalysisOperator and @task.llm_file_analysis to the common-ai provider # Conflicts: # uv.lock * Fix mypy issues * Update utils * Update return model * Fix spells * fix up read * document prefix lookup operation
Suraj-kumar00 pushed a commit to Suraj-kumar00/airflow that referenced this pull request on Apr 7, 2026
…he#64077) * Add LLMFileAnalysisOperator and @task.llm_file_analysis to the common-ai provider # Conflicts: # uv.lock * Fix mypy issues * Update utils * Update return model * Fix spells * fix up read * document prefix lookup operation


Add LLMFileAnalysisOperator and @task.llm_file_analysis to the common-ai provider.

This adds a read-only file analysis operator that resolves files through ObjectStoragePath, normalizes supported text formats into prompt context, and optionally attaches PNG/JPG/PDF inputs for multimodal models. It supports single files and directory/prefix inputs, enforces file and prompt limits before calling the model, and includes structured-output support through output_type.

The change also adds helper utilities, docs, example DAGs, optional Avro/Parquet extras, and unit tests for text, structured, multimodal, limit, and dependency-gated paths.
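
To make the split between prompt context and multimodal attachments concrete, here is a minimal sketch of how request content might be assembled. It assumes nothing about the operator's real signature: `Attachment`, `build_request_content`, the `multi_modal` flag, and the media-type mapping are hypothetical names used only for illustration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

# Hypothetical mapping from file suffix to media type for attachable inputs.
ATTACHMENT_MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".pdf": "application/pdf",
}


@dataclass(frozen=True)
class Attachment:
    """Hypothetical container for a binary input passed alongside the prompt."""

    name: str
    media_type: str
    data: bytes


def build_request_content(
    prompt: str, files: dict[str, bytes], *, multi_modal: bool
) -> list[str | Attachment]:
    """Fold text files into the prompt and keep PNG/JPG/PDF inputs as attachments."""
    text_sections: list[str] = []
    attachments: list[Attachment] = []
    for name in sorted(files):
        suffix = os.path.splitext(name)[1].lower()
        media_type = ATTACHMENT_MEDIA_TYPES.get(suffix)
        if media_type is None:
            # Treated as text: rendered inline under a per-file header.
            body = files[name].decode("utf-8", errors="replace")
            text_sections.append(f"--- {name} ---\n{body}")
        elif multi_modal:
            attachments.append(Attachment(name, media_type, files[name]))
        else:
            raise ValueError(f"{name!r} requires a multimodal model")
    return ["\n\n".join([prompt, *text_sections]), *attachments]
```

In this sketch a binary input fails fast when multimodal support is off, so a text-only model never receives raw image or PDF bytes as if they were text.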
closes: #ISSUE
Was generative AI tooling used to co-author this PR?
Generated-by: Codex (GPT-5) following the guidelines