Merged
2 changes: 1 addition & 1 deletion .github/instructions/code-review.instructions.md
@@ -11,7 +11,7 @@ Use these rules when reviewing pull requests to the Apache Airflow repository.

- **Scheduler must never run user code.** It only processes serialized Dags. Flag any scheduler-path code that deserializes or executes Dag/task code.
- **Flag any task execution code that accesses the metadata DB directly** instead of through the Execution API (`/execution` endpoints).
- - **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in isolated processes.
+ - **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in separate processes from the Scheduler and API Server, but note that they may have direct metadata database access and may bypass JWT authentication via in-process Execution API transport. This is an intentional design choice documented in the security model, not a security vulnerability.
- **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.
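
The "direct metadata DB access" rule in the list above can be approximated mechanically. A minimal sketch of such a check (a hypothetical helper, not part of the Airflow codebase; the flagged module list is illustrative) using Python's `ast` module:

```python
import ast

# Illustrative set of imports that suggest direct metadata-DB access from
# task code; a real review rule would maintain its own list.
FLAGGED_MODULES = {"airflow.settings", "airflow.models", "sqlalchemy.orm"}

def find_db_imports(source: str) -> list[str]:
    """Return the flagged modules imported by the given task-code source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits += [a.name for a in node.names if a.name in FLAGGED_MODULES]
        elif isinstance(node, ast.ImportFrom) and node.module in FLAGGED_MODULES:
            hits.append(node.module)
    return hits

# A task that opens a DB session directly -- exactly what reviewers should flag.
print(find_db_imports("from airflow.settings import Session"))  # → ['airflow.settings']
```

An AST walk like this catches both `import x.y` and `from x.y import z` forms, which a plain text grep can miss.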

## Database and Query Correctness
29 changes: 26 additions & 3 deletions AGENTS.md
@@ -67,15 +67,38 @@ UV workspace monorepo. Key paths:
## Architecture Boundaries

1. Users author Dags with the Task SDK (`airflow.sdk`).
- 2. Dag Processor parses Dag files in isolated processes and stores serialized Dags in the metadata DB.
+ 2. Dag File Processor parses Dag files in separate processes and stores serialized Dags in the metadata DB. Software guards prevent individual parsing processes from accessing the database directly and enforce use of the Execution API, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
3. Scheduler reads serialized Dags — **never runs user code** — and creates Dag runs / task instances.
- 4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**.
+ 4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**. Each task receives a short-lived JWT token scoped to its task instance ID.
5. API Server serves the React UI and handles all client-database interactions.
- 6. Triggerer evaluates deferred tasks/sensors in isolated processes.
+ 6. Triggerer evaluates deferred tasks/sensors in separate processes. Like the Dag File Processor, software guards steer it through the Execution API rather than direct database access, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
7. Shared libraries that are symbolically linked into different Python distributions live in the `shared` folder.
8. Airflow uses the `uv workspace` feature so that all distributions share dependencies and a single venv.
9. Each distribution should declare the other distributions it needs: the `uv --project <FOLDER> sync` command acts on the selected project in the monorepo with only the dependencies it declares.
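
To make point 4 concrete, here is a self-contained sketch of minting a short-lived JWT scoped to a single task instance. It is stdlib-only and illustrative (Airflow's real implementation, claim names, and TTL are defined in its JWT authentication docs), but it shows the shape of the mechanism:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_task_token(secret: bytes, task_instance_id: str, ttl: int = 600) -> str:
    """Mint a short-lived HS256 JWT scoped to a single task instance (sketch)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({
        "sub": task_instance_id,        # token only authorizes this task instance
        "exp": int(time.time()) + ttl,  # short expiry bounds the replay window
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    signature = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

token = make_task_token(b"shared-signing-secret", "ti-1234")
print(token.count("."))  # → 2
```

Because the `sub` claim names one task instance and `exp` is minutes away, a leaked token is of limited use, which is why the API server can safely accept it from workers.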

+ ## Security Model
+
+ When reviewing code, writing security documentation, or performing security research, keep in
+ mind the following aspects of Airflow's security model. The authoritative reference is
+ [`airflow-core/docs/security/security_model.rst`](airflow-core/docs/security/security_model.rst)
+ and [`airflow-core/docs/security/jwt_token_authentication.rst`](airflow-core/docs/security/jwt_token_authentication.rst).
+
+ In particular, the intentional design choices that are not security vulnerabilities, and should not
+ be reported as such, are described in the "What is NOT considered a security vulnerability" chapter of the security model.
+
+ **When flagging security concerns, distinguish between:**
+
+ 1. **Actual vulnerabilities** — code that violates the documented security model (e.g., a worker
+ gaining database access it shouldn't have, a Scheduler executing user code, an unauthenticated
+ user accessing protected endpoints).
+ 2. **Known limitations** — documented gaps where the current implementation doesn't provide full
+ isolation (e.g., DFP/Triggerer database access, shared Execution API resources, multi-team
+ not enforcing task-level isolation). These are tracked for improvement in future versions and
+ should not be reported as new findings.
+ 3. **Deployment hardening opportunities** — measures a Deployment Manager can take to improve
+ isolation beyond what Airflow enforces natively (e.g., per-component configuration, asymmetric
+ JWT keys, network policies). These belong in deployment guidance, not as code-level issues.
+
# Shared libraries

- shared libraries provide implementation of some common utilities like logging, configuration where the code should be reused in different distributions (potentially in different versions)
10 changes: 10 additions & 0 deletions airflow-core/.pre-commit-config.yaml
@@ -263,6 +263,16 @@ repos:
require_serial: true
pass_filenames: false
files: ^src/airflow/config_templates/config\.yml$
+ - id: check-security-doc-constants
+ name: Check security docs match config.yml constants
+ entry: ../scripts/ci/prek/check_security_doc_constants.py
+ language: python
+ pass_filenames: false
+ files: >
+ (?x)
+ ^src/airflow/config_templates/config\.yml$|
+ ^docs/security/jwt_token_authentication\.rst$|
+ ^docs/security/security_model\.rst$
- id: check-airflow-version-checks-in-core
language: pygrep
name: No AIRFLOW_V_* imports in airflow-core
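
The `check-security-doc-constants` hook added above pairs a config file with the docs that quote its defaults. Its script is not shown in this diff; the following is a hypothetical sketch of what such a consistency check might look like (the option name, file contents, and matching strategy are invented for illustration):

```python
import re

def extract_constant(config_yml: str, option: str) -> str:
    """Pull the default value of `option` from a config.yml snippet (simplified)."""
    pattern = rf'{option}:\n(?:\s+\S.*\n)*?\s+default:\s*"?([^"\n]+)"?'
    match = re.search(pattern, config_yml)
    if not match:
        raise ValueError(f"{option} not found in config.yml")
    return match.group(1)

def doc_mentions(doc_rst: str, value: str) -> bool:
    """True if the docs quote the same constant that config.yml defines."""
    return value in doc_rst

# Hypothetical option and files, for illustration only.
config = 'jwt_expiration_time:\n  description: Token TTL in seconds.\n  default: "600"\n'
doc = "Tokens expire after 600 seconds by default."
value = extract_constant(config, "jwt_expiration_time")
print(value, doc_mentions(doc, value))  # → 600 True
```

A check of this shape fails the commit when a default changes in `config.yml` but the prose in the security docs keeps quoting the stale number.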
@@ -62,9 +62,12 @@ the :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.


Once you have configured the executor, it is necessary to make sure that every node in the cluster contains
- the same configuration and Dags. Airflow sends simple instructions such as "execute task X of Dag Y", but
- does not send any Dag files or configuration. You can use a simple cronjob or any other mechanism to sync
- Dags and configs across your nodes, e.g., checkout Dags from git repo every 5 minutes on all nodes.
+ the Dags and configuration appropriate for its role. Airflow sends simple instructions such as
+ "execute task X of Dag Y", but does not send any Dag files or configuration. For synchronization of Dags
+ we recommend the Dag Bundle mechanism (including ``GitDagBundle``), which allows you to make use of
+ DAG versioning. For security-sensitive deployments, restrict sensitive configuration (JWT signing keys,
+ database credentials, Fernet keys) to only the components that need them rather than sharing all
+ configuration across all nodes — see :doc:`/security/security_model` for guidance.


Logging
6 changes: 4 additions & 2 deletions airflow-core/docs/best-practices.rst
@@ -1098,8 +1098,10 @@ The benefits of using those operators are:
environment is optimized for the case where you have multiple similar, but different environments.
* The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
be added dynamically. This is good for both, security and stability.
- * Complete isolation between tasks. They cannot influence one another in other ways than using standard
- Airflow XCom mechanisms.
+ * Strong process-level isolation between tasks. Tasks run in separate containers/pods and cannot
+ influence one another at the process or filesystem level. They can still interact through standard
+ Airflow mechanisms (XComs, connections, variables) via the Execution API. See
+ :doc:`/security/security_model` for the full isolation model.

The drawbacks:

25 changes: 16 additions & 9 deletions airflow-core/docs/configurations-ref.rst
@@ -22,15 +22,22 @@ Configuration Reference
This page contains the list of all the available Airflow configurations that you
can set in ``airflow.cfg`` file or using environment variables.

- Use the same configuration across all the Airflow components. While each component
- does not require all, some configurations need to be same otherwise they would not
- work as expected. A good example for that is :ref:`secret_key<config:api__secret_key>` which
- should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
-
- The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
- generated using the secret key has a short expiry time though - make sure that time on ALL the machines
- that you run Airflow components on is synchronized (for example using ntpd) otherwise you might get
- "forbidden" errors when the logs are accessed.
+ Different Airflow components may require different configuration parameters, and for
+ improved security, you should restrict sensitive configuration to only the components that
+ need it. Some configuration values must be shared across specific components to work
+ correctly — for example, the JWT signing key (``[api_auth] jwt_secret`` or
+ ``[api_auth] jwt_private_key_path``) must be consistent across all components that generate
+ or validate JWT tokens (Scheduler, API Server). However, other sensitive parameters such as
+ database connection strings or Fernet keys should only be provided to components that need them.
+
+ For security-sensitive deployments, pass configuration values via environment variables
+ scoped to individual components rather than sharing a single configuration file across all
+ components. See :doc:`/security/security_model` for details on which configuration
+ parameters should be restricted to which components.
+
+ Make sure that time on ALL the machines that you run Airflow components on is synchronized
+ (for example using ntpd) otherwise you might get "forbidden" errors when the logs are
+ accessed or API calls are made.

.. note::
For more information see :doc:`/howto/set-config`.
2 changes: 1 addition & 1 deletion airflow-core/docs/core-concepts/multi-team.rst
@@ -38,7 +38,7 @@ Multi-Team mode is designed for medium to large organizations that typically hav
**Use Multi-Team mode when:**

- You have many teams that need to share Airflow infrastructure
- - You need resource isolation (Variables, Connections, Secrets, etc) between teams
+ - You need resource isolation (Variables, Connections, Secrets, etc) between teams at the UI and API level (see :doc:`/security/security_model` for task-level isolation limitations)
- You want separate execution environments per team
- You want separate views per team in the Airflow UI
- You want to minimize operational overhead or cost by sharing a single Airflow deployment
23 changes: 14 additions & 9 deletions airflow-core/docs/howto/set-config.rst
@@ -157,15 +157,20 @@ the example below.
See :doc:`/administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.

.. note::
- Use the same configuration across all the Airflow components. While each component
- does not require all, some configurations need to be same otherwise they would not
- work as expected. A good example for that is :ref:`secret_key<config:api__secret_key>` which
- should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
-
- The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
- generated using the secret key has a short expiry time though - make sure that time on ALL the machines
- that you run Airflow components on is synchronized (for example using ntpd) otherwise you might get
- "forbidden" errors when the logs are accessed.
+ Different Airflow components may require different configuration parameters. For improved
+ security, restrict sensitive configuration to only the components that need it rather than
+ sharing all configuration across all components. Some values must be consistent across specific
+ components — for example, the JWT signing key must match between components that generate and
+ validate tokens. However, sensitive parameters such as database connection strings, Fernet keys,
+ and secrets backend credentials should only be provided to components that actually need them.
+
+ For security-sensitive deployments, pass configuration values via environment variables scoped
+ to individual components. See :doc:`/security/security_model` for detailed guidance on
+ restricting configuration parameters.
+
+ Make sure that time on ALL the machines that you run Airflow components on is synchronized
+ (for example using ntpd) otherwise you might get "forbidden" errors when the logs are
+ accessed or API calls are made.
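
The per-component scoping recommended above can be written down with Airflow's standard ``AIRFLOW__{SECTION}__{KEY}`` environment-variable convention. A sketch of one possible split (the component assignments, placeholder values, and the worker-side option name are illustrative; consult the configuration reference for the authoritative names):

```ini
# Scheduler and API Server: the components that generate/validate JWTs
AIRFLOW__API_AUTH__JWT_SECRET=<random-secret>

# Components with legitimate direct metadata-DB access
# (Scheduler, API Server, Dag Processor, Triggerer)
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql://airflow:<password>@db/airflow

# Workers: no DB credentials; they reach the API Server over the Execution API
AIRFLOW__CORE__EXECUTION_API_SERVER_URL=https://airflow-api.example.com/execution/
```

With this split, a compromised worker holds no database credentials at all, which is the property the security model is after.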

.. _set-config:configuring-local-settings:

2 changes: 1 addition & 1 deletion airflow-core/docs/installation/upgrading_to_airflow3.rst
@@ -54,7 +54,7 @@ In Airflow 3, direct metadata database access from task code is now restricted.

- **No Direct Database Access**: Task code can no longer directly import and use Airflow database sessions or models.
- **API-Based Resource Access**: All runtime interactions (state transitions, heartbeats, XComs, and resource fetching) are handled through a dedicated Task Execution API.
- - **Enhanced Security**: This ensures isolation and security by preventing malicious task code from accessing or modifying the Airflow metadata database.
+ - **Enhanced Security**: This improves isolation and security by preventing worker task code from directly accessing or modifying the Airflow metadata database. Note that Dag author code may still execute with direct database access in the Dag File Processor and Triggerer — see :doc:`/security/security_model` for details.
- **Stable Interface**: The Task SDK provides a stable, forward-compatible interface for accessing Airflow resources without direct database dependencies.

Step 1: Take care of prerequisites
7 changes: 4 additions & 3 deletions airflow-core/docs/public-airflow-interface.rst
@@ -548,9 +548,10 @@ but in Airflow they are not parts of the Public Interface and might change any t
internal implementation detail and you should not assume they will be maintained
in a backwards-compatible way.

- **Direct metadata database access from task code is no longer allowed**.
- Task code cannot directly access the metadata database to query Dag state, task history,
- or Dag runs. Instead, use one of the following alternatives:
+ **Direct metadata database access from code authored by Dag Authors is no longer allowed**.
+ Code authored by Dag Authors cannot directly access the metadata database to query Dag state, task history,
+ or Dag runs — workers communicate exclusively through the Execution API. Instead, use one
+ of the following alternatives:

* **Task Context**: Use :func:`~airflow.sdk.get_current_context` to access task instance
information and methods like :meth:`~airflow.sdk.types.RuntimeTaskInstanceProtocol.get_dr_count`,