As part of AIP-72, we want to pass the Task Instance to the worker. Currently, the primary key of TI is a combination of dag_id, task_id, run_id, map_index.
From `airflow/airflow/models/taskinstance.py` (lines 1815 to 1819 in b4269f3):

    __tablename__ = "task_instance"
    task_id = Column(StringID(), primary_key=True, nullable=False)
    dag_id = Column(StringID(), primary_key=True, nullable=False)
    run_id = Column(StringID(), primary_key=True, nullable=False)
    map_index = Column(Integer, primary_key=True, nullable=False, server_default=text("-1"))

Instead of sending the entire key from the executor to the worker via the API server, ideally the API server can send just a TI UUID, which the worker then uses to fetch the correct TI to execute.
We want to add a single-column primary key of type UUID, and should use UUID v7 (its leading timestamp gives it better temporal sorting behaviour than the random v4). For the migration that backfills existing rows we can use v4, which most DBs can generate natively.
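
To illustrate why v7 sorts by creation time, here is a minimal sketch that builds a UUID with the v7 bit layout from RFC 9562 (48-bit Unix millisecond timestamp, version nibble, variant bits, random remainder). The helper name `uuid7_sketch` is hypothetical and is not what Airflow would necessarily use; a real implementation would more likely rely on a maintained library.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a value with the UUID v7 bit layout."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    # 80 random bits fill everything below the timestamp.
    rand = int.from_bytes(os.urandom(10), "big")
    # The 48-bit millisecond timestamp occupies the most significant bits.
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | rand
    # Overwrite the version nibble (bits 76-79) with 7.
    value = (value & ~(0xF << 76)) | (0x7 << 76)
    # Overwrite the variant bits (bits 62-63) with binary 10.
    value = (value & ~(0x3 << 62)) | (0x2 << 62)
    return uuid.UUID(int=value)


# Values created at a later millisecond always compare greater,
# because the timestamp sits in the leading bits.
earlier = uuid7_sketch(unix_ms=1_000)
later = uuid7_sketch(unix_ms=2_000)
print(earlier < later)
```

Within the same millisecond, ordering in this sketch is decided by the random bits, so it is only monotonic at millisecond granularity.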
The scope of this GitHub issue is to add the UUID column, but not to use it anywhere in the codebase until we need it on the Task Execution API server. We will keep the "denormalized" columns of dag_id and run_id for easier searching/querying.
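
The backfill described above could look roughly like the following sketch. It uses the standard-library `sqlite3` module and a simplified copy of the table purely for illustration; the column name `id`, the index name, and the sample rows are assumptions, and the actual change would be an Airflow database migration against the supported databases. SQLite cannot swap a primary key in place, so the sketch enforces uniqueness with an index instead.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the existing table with its composite key.
conn.execute(
    """
    CREATE TABLE task_instance (
        task_id TEXT NOT NULL,
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL DEFAULT -1,
        PRIMARY KEY (dag_id, task_id, run_id, map_index)
    )
    """
)
conn.executemany(
    "INSERT INTO task_instance (task_id, dag_id, run_id) VALUES (?, ?, ?)",
    [("extract", "example_dag", "run_1"), ("load", "example_dag", "run_1")],
)

# Step 1: add the new column as nullable so existing rows stay valid.
conn.execute("ALTER TABLE task_instance ADD COLUMN id TEXT")

# Step 2: backfill existing rows with random v4 UUIDs.
rows = conn.execute("SELECT rowid FROM task_instance WHERE id IS NULL").fetchall()
for (rowid,) in rows:
    conn.execute(
        "UPDATE task_instance SET id = ? WHERE rowid = ?",
        (str(uuid.uuid4()), rowid),
    )

# Step 3: enforce uniqueness on the new identifier (assumed index name).
conn.execute("CREATE UNIQUE INDEX idx_task_instance_id ON task_instance (id)")

# A single UUID now resolves to the full key; the old columns remain queryable.
ti_id = conn.execute(
    "SELECT id FROM task_instance WHERE task_id = 'extract'"
).fetchone()[0]
row = conn.execute(
    "SELECT dag_id, task_id, run_id, map_index FROM task_instance WHERE id = ?",
    (ti_id,),
).fetchone()
print(row)
```

Rows created after the migration would get v7 values from application code, while the backfilled rows keep their v4 values; both fit in the same column.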