Skip to content

feat(docker): add minute-level log retention support to clean-logs.sh#61853

Closed
shreeharshshinde wants to merge 4 commits intoapache:mainfrom
shreeharshshinde:feat/log-retention-minutes
Closed

feat(docker): add minute-level log retention support to clean-logs.sh#61853
shreeharshshinde wants to merge 4 commits intoapache:mainfrom
shreeharshshinde:feat/log-retention-minutes

Conversation

@shreeharshshinde
Copy link
Copy Markdown

What does this PR support ?

Adds support for configuring Docker log cleanup retention in minutes via a new environment variable:

AIRFLOW__LOG_RETENTION_MINUTES

When this variable is set, the log groomer uses find -mmin to delete .log files older than the specified number of minutes. If it is not set, the existing behavior based on AIRFLOW__LOG_RETENTION_DAYS remains unchanged.


Why

Day-level retention granularity can be too coarse for large-scale deployments generating high log volumes. This can result in excessive local log file accumulation, which negatively impacts log exporters such as the OpenTelemetry Collector FileLogReceiver that perform poorly when handling large file counts.

This change enables more precise log cleanup scheduling without breaking existing configurations.


How

  • Introduced AIRFLOW__LOG_RETENTION_MINUTES

  • Added conditional logic in scripts/docker/clean-logs.sh:

    • Use -mmin when minutes retention is provided
    • Fall back to existing -mtime logic otherwise
  • Preserved existing pruning, filtering, and empty directory cleanup behavior


Backward Compatibility

No breaking changes:

  • Existing AIRFLOW__LOG_RETENTION_DAYS continues to work
  • Default behavior unchanged if new variable is unset

Testing

  • Verified behavior locally with test log files
  • Confirmed deletion based on minute-level thresholds
  • Confirmed fallback to day-based retention

Fixes #61814

@boring-cyborg
Copy link
Copy Markdown

boring-cyborg bot commented Feb 13, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Comment on lines +34 to +46
if [[ -n "${RETENTION_MINUTES}" ]]; then
echo "Trimming airflow logs older than ${RETENTION_MINUTES} minutes."
find "${DIRECTORY}"/logs \
-type d -name 'lost+found' -prune -o \
-type f -mmin +"${RETENTION_MINUTES}" -name '*.log' -print0 | \
xargs -0 rm -f || true
else
echo "Trimming airflow logs older than ${RETENTION_DAYS} days."
find "${DIRECTORY}"/logs \
-type d -name 'lost+found' -prune -o \
-type f -mtime +"${RETENTION_DAYS}" -name '*.log' -print0 | \
xargs -0 rm -f || true
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it is good practice to just ignore the AIRFLOW__LOG_RETENTION_DAYS env when AIRFLOW__LOG_RETENTION_MINUTES is set. This behavior might confuse users.

Hmm, how about combine AIRFLOW__LOG_RETENTION_DAYS and AIRFLOW__LOG_RETENTION_MINUTES? What I mean with that is this:
AIRFLOW__LOG_RETENTION_DAYS * 24 * 60 + AIRFLOW__LOG_RETENTION_MINUTES?

With that, it would not be necessary to do an if-else statement, it would be enough to just use -mmin instead in general.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, combining both values into a single minute-based retention avoids silent overrides and simplifies the logic.

readonly DIRECTORY="${AIRFLOW_HOME:-/usr/local/airflow}"
readonly RETENTION="${AIRFLOW__LOG_RETENTION_DAYS:-15}"
readonly RETENTION_DAYS="${AIRFLOW__LOG_RETENTION_DAYS:-15}"
readonly RETENTION_MINUTES="${AIRFLOW__LOG_RETENTION_MINUTES:-}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of a new environment variable needs to be documented somewhere? Can somebody help with where this needs to be documented?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I’m still getting familiar with the Airflow docs structure, could you please point me to the preferred location for documenting Docker image environment variables?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I’m still getting familiar with the Airflow docs structure, could you please point me to the preferred location for documenting Docker image environment variables?

I'm afraid to disappoint you there, I don't know it either currently. I am not really familiar with the Airflow docs as well

Copy link
Copy Markdown
Contributor

@n-badtke-cg n-badtke-cg Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I’m still getting familiar with the Airflow docs structure, could you please point me to the preferred location for documenting Docker image environment variables?

A quick code search for "log_retention_days" in the repo gave me this: https://github.com/search?q=repo%3Aapache%2Fairflow+log_retention_days&type=code

@shreeharshshinde
Copy link
Copy Markdown
Author

I have updated the logic of retention period and added about retention_minutes where retention_days were added, hope it is right place to add about new environment variable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreeharshshinde
Copy link
Copy Markdown
Author

Along with schema I have added same logic in dockerfile also

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the Dockerfile is getting autogenerated with scripts?
https://github.com/apache/airflow/blob/main/Dockerfile#L1634

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so should I remove that ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so should I remove that ?

I don't know, here we need more experienced person to give us the information

}
}
}
} No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing new line with EOF

Updated the description for retentionMinutes to clarify its relation to retentionDays.
},
"retentionMinutes": {
"description": "Number of minutes to retain the logs when running the Airflow log groomer sidecar.",
"description": "Total retention time is retentionDays + retentionMinutes.",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was there a mishap with this commit? ^^ you replaced the description instead of adding :)

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Feb 14, 2026

There are lots of issues here. If you want to add minute retetion, you should likely change the extisting values - not to add more parameters.

@potiuk potiuk closed this Feb 14, 2026
@n-badtke-cg
Copy link
Copy Markdown
Contributor

There are lots of issues here. If you want to add minute retetion, you should likely change the extisting values - not to add more parameters.

What about backwards compatibility?

@potiuk potiuk reopened this Feb 17, 2026
@pugarte7
Copy link
Copy Markdown
Contributor

Hi! I have the fixes ready so we can merge this. What should be the approach? Can I commit to this PR? Maybe creating a new PR coauthoring @shreeharshshinde ?

@n-badtke-cg
Copy link
Copy Markdown
Contributor

n-badtke-cg commented Mar 2, 2026

Hi! I have the fixes ready so we can merge this. What should be the approach? Can I commit to this PR? Maybe creating a new PR coauthoring @shreeharshshinde ?

@Wastelander777 Since this is a PR from a fork of @shreeharshshinde, you will not be able to commit to this PR unless he is adding you to his fork repo.
You might either clone the fork of @shreeharshshinde and cherry pick the changes of the branch or you might fork the fork and append the branch ^^

# Number of minutes to retain logs.
# This can be used for finer granularity than days.
# Total retention is retentionDays + retentionMinutes.
retentionMinutes: 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered instead of adding another parameter (plus all the complexity with this) to support fractions, e.g. retentionDays: 15.5 to make it every 15 days plus 12h? Or 0.0139 to have it every 20 minutes.

Copy link
Copy Markdown
Contributor

@n-badtke-cg n-badtke-cg Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought on that - and scraped the idea immediately :) Usability is too bad imo

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @n-badtke-cg, I think at the user level is harder to define the retention that way.

Copy link
Copy Markdown
Contributor

@n-badtke-cg n-badtke-cg Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: we need to calculate minutes anyway because the -mtime only takes integers, so we cannot use fractional days with -mtime, we need to use -mmin anyway. Then, we need to catch weird calculations, because -mmin also takes integers only.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also an example: 10 minutes.

fractional days = 10min / (24hours * 60min) = 0,00694444444444444444444444444444...

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Mar 10, 2026

@shreeharshshinde This PR has been converted to draft because it does not yet meet our Pull Request quality criteria.

Issues found:

  • Pre-commit / static checks: Failing: CI image checks / Static checks. Run prek run --from-ref main locally to find and fix issues. See Pre-commit / static checks docs.
  • mypy (type checking): Failing: CI image checks / MyPy checks (mypy-providers). Run prek --stage manual mypy-providers --all-files locally to reproduce. You need breeze ci-image build --python 3.10 for Docker-based mypy. See mypy (type checking) docs.
  • ⚠️ Unresolved review comments: This PR has 1 unresolved review thread from maintainers. Please review and resolve all inline review comments before requesting another review. You can resolve a conversation by clicking 'Resolve conversation' on each thread after addressing the feedback. See pull request guidelines.

Note: Your branch is 673 commits behind main. Some check failures may be caused by changes in the base branch rather than by your PR. Please rebase your branch and push again to get up-to-date CI results.

What to do next:

  • The comment informs you what you need to do.
  • Fix each issue, then mark the PR as "Ready for review" in the GitHub UI - but only after making sure that all the issues are fixed.
  • Maintainers will then proceed with a normal review.

Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack.

@potiuk potiuk marked this pull request as draft March 10, 2026 23:14
@n-badtke-cg
Copy link
Copy Markdown
Contributor

This PR fell behind PR #61855

@n-badtke-cg
Copy link
Copy Markdown
Contributor

n-badtke-cg commented Mar 12, 2026

This PR is obsolete now since PR #61855 got merged

@jscheffl jscheffl closed this Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Container Log Groomer should be configurable with a log retention time <1d - OTEL Collector FileLogReceiver

5 participants