
appender: prune old files at startup #2966

Open · wants to merge 2 commits into `master`
Conversation

**@kaffarell** (Contributor) commented May 10, 2024

The prune_old_logs function only gets called when we roll over the appender. The problem is that at startup we create an initial log file without rolling over; if the executable then exits before rolling over for the first time, we never call prune_old_logs.

For example: if we have a tracing-appender that rolls over every minute and a max_files parameter of 2, running the program (which prints a single log line and exits) once per minute will create a new log file every time without ever deleting the old ones.

Fixes: #2937
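To make the failure mode concrete, here is a stdlib-only Rust sketch (a simulation, not the crate's actual code; the file-name pattern, the in-memory set, and `run_once` are illustrative stand-ins): each short-lived run creates one new log file and exits before the rollover path, which is the only place pruning currently happens.

```rust
use std::collections::BTreeSet;

/// Simulates one short-lived process run during minute `minute`:
/// it creates that minute's log file and exits before any rollover,
/// so the prune step is never reached.
fn run_once(files: &mut BTreeSet<String>, minute: u32) {
    files.insert(format!("app.log.2024-05-10-12-{:02}", minute));
    // No prune here: pruning only runs inside the rollover path,
    // which a process that exits within one period never hits.
}

fn main() {
    let max_files = 2;
    let mut files = BTreeSet::new();
    for minute in 0..5 {
        run_once(&mut files, minute);
    }
    // Five runs, five files: the max_files limit was never enforced.
    assert!(files.len() > max_files);
    println!("{} files on disk, limit was {}", files.len(), max_files);
}
```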

@kaffarell kaffarell requested a review from a team as a code owner May 10, 2024 15:36
@nfagerlund nfagerlund left a comment

Ah cool, I've got an app that's affected by this bug — it's behind a scheduling proxy that starts and kills it according to demand, and it never gets to live for an entire log rollover cycle.

**User expectations and correctness**

So here's the range of cases, with problematic or unexpected behaviors in bold. "Long" lifetime means the app lives for many rotation periods in a row, "short" means it lives for less than a full rotation period.

| App lifetime | First startup in a Rotation | Subsequent startups in same Rotation | Awake during rollover |
|---|---|---|---|
| Long (current) | **Files can grow to max + 1** | n/a | Files pruned to max |
| Short (current) | **Files can grow unbounded** | **Files unchanged** | n/a |
| Long (this PR) | Files pruned to max | n/a | Files pruned to max |
| Short (this PR) | Files pruned to max | **Files pruned to max - 1** | n/a |

This is because:

  • Current behavior never prunes unless a rollover hits while the app is running.
  • This PR always prunes on startup.
  • The prune_old_logs method unconditionally ensures that there's one fewer than max files, because it assumes it's being guarded by a "should roll over" check and we're about to create a new file and go back to max. (But for an app that might start and die many times within the same Rotation period, that assumption is only correct for the first startup within a given Rotation.)
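The third bullet can be sketched with a minimal stdlib re-implementation (an in-memory model, not the crate's actual `prune_old_logs`; the names and file list are illustrative), showing why a startup prune with no file creation afterwards leaves `max - 1` files:

```rust
/// Prune to `max_files - 1` entries, mirroring the assumption that the
/// caller is about to create one new file and land exactly on `max_files`.
/// `files` is sorted oldest-first (lexicographic timestamps sort that way).
fn prune_old_logs(files: &mut Vec<String>, max_files: usize) {
    let keep = max_files.saturating_sub(1);
    while files.len() > keep {
        let removed = files.remove(0); // delete the oldest file
        println!("pruned {removed}");
    }
}

fn main() {
    let mut files: Vec<String> = (0..5)
        .map(|m| format!("app.log.2024-05-10-12-{:02}", m))
        .collect();
    // Prune on startup with max_files = 3, but then exit without ever
    // creating the "new" file the function assumed was coming.
    prune_old_logs(&mut files, 3);
    assert_eq!(files.len(), 2); // max - 1, the "dip below max" case
}
```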

On consideration, I think it's fine to occasionally prune to max - 1 — after all, it's a maximum, not a guaranteed count. However, this needs an edit to documentation so users can calibrate their expectations — if you must retain N days of logs, then max = N + 1!

**Performance concerns**

Pruning in Inner::new() will technically work, but it increases the amount of blocking filesystem work that has to be done when creating the writer.

What do you think about deferring the initial prune until it performs its first write after creation? That way, blocking-sensitive applications (like an edge function that needs fast cold startup time) can still take advantage of this crate's NonBlockingWriter to push that pruning cost off the critical path.

It looks like everything's already set up to do that properly, IF Inner::should_rollover happens to return an affirmative on the first check. I think you can provoke this by changing line 531 of Inner::new() to be let next_date = rotation.round_date(&now); — the inner will still start out with the correct writer filename, but refresh_writer will get called on the first write and will immediately re-open that correct file after doing a prune.

This of course will need a comment in the code to explain why the "next" date is starting out in the past.
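The effect of that one-line change can be sketched with integer timestamps (a simplified stand-in for the crate's `Rotation` and date handling, assuming minutely rotation; `round_date` and `should_rollover` here are models, not the real functions): rounding `now` down to the start of its period instead of up to the next boundary makes the very first `should_rollover` check pass, so the first write goes through `refresh_writer` and therefore through the prune.

```rust
const PERIOD_SECS: u64 = 60; // minutely rotation, as in the example above

/// Round a timestamp down to the start of its rotation period
/// (a stand-in for the crate's round_date).
fn round_date(now: u64) -> u64 {
    now - now % PERIOD_SECS
}

/// Stand-in for Inner::should_rollover.
fn should_rollover(now: u64, next_date: u64) -> bool {
    now >= next_date
}

fn main() {
    let now = 1_715_350_000; // arbitrary instant inside some minute
    // Current code: next_date is the start of the *next* period,
    // so the first check fails and no prune happens before a write.
    let next_current = round_date(now) + PERIOD_SECS;
    assert!(!should_rollover(now, next_current));
    // Proposed tweak: next_date starts out in the (rounded-down) past,
    // so the first write immediately takes the refresh + prune path.
    let next_proposed = round_date(now);
    assert!(should_rollover(now, next_proposed));
}
```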

@kaffarell
Copy link
Contributor Author

kaffarell commented May 28, 2024

Thanks for this thorough writeup!

> On consideration, I think it's fine to occasionally prune to max - 1 — after all, it's a maximum, not a guaranteed count. However, this needs an edit to documentation so users can calibrate their expectations — if you must retain N days of logs, then max = N + 1!

This is where my approach has a flaw: because the prune_old_logs function expects to be called before refresh_writer (which creates a new file), it truncates the files to n - 1 (as noted in a comment in the function). But as you said, if we document this, I'd be fine with it.

> It looks like everything's already set up to do that properly, IF Inner::should_rollover happens to return an affirmative on the first check. I think you can provoke this by changing line 531 of Inner::new() to be let next_date = rotation.round_date(&now); — the inner will still start out with the correct writer filename, but refresh_writer will get called on the first write and will immediately re-open that correct file after doing a prune.

This will call create_writer twice and do this (for two log lines):

```
create_writer
write
should rollover: yes
refresh_writer
prune_old_logs
create_writer
write
should rollover: no
```

instead of this:

```
prune_old_logs
create_writer
write
should rollover: no
write
should rollover: no
```

And for a max_log_files of 2 it will never create more than 1 file.

@nfagerlund

Yeah, gotta admit the extra thrash from doing create_writer twice on the first write isn't ideal. 🤨 I feel like there's got to be an elegant design for this, but ah well.

At any rate, I'd accept "purge on startup" as long as it was accompanied by a docs change. Any performance hit is likely negligible.

@kaffarell
Contributor Author

@nfagerlund Added a commit with the documentation update. I don't really like my explanation and wording, so please correct me!

@nfagerlund nfagerlund left a comment

sure thing 👍🏼

Comment on lines 192 to 197:

```rust
/// When a new log file is created, if there are `n` or more
/// existing log files in the directory, the oldest will be deleted.
/// At startup, the `RollingFileAppender` will also prune files that are
/// older than the first `n`, so that without rollover we can still guarantee
/// less than `n` files.
/// If no value is supplied, the `RollingAppender` will not remove any files.
```

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:

```diff
-/// When a new log file is created, if there are `n` or more
-/// existing log files in the directory, the oldest will be deleted.
-/// At startup, the `RollingFileAppender` will also prune files that are
-/// older than the first `n`, so that without rollover we can still guarantee
-/// less than `n` files.
-/// If no value is supplied, the `RollingAppender` will not remove any files.
+/// When constructing a `RollingFileAppender` or starting a new log file,
+/// the appender will delete the oldest matching log files until at most `n`
+/// files remain. The exact number of retained files can sometimes dip below
+/// the maximum, so if you need to retain `m` log files, specify a max of
+/// `m + 1`.
+///
+/// If no value is supplied, the `RollingAppender` will not remove any files.
```

Successfully merging this pull request may close issue “max_log_files can‘ work” (#2937).