Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: remove status field in favor of cancel and invalidation fields #10519

Closed
wants to merge 3 commits into from

Conversation

pmwhite
Copy link
Collaborator

@pmwhite pmwhite commented May 13, 2024

Based on #10518

Moves the cancellation token out of the status field, and then removes the status field, since any information that it provides is already present elsewhere.

The accumulation of the invalidation is orthogonal to what the build
status is. Regardless of how the build progresses, we should always
continually combine new build input change into the current
invalidation, and we should only set the invalidation to empty when we
also call Memo.reset with the current invalidation.

At the same time, it also makes sense to cleanup the handling of the
"build_input_change" ivar. There is no need for it to be an option that
is initialized on-demand, since it is easy and cheap to just create an
initial ivar. In addition, we shouldn't care what the current build
status is when waiting for a file to change. (I suspect that some of the
reason for the complication is due to the "--react-to-insignificant-file changes"
flag that no longer exists.

This patch doesn't claim to fix any bugs; it's just intended as a code
cleanup.

Signed-Off-By: Philip White <code@trailingwhite.space>
@pmwhite pmwhite marked this pull request as draft May 13, 2024 12:17
It doesn't need to be there because we already have `t.cancel`.

Signed-Off-By: Philip White <code@trailingwhite.space>
@pmwhite pmwhite force-pushed the no-cancel-in-build-status branch from 3fcf9f7 to b9f921e Compare May 13, 2024 12:39
Previous changes have moved all the useful information from the status
field into other fields, such as `cancel` and `invalidation`, so now we
can get rid of the field altogether.

Signed-Off-By: Philip White <code@trailingwhite.space>
@pmwhite pmwhite marked this pull request as ready for review May 13, 2024 17:18
@rgrinberg
Copy link
Member

I wonder if this should go the other way? Why should we have a cancellation token active while we aren't building anything.

@pmwhite
Copy link
Collaborator Author

pmwhite commented May 14, 2024

A subsequent PR is going to remove the status field altogether, since it doesn't add any new information on top of whether the current invalidation is empty and whether the cancellation token has fired or not.

So I guess my proposal is to eliminate the "standing by" state. We should only have two states: "building" and "cancelling build". Which state we are in is determined by whether or not the cancellation token has fired.

Maybe I should just add that commit to this PR instead of having them be separate?

@pmwhite
Copy link
Collaborator Author

pmwhite commented May 14, 2024

To directly answer the question you asked: I agree that we shouldn't have a cancellation token while we are not building, but I want to make it so that there is always a current build (which we might be in the process of cancellation). I think this is a simpler world because there are fewer invalid states to think about.

@rgrinberg
Copy link
Member

but I want to make it so that there is always a current build (which we might be in the process of cancellation). I think this is a simpler world because there are fewer invalid states to think about.

What about in watch mode when we are waiting for notifications? What would be the "current build" in that state?

@pmwhite
Copy link
Collaborator Author

pmwhite commented May 15, 2024

The current build would be the most recent build. I'll admit that's a bit a weird from the user's perspective - it makes more sense to say that if the build system is idling, then it isn't in the middle of a build. But in the implementation, it seems to require less code if we don't worry about the build status.

@rgrinberg
Copy link
Member

It's less code, but I suspect that's because we don't have to think about the case when you're cancelling a build when we're standing by. Given the bugs that we've been running into, I don't think that's the right trade-off.

IMO, we should go the other way and get rid of the cancel field in the scheduler in favor of cancellation being a part of the build state. It might be more code, but it makes it easier to use the cancellation token correctly.

@pmwhite pmwhite changed the title scheduler: remove cancellation token from t.status scheduler: remove status field in favor of cancel and invalidation fields May 16, 2024
@pmwhite
Copy link
Collaborator Author

pmwhite commented May 16, 2024

I just combined this PR with the one after it, since this discussion will be most fruitful in the context of actually removing the status field, instead of merely moving the cancellation token. If I weren't also trying to removing the status field, I think I would probably agree that cancellation might as well go inside the status field.

What's weird to me about having an explicit build status is that it is redundant. The memo invalidation and cancellation token contain enough information to reconstruct the build status. Unless it is inefficient or especially awkward, I generally try to avoid representing the same data twice, since that introduces the possibility for the two representations to get out of sync.

It's less code, but I suspect that's because we don't have to think about the case when you're cancelling a build when we're standing by. Given the bugs that we've been running into, I don't think that's the right trade-off.

I don't think not having to think about a case necessarily implies that the case is more likely to be buggy. To the contrary, I've often found that eliminating redundant information causes bugs to be either impossible or more noticeable (and thus more likely to be caught by tests).

IMO, we should go the other way and get rid of the cancel field in the scheduler in favor of cancellation being a part of the build state. It might be more code, but it makes it easier to use the cancellation token

This would mean we can't get rid of the status field, which was my motivation for these refactors. But I agree that if we do keep the status field, we might as well put the cancellation token there and only there. We can go in that direction instead if I fail to convince you that getting rid of the status field is a good idea.

@rgrinberg
Copy link
Member

So I agree broadly that the duplicating having two ways of representing the same thing is bad in general for the reason that you've mentioned - syncing is quite error prone. So in that sense, your PR is an improvement. However, before we decide what the code should look like, we should agree on the behavior of cancel_current_build when there's no build running.

As far as I can tell, your PR subtly changes the behavior in the following case:

  1. Dune starts in watch mode
  2. Finishes a build iteration
  3. A call to Scheduler.cancel_current_build () is made

Before this PR, the behavior was to effectively to ignore the cancellation in 3.
After this PR, even though the build is finished, we will now fire the callback corresponding to the Build_interrupted event.

I haven't too hard about this, but I think the previous behavior was better. Why do you like the new behavior more?

On a related note, what would you expect the following pseudo to do:

Scheduler.go (fun () ->
  let* () = Scheduler.cancel_current_build () in
  Build_system.run (fun () -> ...))

The current behavior is that cancellation will make the build not even start. IMO, that's confusing and the cancellation should be a no-op or provide an informative return value that no build is active in such a situation. This is why I don't really like always having a cancellation token available.

@pmwhite
Copy link
Collaborator Author

pmwhite commented May 18, 2024

As far as I can tell, your PR subtly changes the behavior in the following case:

Indeed, I think you're right. This was not intentional, though...I agree that the previous behavior is better, so it seems that it's actually NOT true that we can reconstruct status from cancel and invalidation.

This is why I don't really like always having a cancellation token available.

Yeah, I agree that current behavior of your pseudo code is confusing; my understanding is that this confusion exists both before and after this PR. To fix it, I think we need to somehow have Build_system.run setup a new build, instead of only poll_iter being the point where builds are set up.

@pmwhite
Copy link
Collaborator Author

pmwhite commented May 18, 2024

I wrote an alternative to this PR: #10548. This new one seems superior to me for a couple reasons:

  • It doesn't entirely get rid of the status field, since, as you pointed out, doing so would change behavior.
  • It picks apart into separate commits the bits of this PR that are actually fairly independent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants