Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide reachability info with describe task queue for the new versioning API (Backward incompatible) #565

Conversation

antlai-temporal
Copy link
Contributor

What was changed

Provides an enhanced mode (by default) to temporal task-queue describe that provides task reachability information
for Build IDs. The goal is to facilitate retiring workers with old code that use the new worker versioning API.

The old behavior of describe is available using the legacy-mode flag.

@antlai-temporal antlai-temporal requested review from ShahabT, carlydf and a team May 16, 2024 05:04
"go.temporal.io/server/common/tqid"
)

const unversioned = "UNVERSIONED"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package is shared by many commands. May want to make the top-level constructs clearly specific to task queue work. For some of the functions this can maybe be easier putting them as methods on the command instead of top-level functions, but up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

}

// json output
return cctx.Printer.PrintStructured(descRows, printer.StructuredOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be clearer to just print the proto describe response when in JSON mode. The printer is already prepared to properly print protos as JSON. No need for a bunch of custom JSON objects. And we usually find the protos are stable and easy to point users to if they want to see which fields would be present in their JSON.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking with @ShahabT there are some things that we don't want to include, like WorkerVersionCapabilities that are mostly redundant, and sometimes expose Build IDs derived from code hashes that could be confusing.
Also, I thought that a "row friendly" layout could be easier if they want to add results to a traditional DB, and more compatible with the text view...

Copy link
Member

@cretz cretz May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our HTTP API is going to return that other JSON, it's not a big deal and it should be stable. But I don't mind if we want this separate JSON format for just this call, I have no strong opinion.

temporalcli/commands.taskqueue.go Outdated Show resolved Hide resolved
temporalcli/commands.taskqueue.go Outdated Show resolved Hide resolved
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make it clear in the description (and probably summary/title) that is this is a backwards incompatible change so that the person doing the release will know to call it out?

@ShahabT
Copy link

ShahabT commented May 21, 2024

Reachability could be an expensive operation and many time the callers may only be interested in pollers only. I think we should make it disabled by default and only report it when user sets a flag such as --report-reachability or similar.

@ShahabT
Copy link

ShahabT commented May 21, 2024

Also similarly, one may not want poller info so ideally there should be a way to disable that such as --report-pollers=false.

Note that a new section will be added soon (BacklogInfo). That should also have a flag (maybe true by default).

@antlai-temporal
Copy link
Contributor Author

Reachability could be an expensive operation and many time the callers may only be interested in pollers only. I think we should make it disabled by default and only report it when user sets a flag such as --report-reachability or similar.

I was thinking whether to export the flags or not. I thought it was a bit confusing not providing the info by default, and this is the cli, so I'm not expecting lots of calls using it, and create a real overhead problem. I'm assuming that most demanding users will use the SDK directly, and they can apply there the flags. Also, even with report off it provides info in json mode (e.g., similar to no pollers found), something that is confusing.

Maybe we can revisit after we get more experience with the pre-release?

@cretz
Copy link
Member

cretz commented May 21, 2024

👍 Would like to know how expensive. It is a common use case for people to poll reachability I assume so you know when you can shut a worker down?

@ShahabT
Copy link

ShahabT commented May 21, 2024

The expense comes from the fact that it needs to make up to 2 Visibility queries per build ID. Although those queries will not be needed for the "default" build ID (when no --select-* flag is used). We do caching of the Visibility queries in server, so maybe at the end of the day it is OK. We plan to do some more optimizations to reduce the visibility query cost.

I'm fine with keeping this as is and revisit after prerelease if we are OK adding the flags (and change default behavior) at a later stage if needed.

@cretz
Copy link
Member

cretz commented May 21, 2024

I'm fine with keeping this as is and revisit after prerelease if we are OK adding the flags (and change default behavior) at a later stage if needed.

Hrmm, I'm not sure we are ok w/ removing this from default and making a flag later. I noticed in the pending Go SDK update we require users to opt-in to this data as well. @antlai-temporal - if this data is not available by default in SDK or API and has an expense, can we put it behind a flag here too?

@antlai-temporal
Copy link
Contributor Author

Hrmm, I'm not sure we are ok w/ removing this from default and making a flag later. I noticed in the pending Go SDK update >we require users to opt-in to this data as well. @antlai-temporal - if this data is not available by default in SDK or API and has >an expense, can we put it behind a flag here too?

There is caching and rate limiting built-in @ShahabT mentioned, and I'm not expecting cli users to be doing heavy stuff. More advanced users will use the SDK, where there is by default more control.

I really hate it when you issue a command in the cli, and nothing useful comes back, and you need to start digging with the options to get anything sensible... I prefer to get the info back by default, and if we need to save, have a disable flag. But instead of falling for premature optimization, I wanted to wait and see whether the flag was actually needed...

@cretz
Copy link
Member

cretz commented May 22, 2024

nothing useful comes back

But something useful is coming back by default (poller info), just not this. Many users don't need reachability anyways because they are not using versioning.

@antlai-temporal
Copy link
Contributor Author

But something useful is coming back by default (poller info), just not this. Many users don't need reachability anyways because they are not using versioning.

In the SDK poller info is also gated by a flag, so if we do the same nothing would be returned by the cli by default. We could just add the enableReachability flag and get rid of the enablePollerInfo flag in the cli...

@antlai-temporal antlai-temporal changed the title Provide reachability info with describe task queue for the new versioning API Provide reachability info with describe task queue for the new versioning API (Backward incompatible) May 23, 2024
@cretz
Copy link
Member

cretz commented May 23, 2024

We could just add the enableReachability flag and get rid of the enablePollerInfo flag in the cli...

Yes, I vote this. Lets show the cheap valuable-to-everyone poller info by default and the more expensive valuable-to-versioning-users-only reachability info a flag is set.

@antlai-temporal
Copy link
Contributor Author

Adding report-reachability flag. Note that there is a bug in the current server version that makes a new unit test fail. This will be fixed upgrading the server hash in the next "NOT FOR MERGE" commit in the versioning-2 branch.

@antlai-temporal antlai-temporal merged commit ef6d391 into temporalio:versioning-2 May 24, 2024
1 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants