Rename DefaultTarget to Target #1674

Closed
wants to merge 1 commit into from

mattt (Member) commented May 20, 2024

Follow-up to #1672

Both max and target concurrency levels can be overridden by web, so in effect both are default values.

Signed-off-by: Mattt Zmuda <mattt@replicate.com>
mattt requested a review from technillogue May 20, 2024 17:18

technillogue (Contributor) commented May 20, 2024

max concurrency can't be overridden in web though; cog currently reads it from the cog.yaml file inside the container. I could restore the COG_CONCURRENCY_OVERRIDE environment variable (currently removed) and plumb setting it through web and cluster, but that's not there today and it's not obvious that it would be a good idea (it certainly can't be increased past the maximum with which the engine was built). on the other hand, target is purely a consideration for the autoscaler and doesn't need to be read from inside the container
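
For illustration only, the constraint described above (an override may lower the max concurrency but never raise it past the value the engine was built with) could be sketched as follows. The function name `effective_max_concurrency` and its parameters are hypothetical and not part of cog; the sketch also assumes `COG_CONCURRENCY_OVERRIDE` were restored as a plain integer environment variable, which is not currently the case.

```python
import os


def effective_max_concurrency(built_max, env=None):
    """Return the max concurrency to use inside the container.

    Hypothetical helper: `built_max` stands for the value read from
    cog.yaml at build time. An override can only lower it.
    """
    env = os.environ if env is None else env
    raw = env.get("COG_CONCURRENCY_OVERRIDE")  # assumed to be restored
    if raw is None:
        return built_max

    override = int(raw)
    if override < 1:
        raise ValueError("concurrency override must be at least 1")

    # Never exceed the maximum with which the engine was built.
    return min(override, built_max)
```

Under these assumptions, an override of 4 against a built maximum of 8 yields 4, while an override of 32 is clamped back to 8. Target concurrency does not appear here because it only concerns the autoscaler.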

mattt (Member, Author) commented May 20, 2024

How about the value sent to director from cluster? I can imagine situations where it'd be nice to lower an ambitious max value for a model, for example when running on different hardware.

technillogue (Contributor) commented

there are some occasional models that could potentially run on different hardware (not triton), but what value does director send to cluster...?
