
Try FMBench with instance count set to > 1 to see how scaling impacts latency and transactions per minute #29

Open
aarora79 opened this issue Feb 17, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@aarora79
Contributor

It would be interesting to see the effect of scaling to multiple instances behind the same endpoint. How does inference latency change as the endpoint scales out (automatically; we could also add parameters for the scaling policy)? Can we sustain more transactions per minute with auto-scaling instances while keeping latency below a threshold, and what are the cost implications of doing that? This needs to be fleshed out, but it is an interesting area. A rough sketch of how a benchmark run could attach a scaling policy is shown below.
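As a minimal sketch of what "parameters for scaling policy" could mean, the snippet below uses the standard Application Auto Scaling API (via boto3) to register a SageMaker endpoint variant as a scalable target and attach a target-tracking policy. The endpoint name, variant name, capacity bounds, and target value are all hypothetical placeholders that FMBench would presumably read from its config file; this is not an existing FMBench feature.

```python
import boto3

# Hypothetical values that would come from the FMBench config file.
ENDPOINT_NAME = "my-fmbench-endpoint"
VARIANT_NAME = "AllTraffic"
MIN_INSTANCES = 1
MAX_INSTANCES = 4

autoscaling = boto3.client("application-autoscaling")
resource_id = f"endpoint/{ENDPOINT_NAME}/variant/{VARIANT_NAME}"

# Allow the variant's instance count to scale between MIN and MAX.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=MIN_INSTANCES,
    MaxCapacity=MAX_INSTANCES,
)

# Target-tracking policy: scale out when invocations per instance exceed the target.
autoscaling.put_scaling_policy(
    PolicyName="fmbench-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # invocations per instance; would be configurable
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```

With something like this in place, FMBench could sweep the target value and min/max instance counts, then report latency and transactions per minute at each setting to show the cost/latency trade-off.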

This would also need to include support for the Inference Configuration feature that is now available with SageMaker.

@aarora79 aarora79 added the enhancement New feature or request label Feb 17, 2024