Add minimal python client example for the server, streaming callback #7373

Open: wants to merge 1 commit into base: master

chrismrutherford

When integrating llama.cpp with web services, I use the server for inference. I find this approach fast and efficient because the client is more or less a pass-through.

There doesn't seem to be a good set of Python examples for the server, possibly because most people use the OpenAI client library. I was using it too, but found it difficult to pass llama.cpp-specific parameters such as cache_prompt. It also required the model parameter to be present, even though the server does not use it.
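
To illustrate the point about server-specific parameters, here is a rough sketch (not the code from this PR) of posting straight to the server with only the standard library. It assumes the server listens on `http://localhost:8080`, exposes a `/completion` endpoint that accepts `prompt`, `n_predict` and `cache_prompt`, and returns the generated text in a `content` field. The helper names `build_completion_request` and `complete` are made up for this example.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8080", **params):
    """Build a POST request for the (assumed) /completion endpoint.

    Any extra keyword arguments, e.g. cache_prompt=True, are sent as-is
    in the JSON body, and no "model" field is needed.
    """
    payload = {"prompt": prompt, **params}
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **params):
    """Send the request and return the generated text.

    Assumes the JSON response carries the text in a "content" field.
    """
    req = build_completion_request(prompt, **params)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

With a server running, `complete("Hello", n_predict=16, cache_prompt=True)` would return the completion text. Building the request separately from sending it makes the payload easy to inspect without a live server.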

Here is a minimal Python client that supports streaming. A streaming callback function receives the data as it arrives; there you can do things like check for "." to build sentences and pass them to a voice server, minimizing latency.
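
The streaming callback idea can be sketched roughly as follows. This is an independent illustration, not the client added by this PR. It assumes that with `"stream": true` the server sends lines of the form `data: {...}`, each with a `content` fragment and a `stop` flag, and that the server runs at `http://localhost:8080`. The names `iter_stream_content`, `SentenceBuffer` and `stream_completion` are hypothetical.

```python
import json
import urllib.request


def iter_stream_content(lines):
    """Yield text fragments from 'data: {...}' lines (assumed stream format)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        chunk = json.loads(line[len("data:"):])
        if chunk.get("content"):
            yield chunk["content"]
        if chunk.get("stop"):
            break  # the server signalled the end of generation


class SentenceBuffer:
    """Callback that collects fragments and emits one sentence per '.'."""

    def __init__(self, on_sentence):
        self.on_sentence = on_sentence
        self.pending = ""

    def __call__(self, text):
        self.pending += text
        while "." in self.pending:
            sentence, self.pending = self.pending.split(".", 1)
            if sentence.strip():
                self.on_sentence(sentence.strip() + ".")

    def flush(self):
        """Emit whatever is left once the stream has ended."""
        if self.pending.strip():
            self.on_sentence(self.pending.strip())
        self.pending = ""


def stream_completion(prompt, callback, base_url="http://localhost:8080", **params):
    """POST to the (assumed) /completion endpoint and feed fragments to callback."""
    payload = {"prompt": prompt, "stream": True, **params}
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for text in iter_stream_content(resp):  # the response iterates line by line
            callback(text)
```

For example, `buf = SentenceBuffer(print)` followed by `stream_completion("Tell me a story.", buf, cache_prompt=True)` and `buf.flush()` would print each sentence as soon as its closing period arrives, which is where a voice server could be called instead of `print`.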

@github-actions github-actions bot added the examples, python (python script changes), and server labels May 18, 2024
@scottmudge

Cool, was looking for something exactly like this.

@shigins

shigins commented May 19, 2024

Thank you for this, the streaming example helped me out.

@mofosyne mofosyne added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) May 20, 2024
@teleprint-me
Contributor

Looks good.
