feat(forge/llm): Add LlamafileProvider
#7091
Conversation
…der for llamafiles. Currently it just extends OpenAIProvider and only overrides methods that are necessary to get the system to work at a basic level. Update ModelProviderName schema and config/configurator so that app startup using this provider is handled correctly. Add 'mistral-7b-instruct-v0' to OpenAIModelName/OPEN_AI_CHAT_MODELS registries.
…-Instruct chat template, which supports the 'user' & 'assistant' roles but does not support the 'system' role.
…kens`, and `get_tokenizer` from classmethods so I can override them in LlamafileProvider (and so I can access instance attributes from inside them). Implement class `LlamafileTokenizer` that calls the llamafile server's `/tokenize` API endpoint.
…tes on the integration; add helper scripts for downloading/running a llamafile + example env file.
…gs for reproducibility
…ange serve.sh to use the model's full context size (this does not seem to cause OOM errors, surprisingly).
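The `/tokenize` integration mentioned in the notes above can be pictured with a minimal sketch. This is illustrative, not the PR's code: it assumes a llamafile server on localhost and the llama.cpp-style `/tokenize`/`/detokenize` endpoints that llamafile embeds; the class and method names mirror the commit note.

```python
import requests


class LlamafileTokenizer:
    """Minimal sketch: tokenize text via a local llamafile server."""

    def __init__(self, base_url: str = "http://localhost:8080"):
        self.base_url = base_url

    def encode(self, text: str) -> list[int]:
        # The server returns JSON of the form {"tokens": [...]}
        response = requests.post(
            f"{self.base_url}/tokenize", json={"content": text}, timeout=30
        )
        response.raise_for_status()
        return response.json()["tokens"]

    def decode(self, tokens: list[int]) -> str:
        # Inverse mapping via the /detokenize endpoint
        response = requests.post(
            f"{self.base_url}/detokenize", json={"tokens": tokens}, timeout=30
        )
        response.raise_for_status()
        return response.json()["content"]
```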
✅ Deploy Preview for auto-gpt-docs ready!
This pull request has conflicts with the base branch; please resolve those so we can evaluate the pull request.

@CodiumAI-Agent /review
@k8si any chance you could enable maintainer write access on this PR?

@Pwuts it doesn't look like I have the ability to do that. I added you as a maintainer to the forked project, is that sufficient or do others need write access? Alternatively, you could branch off my branch and I can just accept the changes via PR?

Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
…vider`, `GroqProvider` and `LlamafileProvider` and rebase the latter three on `BaseOpenAIProvider`
Codecov Report

Attention: Patch coverage is below target.

```diff
@@            Coverage Diff             @@
##           master    #7091      +/-   ##
==========================================
+ Coverage   35.01%   42.38%    +7.36%
==========================================
  Files          18       84       +66
  Lines        1211     4889     +3678
  Branches      179      676      +497
==========================================
+ Hits          424     2072     +1648
- Misses        758     2724     +1966
- Partials       29       93       +64
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
In general, a solid step towards using local LLMs!
Some comments should be changed to docstrings, and the PR adds many TODOs/FIXMEs.
```python
# Clean up model names:
# 1. Remove file extension
# 2. Remove quantization info
# e.g. 'mistral-7b-instruct-v0.2.Q5_K_M.gguf'
#   -> 'mistral-7b-instruct-v0.2'
# e.g. '/Users/kate/models/mistral-7b-instruct-v0.2.Q5_K_M.gguf'
#   -> 'mistral-7b-instruct-v0.2'
# e.g. 'llava-v1.5-7b-q4.gguf'
#   -> 'llava-v1.5-7b'
def clean_model_name(model_file: str) -> str:
```
I like that the explanation is added here!
nit: we use """comment""" just below the member:
```diff
-# Clean up model names:
-# 1. Remove file extension
-# 2. Remove quantization info
-# e.g. 'mistral-7b-instruct-v0.2.Q5_K_M.gguf'
-#   -> 'mistral-7b-instruct-v0.2'
-# e.g. '/Users/kate/models/mistral-7b-instruct-v0.2.Q5_K_M.gguf'
-#   -> 'mistral-7b-instruct-v0.2'
-# e.g. 'llava-v1.5-7b-q4.gguf'
-#   -> 'llava-v1.5-7b'
-def clean_model_name(model_file: str) -> str:
+def clean_model_name(model_file: str) -> str:
+    """Clean up model names:
+    1. Remove file extension
+    2. Remove quantization info
+    e.g. 'mistral-7b-instruct-v0.2.Q5_K_M.gguf'
+      -> 'mistral-7b-instruct-v0.2'
+    e.g. '/Users/kate/models/mistral-7b-instruct-v0.2.Q5_K_M.gguf'
+      -> 'mistral-7b-instruct-v0.2'
+    e.g. 'llava-v1.5-7b-q4.gguf'
+      -> 'llava-v1.5-7b'"""
```
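For reference, a hypothetical implementation that satisfies the documented examples; the quantization-suffix regex is an assumption for illustration, not the PR's actual code:

```python
import re
from pathlib import Path


def clean_model_name(model_file: str) -> str:
    name = Path(model_file).name       # drop any directory components
    name = name.removesuffix(".gguf")  # 1. remove the file extension
    # 2. remove a trailing quantization tag such as '.Q5_K_M' or '-q4'
    return re.sub(r"[.\-][Qq]\d\w*$", "", name)


assert clean_model_name("mistral-7b-instruct-v0.2.Q5_K_M.gguf") == "mistral-7b-instruct-v0.2"
assert clean_model_name("llava-v1.5-7b-q4.gguf") == "llava-v1.5-7b"
```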
If this is just an example then fine, but otherwise LLAMAFILE could be provided as an argument.
```python
def _adapt_chat_messages_for_mistral_instruct(
    self, messages: list[ChatCompletionMessageParam]
) -> list[ChatCompletionMessageParam]:
```
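To make the signature concrete, here is a simplified, dict-based sketch of the idea: since the Mistral-7B-Instruct chat template only supports alternating 'user'/'assistant' turns, 'system' content is folded into the user role and consecutive user messages are merged. The PR's actual method operates on `ChatCompletionMessageParam` objects and may merge differently.

```python
def adapt_chat_messages_for_mistral_instruct(
    messages: list[dict[str, str]]
) -> list[dict[str, str]]:
    adapted: list[dict[str, str]] = []
    for message in messages:
        # Mistral-Instruct has no 'system' role; treat it as user input
        role = "user" if message["role"] == "system" else message["role"]
        if adapted and adapted[-1]["role"] == role == "user":
            # Merge consecutive user messages to preserve alternation
            adapted[-1]["content"] += "\n\n" + message["content"]
        else:
            adapted.append({"role": role, "content": message["content"]})
    return adapted
```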
For future: the message format could be abstracted/handled by ModelProvider and assigned to each model.
What do you mean?
Doesn't seem to work for me when using `./scripts/llamafile/serve.py`:

```
PS C:\Users\nicka\code\AutoGPTNew\autogpt> python3 .\scripts\llamafile\serve.py
Downloading mistral-7b-instruct-v0.2.Q5_K_M.llamafile.exe...
Downloading: [########################################] 100% - 5166.9/5166.9 MB
Traceback (most recent call last):
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\scripts\llamafile\serve.py", line 56, in <module>
    download_llamafile()
  File "C:\Users\nicka\code\AutoGPTNew\autogpt\scripts\llamafile\serve.py", line 43, in download_llamafile
    subprocess.run([LLAMAFILE, "--version"], check=True)
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\nicka\.pyenv\pyenv-win\versions\3.11.7\Lib\subprocess.py", line 1538, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [WinError 193] %1 is not a valid Win32 application
```

Important sidenote: that binary can't be executed directly on Windows, so Python of course fails.
I also get this after using the workaround above:

```ini
################################################################################
### AutoGPT - GENERAL SETTINGS
################################################################################

## OPENAI_API_KEY - OpenAI API Key (Example: sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# OPENAI_API_KEY=

## ANTHROPIC_API_KEY - Anthropic API Key (Example: sk-ant-api03-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# ANTHROPIC_API_KEY=

## GROQ_API_KEY - Groq API Key (Example: gsk_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
# GROQ_API_KEY=

################################################################################
### LLM MODELS
################################################################################

## SMART_LLM - Smart language model (Default: gpt-4-turbo)
SMART_LLM=mistral-7b-instruct-v0.2

## FAST_LLM - Fast language model (Default: gpt-3.5-turbo)
FAST_LLM=mistral-7b-instruct-v0.2

## EMBEDDING_MODEL - Model to use for creating embeddings
# EMBEDDING_MODEL=text-embedding-3-small
```
Background
This draft PR is a step toward enabling the use of local models in AutoGPT by adding llamafile as an LLM provider.
Implementation notes are included in `forge/forge/llm/providers/llamafile/README.md`.
Related issues:

Depends on:

- `BaseOpenAIProvider` -> deduplicate `GroqProvider` & `OpenAIProvider` #7178

Changes 🏗️

- Add minimal implementation of `LlamafileProvider`, a new `ChatModelProvider` for llamafiles. It extends `BaseOpenAIProvider` and only overrides methods that are necessary to get the system to work at a basic level.
- Add support for `mistral-7b-instruct-v0.2`. This is the only model currently supported by `LlamafileProvider`, because this is the only model I tested anything with.
- Misc changes to app configuration to enable switching between openai/llamafile providers. In particular, added config field `LLM_PROVIDER` that, when set to 'llamafile', will use `LlamafileProvider` in agents rather than `OpenAIProvider` (see the sketch below).
- Add instructions to use AutoGPT with llamafile in the docs at `autogpt/setup/index.md`
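A simplified sketch of that switch; the function name and import path are assumptions based on the PR's layout, and the real wiring goes through the app's configurator:

```python
import os

# Assumed import path; the PR adds the provider under forge/llm/providers
from forge.llm.providers import LlamafileProvider, OpenAIProvider


def get_llm_provider():
    """Pick the chat provider based on the LLM_PROVIDER config field."""
    if os.getenv("LLM_PROVIDER") == "llamafile":
        return LlamafileProvider()  # local llamafile server
    return OpenAIProvider()  # default: OpenAI API
```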
Limitations:
PR Quality Scorecard ✨

- +2 pts
- +5 pts
- +5 pts
- +5 pts
- -4 pts
- +4 pts
- +5 pts
- -5 pts
- `agbenchmark` to verify that these changes do not regress performance? +10 pts