/embeddings endpoint sometimes does not return embedding #7277

Closed
marcingomulkiewicz opened this issue May 14, 2024 · 4 comments · Fixed by #7389

marcingomulkiewicz commented May 14, 2024

Llama.cpp version: b2876, but the bug existed at least a few releases back.

Environments: checked on two (same behaviour):

  • Linux Mint (current), 5800X3D, RTX 4090; built with cmake .. -DLLAMA_CUDA=ON -DLLAMA_SERVER_SSL=ON; run as server --embeddings -m mistral-7b-instruct-v0.2.Q4_K_M.gguf
  • macOS (current), M1 Ultra; built with make (no parameters); run similarly: server --embeddings -m mistral-7b-instruct-v0.2.Q4_K_M.gguf

Expected behaviour: the /embeddings endpoint always returns an embedding.
Observed behaviour: when the content to embed is longer than the context length/ubatch size, the server appears to return the original request instead of an embedding. An error message should probably be returned instead.

Sample:

curl -X POST "http://localhost:8080/embedding" --data '{"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam rhoncus mauris eget magna semper, ut varius arcu eleifend. Vestibulum quis justo eget ex pretium sollicitudin. Nam euismod orci vulputate erat sagittis, sed pulvinar ante varius. Proin in dui non eros sodales tempus. Proin et mi scelerisque tellus eleifend auctor. Sed sagittis erat sapien, in porttitor augue bibendum nec. Nam ut mi accumsan lorem volutpat tempus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In ac nulla tempor, pharetra felis id, venenatis tortor. Donec felis turpis, egestas non ligula at, eleifend fringilla est. Fusce elit mi, fermentum a sapien eleifend, rutrum scelerisque eros. Sed et vestibulum orci. Quisque ut magna vel nibh accumsan dictum eget eu urna. Duis rhoncus, lacus in imperdiet tincidunt, turpis turpis vestibulum ante, at mollis nisi massa et purus. Phasellus sed ante eros. Aenean consequat nisi non massa eleifend finibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ultrices libero id metus consequat semper. Nam venenatis, est quis interdum commodo, nisl ex placerat diam, sed fringilla ex nisi sed sem. Pellentesque luctus orci id tellus dictum tristique. Integer molestie varius risus quis maximus. In id feugiat nulla, at scelerisque massa. Nulla neque diam, consequat ac orci laoreet, venenatis pharetra enim.Aenean rhoncus dapibus augue ac volutpat. Nullam laoreet, lorem quis fermentum scelerisque"}'

works correctly.

curl -X POST "http://localhost:8080/embedding" --data '{"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam rhoncus mauris eget magna semper, ut varius arcu eleifend. Vestibulum quis justo eget ex pretium sollicitudin. Nam euismod orci vulputate erat sagittis, sed pulvinar ante varius. Proin in dui non eros sodales tempus. Proin et mi scelerisque tellus eleifend auctor. Sed sagittis erat sapien, in porttitor augue bibendum nec. Nam ut mi accumsan lorem volutpat tempus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In ac nulla tempor, pharetra felis id, venenatis tortor. Donec felis turpis, egestas non ligula at, eleifend fringilla est. Fusce elit mi, fermentum a sapien eleifend, rutrum scelerisque eros. Sed et vestibulum orci. Quisque ut magna vel nibh accumsan dictum eget eu urna. Duis rhoncus, lacus in imperdiet tincidunt, turpis turpis vestibulum ante, at mollis nisi massa et purus. Phasellus sed ante eros. Aenean consequat nisi non massa eleifend finibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ultrices libero id metus consequat semper. Nam venenatis, est quis interdum commodo, nisl ex placerat diam, sed fringilla ex nisi sed sem. Pellentesque luctus orci id tellus dictum tristique. Integer molestie varius risus quis maximus. In id feugiat nulla, at scelerisque massa. Nulla neque diam, consequat ac orci laoreet, venenatis pharetra enim.Aenean rhoncus dapibus augue ac volutpat. Nullam laoreet, lorem quis fermentum scelerisque,"}'

fails (note the single extra character at the end, a comma - it could be any character).

Update 1: the /v1/embeddings endpoint behaves in the same way.
Update 2: when embed is called from java-llama, it behaves similarly: it works for the first sample input, but fails for the second with:
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error' what(): [json.exception.type_error.302] type must be array, but is null
Update 3: up to b2356 the endpoint returned embeddings, though incorrect ones (all zeroes); from b2357 onward the bug is present.

barsuna commented May 18, 2024

I see this also on b2828:
main --version
version: 2828 (fd9f92b)

For sequences up to 127 tokens it works; from 128 tokens onward it fails.

Update: it actually works up to the 'logical maximum batch size' (-b parameter) minus 1, and fails for larger token counts.
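
As a quick check of where a given input lands relative to that limit, the server's /tokenize endpoint can be used to count the tokens of a request; a minimal sketch (jq is assumed to be available):

    curl -s -X POST "http://localhost:8080/tokenize" --data '{"content":"Lorem ipsum dolor sit amet"}' | jq '.tokens | length'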

ggerganov (Owner) commented

The text is larger than n_ubatch - you need to increase it:

    if (slot.embedding) {
        // this prompt is too large to process - discard it
        if (slot.n_prompt_tokens > n_ubatch) {
            slot.state   = SLOT_STATE_PROCESSING;
            slot.command = SLOT_COMMAND_NONE;
            slot.release();
            slot.print_timings();
            send_final_response(slot);
            continue;
        }
    } else {

Try adding -c 1024 -ub 1024 to your server command
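
For example, with the command from the original report (a sketch; -ub is the short form of --ubatch-size):

    server --embeddings -m mistral-7b-instruct-v0.2.Q4_K_M.gguf -c 1024 -ub 1024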

marcingomulkiewicz (Author) commented

Correct, increasing the batch size does the trick, but shouldn't the server return some sort of error/warning message? We can of course assume that not getting an embedding back is a sign of error, but someone talking to a remote llama.cpp server who is not familiar with this thread will not know what to ask about, what the cause was, or how to remedy it.
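
Until the server reports this itself, a minimal client-side guard is to check the response for the embedding field explicitly; a sketch with curl and jq, assuming the JSON request body is in $REQ:

    # exits non-zero (and prints a message) if the response carries no "embedding" field
    curl -s -X POST "http://localhost:8080/embedding" --data "$REQ" \
      | jq -e '.embedding' > /dev/null \
      || echo "error: server did not return an embedding" >&2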

ggerganov (Owner) commented

Yes, PR #7389 returns an error in such cases.
