/embeddings endpoint sometimes does not return embedding #7277
Comments
I see this also on 2828. For sequences of up to 127 tokens it works; from 128 tokens onward it fails. Update: it actually works up to the 'logical maximum batch size' (-b parameter) minus 1, and fails for larger token counts.
The text is larger than llama.cpp/examples/server/server.cpp Lines 1979 to 1989 in 854d365
Try adding
Correct, increasing the batch size does the trick, but shouldn't the server return some sort of error or warning message? We can of course assume that not getting embeddings back is a sign of an error, but someone who is, for example, talking to a remote llama.cpp server and is not familiar with this thread will not know what to ask about, what the cause was, or how to remedy it.
Yes, PR #7389 returns an error in such cases.
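Until a build with that error reporting is in use, the observation from the first comment (requests succeed up to the logical batch size minus one token and fail beyond it) can be turned into a client-side pre-check. The following is a minimal sketch in Python and is not part of llama.cpp; the function name `fits_in_batch` is hypothetical, and the threshold encodes only the behaviour reported in this thread, not a documented guarantee.

```python
def fits_in_batch(n_tokens: int, n_batch: int) -> bool:
    """Return True if a prompt of n_tokens is expected to embed successfully.

    Encodes the behaviour reported in this thread: inputs of up to
    n_batch - 1 tokens work, inputs of n_batch tokens or more fail.
    n_batch is the value passed to the server via -b (assumption: the
    caller knows it, since the client cannot read it from this function).
    """
    if n_tokens < 0 or n_batch <= 0:
        raise ValueError("n_tokens must be >= 0 and n_batch must be > 0")
    return n_tokens < n_batch


# Example with a hypothetical batch size of 128.
print(fits_in_batch(127, 128))  # True
print(fits_in_batch(128, 128))  # False
```

The token count has to come from the same tokenizer the server uses; a character or word count is not a reliable substitute.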
Llama.cpp version: b2876, but the bug existed at least a few releases back.
Environments: checked on 2 (same behaviour):
Expected behaviour: the /embeddings endpoint always returns an embedding.
Observed behaviour: when the embedding content is longer than the context length/ubatch, the response appears to contain the original request instead of an embedding. An error message should probably be returned instead.
Sample:
curl -X POST "http://localhost:8080/embedding" --data '{"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam rhoncus mauris eget magna semper, ut varius arcu eleifend. Vestibulum quis justo eget ex pretium sollicitudin. Nam euismod orci vulputate erat sagittis, sed pulvinar ante varius. Proin in dui non eros sodales tempus. Proin et mi scelerisque tellus eleifend auctor. Sed sagittis erat sapien, in porttitor augue bibendum nec. Nam ut mi accumsan lorem volutpat tempus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In ac nulla tempor, pharetra felis id, venenatis tortor. Donec felis turpis, egestas non ligula at, eleifend fringilla est. Fusce elit mi, fermentum a sapien eleifend, rutrum scelerisque eros. Sed et vestibulum orci. Quisque ut magna vel nibh accumsan dictum eget eu urna. Duis rhoncus, lacus in imperdiet tincidunt, turpis turpis vestibulum ante, at mollis nisi massa et purus. Phasellus sed ante eros. Aenean consequat nisi non massa eleifend finibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ultrices libero id metus consequat semper. Nam venenatis, est quis interdum commodo, nisl ex placerat diam, sed fringilla ex nisi sed sem. Pellentesque luctus orci id tellus dictum tristique. Integer molestie varius risus quis maximus. In id feugiat nulla, at scelerisque massa. Nulla neque diam, consequat ac orci laoreet, venenatis pharetra enim.Aenean rhoncus dapibus augue ac volutpat. Nullam laoreet, lorem quis fermentum scelerisque"}'
works correctly.
curl -X POST "http://localhost:8080/embedding" --data '{"content":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam rhoncus mauris eget magna semper, ut varius arcu eleifend. Vestibulum quis justo eget ex pretium sollicitudin. Nam euismod orci vulputate erat sagittis, sed pulvinar ante varius. Proin in dui non eros sodales tempus. Proin et mi scelerisque tellus eleifend auctor. Sed sagittis erat sapien, in porttitor augue bibendum nec. Nam ut mi accumsan lorem volutpat tempus. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. In ac nulla tempor, pharetra felis id, venenatis tortor. Donec felis turpis, egestas non ligula at, eleifend fringilla est. Fusce elit mi, fermentum a sapien eleifend, rutrum scelerisque eros. Sed et vestibulum orci. Quisque ut magna vel nibh accumsan dictum eget eu urna. Duis rhoncus, lacus in imperdiet tincidunt, turpis turpis vestibulum ante, at mollis nisi massa et purus. Phasellus sed ante eros. Aenean consequat nisi non massa eleifend finibus. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce ultrices libero id metus consequat semper. Nam venenatis, est quis interdum commodo, nisl ex placerat diam, sed fringilla ex nisi sed sem. Pellentesque luctus orci id tellus dictum tristique. Integer molestie varius risus quis maximus. In id feugiat nulla, at scelerisque massa. Nulla neque diam, consequat ac orci laoreet, venenatis pharetra enim.Aenean rhoncus dapibus augue ac volutpat. Nullam laoreet, lorem quis fermentum scelerisque,"}'
fails (note one more character at the end, a comma - it could be anything).
Update 1: /v1/embeddings endpoint behaves in the same way.
Update 2: When embed is called from java-llama, it behaves similarly: it works for the first sample input, but for the second it fails with:
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error' what(): [json.exception.type_error.302] type must be array, but is null
Update 3: Up until b2356 the endpoint returns embeddings, though incorrect ones (all zeroes). From b2357 onward the bug appears to be present.
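The `type must be array, but is null` failure in Update 2 suggests that a client should validate the response before treating it as an embedding. Below is a minimal sketch of such a check in Python. It assumes a successful response is a JSON object with an `embedding` key holding a list of numbers; that shape, and the helper name `extract_embedding`, are assumptions made for illustration and may not match every server version or the /v1/embeddings response format.

```python
from typing import Any, List


def extract_embedding(response: Any) -> List[float]:
    """Return the embedding vector, or raise ValueError with a readable message.

    Assumption: a successful response looks like {"embedding": [0.1, ...]}.
    """
    if not isinstance(response, dict):
        raise ValueError(f"expected a JSON object, got {type(response).__name__}")

    embedding = response.get("embedding")
    if not isinstance(embedding, list) or not embedding:
        # Covers both a missing key and a null value.
        raise ValueError(
            "response contains no embedding array; "
            "the input may exceed the server batch size"
        )

    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in embedding
    ):
        raise ValueError("embedding contains non-numeric values")

    return [float(x) for x in embedding]


# A response that echoes the request instead of returning an embedding
# is rejected with a clear error rather than a crash further downstream.
try:
    extract_embedding({"content": "Lorem ipsum"})
except ValueError as err:
    print(f"rejected: {err}")
```

Failing early with an explicit message gives the caller something to search for, which is the gap raised in the comments.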