Hi, there is a reverse prompt option that can be used with the `-r` flag in `main`, but I could not find anything like that for `server`. Is there a way to stop generation when specific tokens are encountered?
I'm having trouble with Phi-3-mini-4k-instruct-q4.gguf, which was quantized and made available on Hugging Face by Microsoft. If I don't write `<|assistant|>` at the end of the text sent to the server, the model keeps generating until the token limit is reached or `<|endoftext|>` is produced. It also outputs `### instruction:` and then starts a new discussion, or writes a question for itself and tries to answer it. Sometimes it produces tokens like `user:`, `<|user|>`, or `<|assistant|>`. Occasionally it does these things even when I write the `<|assistant|>` token at the start. How can I fix this? If you need more context, please ask and I will either paste the output here or answer questions to clarify.
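For reference, one way this is commonly handled is to send stop strings with each request. The sketch below assumes the llama.cpp server exposes a `/completion` endpoint that accepts a `stop` array and returns the generated text in a `content` field, and that it listens on `http://localhost:8080`; the URL, the helper names, and the exact Phi-3 prompt layout are assumptions to check against your server version and the model card, not confirmed details of this setup.

```python
import json
import urllib.request

# Strings on which generation should stop. These mirror the unwanted
# outputs described above; adjust the list to your own observations.
STOP_STRINGS = [
    "<|end|>",
    "<|endoftext|>",
    "<|user|>",
    "<|assistant|>",
    "### instruction:",
]


def build_phi3_prompt(user_message: str) -> str:
    """Wrap a message in the (assumed) Phi-3 instruct chat layout."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


def build_payload(user_message: str, n_predict: int = 256) -> dict:
    """Build the JSON body for a completion request with stop strings."""
    return {
        "prompt": build_phi3_prompt(user_message),
        "n_predict": n_predict,
        "stop": STOP_STRINGS,
    }


def complete(user_message: str,
             url: str = "http://localhost:8080/completion") -> str:
    """POST the payload to the server (assumed URL) and return the text."""
    data = json.dumps(build_payload(user_message)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

Calling `complete("What is a GGUF file?")` against a running server should return only the assistant's turn, since generation halts as soon as any of the listed strings appears. Ending the prompt with `<|assistant|>` matters as well: without it the model has no cue that a reply is expected and may continue the user's text instead.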