How to stop generation when specific tokens are encountered in server mode? #7322

noshila · 2024-05-16T10:12:26Z

Hi, main has a reverse prompt option (the -r flag), but I could not find anything similar for server. Is there a way to stop generation when specific tokens are encountered?

Specifically, I'm having trouble with Phi-3-mini-4k-instruct-q4.gguf, which was quantized and made available on Hugging Face by MS. If I don't put <|assistant|> at the end of the text I send to the server, the model just keeps going until the token generation limit is reached or <|endoftext|> is produced. It also outputs ### instruction: and then starts a new discussion, or writes a question for itself and tries to answer it, and sometimes it produces tokens like user: or <|user|> or <|assistant|>. Sometimes it does these things even when I do include the <|assistant|> token in the prompt. How can I fix this? If you need more context, please ask and I will either paste the output here or answer any follow-up questions.
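To make the behaviour I am after concrete, here is a minimal Python sketch. It rests on several assumptions that are not confirmed anywhere in this thread: that the server listens on http://localhost:8080, that its /completion endpoint takes a JSON body with prompt, n_predict and a stop list of strings and returns the generated text in a content field, and that Phi-3 expects the <|user|> ... <|end|> <|assistant|> layout used below. The helper names are made up for illustration. The client-side truncate_at_stop function works as a fallback whether or not the server honours the stop list.

```python
import json
import urllib.request

# Strings after which I want generation to end. Matching is case-sensitive,
# and a short string like "user:" may also cut legitimate answers.
STOP_STRINGS = [
    "<|end|>",
    "<|endoftext|>",
    "<|user|>",
    "<|assistant|>",
    "### instruction:",
    "user:",
]


def build_phi3_prompt(user_message: str) -> str:
    """Wrap one user turn in the chat layout I assume Phi-3 expects."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"


def build_payload(user_message: str, n_predict: int = 256) -> dict:
    """Request body; the "stop" field is an assumption about the server API."""
    return {
        "prompt": build_phi3_prompt(user_message),
        "n_predict": n_predict,
        "stop": STOP_STRINGS,
    }


def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def complete(user_message: str, url: str = "http://localhost:8080/completion") -> str:
    """Send the request and trim the reply on the client as a fallback."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # "content" is the assumed name of the field holding the generated text.
    return truncate_at_stop(body.get("content", ""))
```

With a server running, complete("What is the capital of France?") should return only the assistant's answer. The client-side cut only hides the extra text; the model would still spend tokens generating it, which is why I am asking for a server-side option.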

Replies: 0 comments
