Checked other resources
Commit to Help
Example Code

```typescript
import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

// Local GGUF model path (an earlier model is kept in the trailing comment).
const llamaPath = "../../tinyllama-1.1b-chat-v1.0.Q2_K.gguf"; // "../../llama-2-7b-chat.GGUF.q4_0.bin";
const model = new LlamaCpp({ modelPath: llamaPath, temperature: 0.7, maxTokens: 25, streaming: true });

// await model.invoke("You are helpful assistant. Respond to questions. Reply to greetings.");

const prompt = "Hello";
const stream = await model.stream(prompt);
for await (const chunk of stream) {
  console.log(chunk);
}
```

Description

I've tried many models, but the problem is always the same: if I don't send a question but just a simple "Hello", the responses are confusing and very long; sometimes it seems like the generation won't stop and I have to force it to close.

System Info

platform: mac, Apple M1 Max
Replies: 1 comment 2 replies
It sounds like you're encountering an issue with the LlamaCpp model generating infinitely long and confusing responses, especially when using simple prompts like "Hello". This behavior might be related to how the streaming functionality is implemented in the LlamaCpp model within the LangChain JS framework.

The `stream` method you're using is designed to return a stream of responses in chunks, which should theoretically stop once the model has finished generating its response based on the `maxTokens` parameter you've set. However, if the stream doesn't terminate as expected, it could lead to the infinite-response issue you're experiencing.

Given that you're already specifying a `maxTokens` limit, the output should be bounded to that many tokens. To address this, you might also consider implementing a manual timeout or abort mechanism around the stream.

Additionally, ensure that your environment, particularly the Apple M1 Max chip, is fully compatible with the node-llama-cpp bindings that the LlamaCpp integration relies on.

If the issue persists, providing more details about the specific models you're using and any additional configuration options might help further diagnose and resolve the problem.
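For example, a minimal sketch of such a timeout/abort mechanism, assuming the standard `signal` call option is forwarded to the model (whether the local llama.cpp generation actually halts mid-stream may vary by version):

```typescript
import { LlamaCpp } from "@langchain/community/llms/llama_cpp";

// Same local GGUF path as in the question; adjust to your setup.
const llamaPath = "../../tinyllama-1.1b-chat-v1.0.Q2_K.gguf";
const model = new LlamaCpp({ modelPath: llamaPath, temperature: 0.7, maxTokens: 25 });

// Abort the stream if it has not finished after 15 seconds.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 15_000);

try {
  const stream = await model.stream("Hello", { signal: controller.signal });
  for await (const chunk of stream) {
    console.log(chunk);
  }
} catch (err) {
  // An aborted stream typically surfaces here as an error.
  console.error("Generation stopped:", err);
} finally {
  clearTimeout(timer);
}
```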