-
I am creating my index with the following lines:

```python
Settings.llm = OpenAI(model="gpt-4-0125-preview", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
index = VectorStoreIndex(nodes=nodes)
```

So why am I getting an error saying that the model's maximum context length has been exceeded, when the OpenAI documentation specifies that gpt-4-0125-preview supports a 128,000-token context window?
-
Hey @nshern! Great to see you back here - hope you're doing well and ready for another round of troubleshooting. The error message you're seeing is coming from the OpenAI API, not from the LlamaIndex framework. It's indicating that the total number of tokens in a single request exceeds the maximum context length that the model can handle. The discrepancy with the 128k context window you expect from gpt-4-0125-preview most likely means the failing request was made by the embedding model, whose input limit is much smaller (8,192 tokens for the default OpenAI embedding model), not by the LLM itself.
To resolve this issue, you could try the following:

- Reduce the chunk size used when splitting your documents into nodes, so no single node exceeds the embedding model's token limit.
- Explicitly configure the embedding model you intend to use, so the limit being applied is the one you expect.

A sketch of the first option follows below.
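This is a minimal sketch only, assuming your `nodes` are produced from your own `documents` via the `llama_index.core` node parser API; the splitter parameters are illustrative, not prescriptive:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Re-split the source documents into smaller nodes so that no single node
# exceeds the embedding model's ~8,192-token input limit.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes=nodes)
```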
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
-
Facing the same issue here running on GPT-4 Turbo; see the error message we get in our backend:

```
2024-05-14 20:01:06 Retrying llama_index.embeddings.openai.base.get_embedding in 6.3257633998151785 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 14607 tokens (14607 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
```

Has anyone else found a solution or knows a workaround for this type of problem?
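Note that the failing call is `get_embedding`, so the 8192-token limit is the embedding model's, not the LLM's. One way to find the offending chunks before hitting the API is to count tokens per node locally. This is just a sketch, assuming `nodes` is the list you pass to `VectorStoreIndex` and that the embedding model uses the `cl100k_base` encoding:

```python
import tiktoken

# cl100k_base is the encoding used by OpenAI's text-embedding models.
enc = tiktoken.get_encoding("cl100k_base")
EMBED_LIMIT = 8192  # limit reported in the 400 error above

for i, node in enumerate(nodes):
    n_tokens = len(enc.encode(node.get_content()))
    if n_tokens > EMBED_LIMIT:
        print(f"node {i} has {n_tokens} tokens and will fail to embed")
```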
-
I'm facing the same issue with the TS library. Here's the setting I'm using:
I've tried with other GPT-4 models as well, but no luck.
-
I met the same issue. The solution is to reduce the chunk size, for example as sketched below.
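Just as an illustration (assuming the Python `llama_index.core` Settings API; the data directory is a placeholder), lowering the global chunk size keeps each embedded chunk well under the 8,192-token limit:

```python
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader

# Smaller chunks -> each embedding request stays well under 8,192 tokens.
Settings.chunk_size = 512

documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder path
index = VectorStoreIndex.from_documents(documents)
```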
The problem was with the embedding model: the context-length error came from the embedding requests, not from gpt-4-0125-preview.
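For anyone landing here later, a sketch of making the embedding model explicit, so it is clear which context window the 400 error refers to (the model names are examples, not a recommendation):

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# The LLM has a 128k context window, but embeddings are a separate model
# with its own, much smaller input limit, so chunks must fit that limit.
Settings.llm = OpenAI(model="gpt-4-0125-preview", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
```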