Large synonyms sets inconsistently return synonym results #108785

Open
kderusso opened this issue May 17, 2024 · 4 comments
Labels
>bug :Search/Search Search-related issues that do not fall into other categories :SearchOrg/Relevance Label for the Search (solution/org) Relevance team

Comments

@kderusso
Member

Elasticsearch Version

8.13

Installed Plugins

No response

Java Version

bundled

OS Version

Cloud

Problem Description

This bug was initially reported by a community member via our discuss forums.

Creating a large synonyms set (>= 15,000 synonyms) produces intermittently inconsistent results. The synonyms API returns successful responses and no Elasticsearch errors are logged. The synonyms API also returns the individual synonyms correctly. However, the _analyze call shows that certain synonyms are not returned.

Which synonyms are missing may differ from one synonyms set to another, but once a set returns inconsistent results, the behavior is permanent for that set.

Updating the synonyms set, reloading analyzers, and refreshing the index do not resolve this issue.
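To see which synonyms go missing, one option is to compare the tokens in an _analyze response against the terms you expect. The following is a minimal sketch, not the original reproduction: the helper name `missing_synonyms` and the sample terms are hypothetical, and the response is assumed to have the usual `tokens` list with a `token` field per entry.

```python
def missing_synonyms(analyze_response: dict, expected: set[str]) -> set[str]:
    """Return the expected terms that do not appear in an _analyze response."""
    # Collect every token text the analyzer emitted.
    returned = {tok["token"] for tok in analyze_response.get("tokens", [])}
    return expected - returned


# Hand-written sample in the shape of an _analyze response; the values are made up.
sample_response = {
    "tokens": [
        {
            "token": "term_42",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0,
        },
    ]
}

# The synonym "alias_42" was expected but is absent from the tokens.
print(missing_synonyms(sample_response, {"term_42", "alias_42"}))  # prints {'alias_42'}
```

Running this check over a sample of rules from the set would show whether the missing synonyms cluster in a particular part of the set.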

We should fix this so that all synonyms are analyzed correctly, and/or document a maximum number of synonyms allowed in a synonyms set.

Steps to Reproduce

The following script was run in the Dev Console on an 8.13.3 cloud deployment. The value of 6000 works in this example (as did every value above 6000 that I tested), but this may vary and you may need to try other values when you reproduce.

NOTE: The create synonyms API request is truncated to fit within the size limit.

synonyms_bug.txt
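Because the attached request is truncated, a large request body can be generated instead of written by hand. This is a rough sketch under assumptions, not the contents of synonyms_bug.txt: the rule ids and the `term_N` / `alias_N` terms are hypothetical placeholders, and the body follows the `synonyms_set` list of `id` / `synonyms` entries accepted by the create synonyms set API.

```python
import json


def build_synonyms_set(num_rules: int) -> dict:
    """Build a request body for PUT _synonyms/<set_id> containing num_rules rules.

    The ids and terms are placeholders chosen for illustration only.
    """
    return {
        "synonyms_set": [
            {"id": f"rule-{i}", "synonyms": f"term_{i}, alias_{i}"}
            for i in range(num_rules)
        ]
    }


# 15,000 matches the set size mentioned in the problem description.
body = build_synonyms_set(15000)
payload = json.dumps(body)

# The payload can be sent as the body of PUT _synonyms/<your-set-id>.
print(len(body["synonyms_set"]), "rules,", len(payload), "bytes")
```

Changing the argument to `build_synonyms_set` makes it easy to try different set sizes when looking for the point at which results become inconsistent.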

Logs (if relevant)

No response

@kderusso kderusso added >bug :Search/Search Search-related issues that do not fall into other categories :SearchOrg/Relevance Label for the Search (solution/org) Relevance team labels May 17, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/ent-search-eng (Team:SearchOrg)

@carlosdelest
Member

This is most probably because of the hard limit of 10,000 synonyms that we have on analysis when searching for synonyms.

We can raise this limit and also be explicit about the maximum number of synonyms we allow when updating a synonyms set.
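Until a limit is enforced or documented, a client could guard against oversized sets before sending them. The sketch below is a hypothetical client-side check, not how Elasticsearch implements or would implement the limit. It assumes the 10,000 figure mentioned above counts synonym rules, which this thread does not confirm.

```python
# Limit cited in this thread; whether it counts rules or individual terms is an assumption.
MAX_SYNONYM_RULES = 10_000


def check_synonyms_set_size(synonyms_set: list[dict], limit: int = MAX_SYNONYM_RULES) -> None:
    """Raise ValueError if the set has more rules than the assumed limit."""
    if len(synonyms_set) > limit:
        raise ValueError(
            f"synonyms set has {len(synonyms_set)} rules, "
            f"which exceeds the assumed limit of {limit}"
        )


# A set at the limit passes; one rule over the limit is rejected.
at_limit = [{"id": f"rule-{i}", "synonyms": "a, b"} for i in range(MAX_SYNONYM_RULES)]
check_synonyms_set_size(at_limit)

try:
    check_synonyms_set_size(at_limit + [{"id": "extra", "synonyms": "c, d"}])
except ValueError as err:
    print(err)
```

Failing loudly on the client side avoids the silent behavior described in this issue, where the API reports success but some synonyms are never applied.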

3 participants