Document boosting #4189
Comments
Sorry, this is not possible for v1.6.0; other priorities took precedence. |
Is this something that will be similar to 'Function scoring' in ElasticSearch[1]? If not, I think you should consider it. It's a powerful primitive that solves many types of problems in search:
To give some concrete use-cases of the above:
It's one of those foundational features that will solve a lot of "long-tail problems". When you are planning the roadmap, sometimes it's better to build one slightly larger feature that solves 10+ "smaller requests" than to build 10 smaller features to solve individual problems.
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html |
Hello @sandstrom,
It is not the same. Meilisearch isn't a score-based search engine but rather a bucket-sort-based one. It doesn't rely on a global score by document to determine the order of the document but rather on buckets spilling into other smaller buckets. This algorithm is helpful when each ranking rule has a different level of importance. However, thanks to the Score Details feature, Meilisearch can output a global ranking score as a simple number. Our Hybrid Search system uses it to rank documents using the semantic similarity score (a number) and the keyword search score.
It is already possible to rank nearby restaurants/hotels/etc. higher with the Geo Sort feature. You can already sort documents by asc/desc creation date with the Search-side Sort or Ranking Rule Sort features.

Regarding the Reddit-style tradeoff (recency vs. many votes), I agree that it is currently impossible without updating the documents regularly. However, even with the Elasticsearch Function Scoring feature, the document score depends on an external

About the leniency in price-range queries: I understand the use case of showing documents that are outside of a filtered range but not too far from the original range given. However, wouldn't it be possible to increase the range slightly? This way, it shows more documents, and you decide on the distance gap. There is currently no way to do conditional filtering, and this feature will evolve that way. We first want to be able to apply filtering based on the query content: add specific documents in certain conditions...
Thank you for your insights and have a great day 🍥 |
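To make the bucket-sort idea from the comment above concrete, here is a minimal Ruby sketch. It is not Meilisearch's implementation: the documents, field names, and the three rules are hypothetical and chosen only to show how each rule splits a bucket into smaller buckets, with later rules acting purely as tie-breakers.

```ruby
# Hypothetical documents; the field names are illustrative only.
DOCS = [
  { id: 1, typos: 0, matched_words: 2, created_at: 100 },
  { id: 2, typos: 1, matched_words: 2, created_at: 300 },
  { id: 3, typos: 0, matched_words: 2, created_at: 200 },
  { id: 4, typos: 0, matched_words: 1, created_at: 400 },
].freeze

# Each rule maps a document to a sortable key; smaller keys rank first.
# These rules are assumptions for the sketch, not Meilisearch's real rules.
RULES = [
  ->(doc) { -doc[:matched_words] }, # more matched words first
  ->(doc) { doc[:typos] },          # fewer typos first
  ->(doc) { -doc[:created_at] },    # newest first, only as a tie-breaker
].freeze

# Split the documents into buckets with the first rule, then refine each
# bucket recursively with the remaining rules.
def bucket_sort(docs, rules)
  return docs if rules.empty? || docs.size <= 1

  first, *rest = rules
  docs.group_by { |doc| first.call(doc) }
      .sort_by { |key, _| key }
      .flat_map { |_, bucket| bucket_sort(bucket, rest) }
end

ranked_ids = bucket_sort(DOCS, RULES).map { |doc| doc[:id] }
```

Document 4 is the newest, yet it ranks last because it already lost on the first rule. With a score-based engine, a large enough recency boost could lift it above the others; with buckets, a later rule never overrides an earlier one.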
Thanks @Kerollmops for an extensive answer! To be more specific, the main thing we are missing is some kind of decay functionality (in our case based on time, but could be a numerical value for others). We currently use Elastic Search Decay Function Scores. See query below. Having to run a scheduled job every day, to calculate e.g. There also isn't an atomic But maybe there is an inherent restriction in the bucket vs. score-based engine, that makes this very difficult or impossible? // if you are curious, this is the ElasticSearch query we are using today
query_body = {
'query' => {
'bool' => {
'must' => [{
'function_score' => {
'query' => {
'multi_match' => {
'query' => query,
'type' => 'cross_fields',
'fields' => ['title', 'reference', 'user'],
'operator' => 'and',
},
},
'functions' => [{
# reduce score for older reports with exponential decay
'exp' => {
'submitted_at' => {
'origin' => Time.now,
'offset' => '30d', # no score reduction (i.e. modifier = 1.0) within 30 days
'scale' => '730d', # 2 year falloff until `decay` value
'decay' => 0.5, # modifier at `offset`+`scale`, i.e. 2y+30d old => modifier = 0.5
},
},
}],
'score_mode' => 'multiply', # multiply score modifier values for final score
},
}],
'filter' => {
'bool' => {
'must' => [
{ 'term' => { 'company_id' => company.id } }, # scope to company
{ 'bool' => { 'must_not' => { 'exists' => { 'field' => 'archived_at' } } } }, # exclude archived
],
},
},
},
},
} |
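For readers unfamiliar with the `exp` decay function used in the query above, here is a small Ruby sketch of the multiplier it computes, following the formula in the Elasticsearch documentation: `exp(lambda * max(0, |value - origin| - offset))` with `lambda = ln(decay) / scale`. The helper names `exp_decay` and `age_in_days` are made up for this illustration, and the defaults mirror the `30d` / `730d` / `0.5` values from the query.

```ruby
SECONDS_PER_DAY = 86_400.0

# Multiplier in (0, 1]: 1.0 inside the offset window, `decay` at offset + scale.
# Hypothetical helper; it reproduces the documented formula, not Elasticsearch code.
def exp_decay(age_days, offset_days: 30.0, scale_days: 730.0, decay: 0.5)
  rate = Math.log(decay) / scale_days
  Math.exp(rate * [0.0, age_days.abs - offset_days].max)
end

# Age of a document in days, given two Time objects.
def age_in_days(submitted_at, now)
  (now - submitted_at) / SECONDS_PER_DAY
end

fresh    = exp_decay(10)   # inside the 30-day offset, no reduction
at_scale = exp_decay(760)  # offset + scale, roughly 0.5
double   = exp_decay(1490) # offset + 2 * scale, roughly 0.25
```

Because `score_mode` is `multiply`, this value scales the text-match score, so a report that is two years and 30 days old keeps about half of its relevance score.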
Indeed, there is no {
"filter": "category = gaming", // only update this subset
// inserts the new score and scoreUpdatedAt fields with the given computation in the filtered documents
"formulas": {
// Upvotes and timestamp are fields already in the documents
"score": "upvotes / ((1709139095 - timestamp) * 20.0)",
"scoreUpdatedAt": "1709139095"
}
}
No, I don't think so. You can define the right ranking rules to sort your documents and change the numeric field value accordingly. Nothing is inherently impossible or hard to do. Have a great day 🍡 |
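Until something like the formula-based update above exists, the same computation can be done client-side in a scheduled job. The Ruby sketch below only shows the recomputation step on plain hashes; the `rescore` helper and the sample documents are hypothetical, and fetching documents from the index and sending them back is deliberately left out.

```ruby
# Recompute the proposed `score` field for documents in one category.
# `docs` is a plain array of hashes standing in for documents from an index.
def rescore(docs, now:, category: nil)
  docs.map do |doc|
    # Leave documents outside the filtered subset untouched.
    next doc if category && doc[:category] != category

    age = now - doc[:timestamp]
    # Skip documents with a zero or negative age to avoid dividing by zero.
    next doc unless age.positive?

    doc.merge(score: doc[:upvotes] / (age * 20.0), scoreUpdatedAt: now)
  end
end

now = 1_709_139_095
docs = [
  { id: 1, category: "gaming", upvotes: 200, timestamp: now - 100 },
  { id: 2, category: "gaming", upvotes: 50,  timestamp: now - 10 },
  { id: 3, category: "news",   upvotes: 900, timestamp: now - 5 },
]
updated = rescore(docs, now: now, category: "gaming")
```

The resulting `score` field could then be used in a sort-based ranking rule, which is the approach suggested in the comment above.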
@Kerollmops Yes, it would! As long as it isn't too expensive to run (we'd be running it on all docs in an index with ~1M rows and growing), this would do the job! There is a slight win in having scores computed on-the-fly, in the sense that we'll only need to update a query to change the behavior (developer + ops ergonomics). But in the grander scheme of things, that's a small thing. We could easily set up a scheduled job that does this daily or weekly. As an aside, I've followed MeiliSearch since ~2001 -- wrote about this back then. It's a great project and we would love to use Meili! This has been the only remaining blocker for a while now, and it keeps us from dropping ElasticSearch. |
Is there an ETA on the implementation of this feature? |
Hi @pepijn-vanvlaanderen, we had to deprioritize the feature in favor of more urgent work, so we don't have an ETA yet. You're welcome to add your use case and vote to the Function Scoring discussion if it aligns with your needs, or you can also start a new discussion. |
This is an important feature for our organization as well. Without it, our search results become less and less relevant. Our website is built on Laravel and we love that it is compatible with Meilisearch. Meilisearch lets us store our search engine's database locally; due to our privacy policy, we can't store it in the cloud. Anyway, we have so-called publication documents and for our search results to stay relevant we need to introduce time decay on the following parameters:
With the current filtering and sorting mechanism, it is simply not achievable. If we manage to get newer results, they are less relevant query-wise; if we manage to get more relevant results, most of them are old. Sadly, if this functionality is not something that will be prioritized in the near future, we will need to think of other alternatives. Thank you for taking the time to read this. |
Thank you @Tadaz for your feedback, I informed the product team 👌 |
Hey @sandstrom and @Tadaz 👋 I just released a first prototype of a way to edit documents by using a Rhai function. You can read more on the Public Usage page. Please, tell me more about what you think. The documentation may be improved but it's only the first prototype. Have a nice day 🍾 |
Sounds interesting! However the docs link shows a login screen. Still locked? |
Sorry @sandstrom, fixed it 🤭 |
@Kerollmops Looks great! Using Rhai seems like a very good idea. I've only read the docs (not tried it), but this should do the trick. I'm away on a short weekend trip so I cannot easily put together a quick test on my computer right now, but I'll try to get someone on the team to evaluate this approach and hopefully switch over to Meili! |
Related product team resources: PRD (internal only)
Related spec: WIP
Motivation
Add Promoting and boosting features to cater for e-commerce use cases and close the competitor gap.
Usage
Refer to: https://www.notion.so/meilisearch/Document-boosting-API-usage-20aae06bc85e41dba828a90331a69f2c
TODO
main
Impacted teams
@meilisearch/docs-team @meilisearch/integration-team