Feature: NN Vector search and similarity masking (partial vector search) #4067

Open
2 tasks done
orimay opened this issue May 19, 2024 · 0 comments
Labels
feature New feature or request triage This issue is new

orimay commented May 19, 2024

Is your feature request related to a problem?

I would like to be able to mask components of a search vector so that the masked components affect neither the similarity score nor the NN search. This would be a great feature, enabling complex dynamic searches over some or many, but not all, fields.

I have a table of points:

DELETE FROM test;

DEFINE INDEX test_point ON test FIELDS point MTREE DIMENSION 3 DIST COSINE;

INSERT INTO test [
	{ id: test:0, point: [ 0f, 0f, 0f ] },
	{ id: test:1, point: [ 0f, 1f, 0f ] },
	{ id: test:2, point: [ 0f, 1f, 1f ] },
	{ id: test:3, point: [ 0f, 0f, 1f ] },
	{ id: test:4, point: [ 1f, 0f, 0f ] },
	{ id: test:5, point: [ 1f, 1f, 0f ] },
	{ id: test:6, point: [ 1f, 1f, 1f ] },
	{ id: test:7, point: [ 1f, 0f, 1f ] },
];

I would like to query it by only two components, ignoring, say, the middle one. Currently, this is not possible.
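
To illustrate why zero-filling the ignored component in the query is not a substitute: with cosine distance, the stored point's full norm still enters the denominator. The sketch below is plain Python rather than SurrealQL, and `cosine` is a hypothetical helper written for this illustration, not a SurrealDB function.

```python
import math

def cosine(a, b):
    """Plain cosine similarity; returns NaN when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else math.nan

stored = [0.0, 1.0, 1.0]        # test:2 from the table above
zero_filled = [0.0, 0.0, 1.0]   # query with the middle component set to 0

# The middle component of the stored point still contributes to its norm,
# so the score is pulled below 1 even though components 0 and 2 match.
full = cosine(stored, zero_filled)

# Dropping the middle component from both sides removes that influence.
projected = cosine([stored[0], stored[2]], [zero_filled[0], zero_filled[2]])
```

Here `full` comes out at roughly 0.707 while `projected` is exactly 1, which is the behaviour the mask is meant to provide.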

Describe the solution

I would like to be able to query it like this:

LET $pt = [ 0f, NONE, 1f ];

SELECT
    id,
    point,
    vector::similarity::cosine(point, $pt) AS similarity
FROM
    test
ORDER BY
    similarity DESC PARALLEL
;

This way, the second component would influence neither the similarity nor the search, as if it did not exist. Here's what I'd like to get:

[
	{
		id: test:0,
		point: [ 0, 0, 0 ],
		similarity: NaN
	},
	{
		id: test:1,
		point: [ 0, 1, 0 ],
		similarity: NaN
	},
	{
		id: test:2,
		point: [ 0, 1, 1 ],
		similarity: 1
	},
	{
		id: test:3,
		point: [ 0, 0, 1 ],
		similarity: 1
	},
	{
		id: test:6,
		point: [ 1, 1, 1 ],
		similarity: 0.7071067811865475f
	},
	{
		id: test:7,
		point: [ 1, 0, 1 ],
		similarity: 0.7071067811865475f
	},
	{
		id: test:4,
		point: [ 1, 0, 0 ],
		similarity: 0
	},
	{
		id: test:5,
		point: [ 1, 1, 0 ],
		similarity: 0
	}
]
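
The values above follow from computing cosine similarity over the unmasked components only. A minimal reference sketch in Python (not SurrealQL), assuming `None` plays the role of `NONE` in the query vector; `masked_cosine` is a hypothetical name used only for this illustration:

```python
import math

def masked_cosine(point, query):
    """Cosine similarity over the components where `query` is not None."""
    kept = [(p, q) for p, q in zip(point, query) if q is not None]
    dot = sum(p * q for p, q in kept)
    norm = (math.sqrt(sum(p * p for p, _ in kept))
            * math.sqrt(sum(q * q for _, q in kept)))
    # A zero-length projection has no direction, so the result is NaN.
    return dot / norm if norm else math.nan

points = {
    "test:0": [0.0, 0.0, 0.0],
    "test:1": [0.0, 1.0, 0.0],
    "test:2": [0.0, 1.0, 1.0],
    "test:3": [0.0, 0.0, 1.0],
    "test:4": [1.0, 0.0, 0.0],
    "test:5": [1.0, 1.0, 0.0],
    "test:6": [1.0, 1.0, 1.0],
    "test:7": [1.0, 0.0, 1.0],
}

query = [0.0, None, 1.0]  # the middle component is masked
similarities = {key: masked_cosine(p, query) for key, p in points.items()}
```

`test:0` and `test:1` project to the zero vector and yield NaN, `test:2` and `test:3` yield 1, `test:6` and `test:7` yield about 0.707, and `test:4` and `test:5` yield 0.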

Alternative methods

It's possible to use a manual SELECT with a WHERE condition, but it doesn't scale well.
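
For illustration, any approach that bypasses the vector index presumably ends up visiting every row. The Python sketch below (not SurrealQL) shows such a brute-force masked nearest-neighbour scan; `masked_cosine`, `masked_knn`, and the choice to drop NaN rows are assumptions made for this example only.

```python
import heapq
import math

def masked_cosine(point, query):
    """Cosine similarity over the components where `query` is not None."""
    kept = [(p, q) for p, q in zip(point, query) if q is not None]
    dot = sum(p * q for p, q in kept)
    norm = (math.sqrt(sum(p * p for p, _ in kept))
            * math.sqrt(sum(q * q for _, q in kept)))
    return dot / norm if norm else math.nan

def masked_knn(rows, query, k):
    """Return the k (id, similarity) pairs with the highest masked similarity.

    Every row is scored, so the cost grows linearly with the table size.
    Rows whose similarity is NaN are dropped (an assumption of this sketch).
    """
    scored = ((row_id, masked_cosine(point, query)) for row_id, point in rows)
    valid = [(row_id, s) for row_id, s in scored if not math.isnan(s)]
    return heapq.nlargest(k, valid, key=lambda pair: pair[1])

rows = [
    ("test:0", [0.0, 0.0, 0.0]), ("test:1", [0.0, 1.0, 0.0]),
    ("test:2", [0.0, 1.0, 1.0]), ("test:3", [0.0, 0.0, 1.0]),
    ("test:4", [1.0, 0.0, 0.0]), ("test:5", [1.0, 1.0, 0.0]),
    ("test:6", [1.0, 1.0, 1.0]), ("test:7", [1.0, 0.0, 1.0]),
]

top2 = masked_knn(rows, [0.0, None, 1.0], k=2)
```

With the sample data, the two best matches are `test:2` and `test:3`, both with similarity 1. An index-backed masked search would avoid scoring all rows on every query.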

SurrealDB version

https://surrealist.app/query

Contact Details

dmitrii.a.baranov@gmail.com

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@orimay orimay added feature New feature or request triage This issue is new labels May 19, 2024
@orimay orimay changed the title Feature: NN Vector search and similarity masking Feature: NN Vector search and similarity masking (partial vector search) May 19, 2024