Suggestions on implementing multi-scale quantization #3402

anfatima opened this issue Apr 30, 2024 · 3 comments

Summary

Is multiscale quantization (https://papers.nips.cc/paper_files/paper/2017/hash/b6617980ce90f637e68c3ebe8b9be745-Abstract.html) supported? I have been reading the FAISS code, but so far it seems that it is not, and there doesn't seem to be a straightforward way to implement it in Python without significantly affecting performance.

Any suggestions on the fastest way to add support for it (if it is not supported)? Are there alternative solutions that deal with the problem of large variance in the norms of the data points? If it is not supported, why not?
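
For concreteness, here is a minimal sketch of the core idea using existing FAISS primitives: uniformly quantize each vector's norm and encode its unit-norm direction with a product quantizer. This is only an illustration, not the paper's full method (which also learns a rotation and applies the scale to IVF residuals); all parameters and data below are made up.

```python
import numpy as np
import faiss

d, M, nbits = 128, 16, 8      # dimension, PQ sub-quantizers, bits per sub-quantizer
n_scale_bits = 4              # bits for the uniformly quantized norm

# Toy data with deliberately varied norms (values are made up).
rng = np.random.default_rng(0)
x = rng.standard_normal((10000, d)).astype(np.float32)
x *= rng.uniform(0.5, 5.0, size=(len(x), 1)).astype(np.float32)

# Split each vector into a scalar norm and a unit-norm direction.
norms = np.linalg.norm(x, axis=1, keepdims=True)
directions = x / np.maximum(norms, 1e-12)

# Uniform scalar quantization of the norms.
lo, hi = norms.min(), norms.max()
step = (hi - lo) / (2 ** n_scale_bits - 1)
scale_codes = np.round((norms - lo) / step).astype(np.uint8)

# Product quantization of the unit-norm directions.
pq = faiss.ProductQuantizer(d, M, nbits)
pq.train(directions)
dir_codes = pq.compute_codes(directions)

# Reconstruct: decoded direction rescaled by the dequantized norm.
x_hat = pq.decode(dir_codes) * (lo + scale_codes.astype(np.float32) * step)
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```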

Platform

Faiss version: 1.7.4

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python
mdouze added the question label May 6, 2024

mdouze commented May 6, 2024

Yes, it would be interesting to try it out.
What's weird in the paper is that the experiments are performed on Deep1M and SIFT1M, which are both normalized datasets, so the justification for multiscale quantization is not convincing.


anfatima commented May 7, 2024

They say that a large variance in the norms affects retrieval performance, so it is more of an argument about how large norm variance degrades codebook performance.

Also, another paper assessing the performance of compressed embeddings (https://arxiv.org/pdf/1909.01264) advocates uniform quantization over k-means as a coarse quantizer. So I was wondering if uniformly quantizing the scalar component (as in multiscale quantization) and then using that bucket to derive the quantized residual could lead to better retrieval.
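
To make that concrete, here is a rough sketch of the "uniform norm bucket, then residual" scheme (purely illustrative; the function name and parameters are invented for this example, and my construction of the residual is only one way to read the idea):

```python
import numpy as np

def norm_bucket_residuals(x, n_buckets=16):
    # Uniformly bucket vectors by norm (the bucket index plays the role
    # of a coarse code), then form the residual relative to the bucket's
    # representative scale. Everything here is illustrative.
    norms = np.linalg.norm(x, axis=1)
    lo, hi = norms.min(), norms.max()
    step = (hi - lo) / n_buckets
    bucket = np.minimum(((norms - lo) / step).astype(int), n_buckets - 1)
    bucket_scale = lo + (bucket + 0.5) * step        # bucket midpoint
    directions = x / np.maximum(norms, 1e-12)[:, None]
    # A fine quantizer (e.g. a PQ trained per bucket) would encode this part.
    residual = x - directions * bucket_scale[:, None]
    return bucket, residual

x = np.random.rand(1000, 64).astype(np.float32)
bucket, residual = norm_bucket_residuals(x)
```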

I will test it out with what is currently available in FAISS, and if it leads to an improvement in retrieval, I will open a thread on how to implement it in FAISS for better runtime performance.

Thanks!


mdouze commented May 10, 2024

Sure. NB that many clustering variants can be implemented in Python without much performance impact; see e.g. the k-means implementation in

https://github.com/facebookresearch/faiss/blob/main/contrib/clustering.py#L330
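
Stripped to its essentials, the pattern there is roughly the following: FAISS handles the brute-force assignment step, and the update step stays in numpy, which is exactly where a variant like the one discussed above would plug in. This is a simplified illustration of the pattern, not the contrib code itself.

```python
import numpy as np
import faiss

def kmeans_variant(x, k, niter=20, seed=123):
    # Plain k-means where only the nearest-centroid assignment uses FAISS;
    # the numpy update step is where custom variant logic would go.
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(niter):
        index = faiss.IndexFlatL2(x.shape[1])
        index.add(centroids)
        _, assign = index.search(x, 1)       # fast brute-force assignment
        assign = assign.ravel()
        for c in range(k):
            members = x[assign == c]
            if len(members):                 # skip empty clusters
                centroids[c] = members.mean(axis=0)
    return centroids, assign

x = np.random.rand(20000, 32).astype(np.float32)
centroids, assign = kmeans_variant(x, k=256)
```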
