
Lower VRAM usage by only having one model loaded at a time #46

Open · wants to merge 4 commits into main
Conversation

bolshoytoster

This fork changes `Interrogator` to load only BLIP into VRAM on init, leaving CLIP in RAM until it's needed.

When `interrogate` is first called, it runs BLIP inference, unloads BLIP, loads CLIP, then runs CLIP inference. 'Unloaded' here just means 'moved to RAM'.
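Roughly, the swap looks like this (a simplified sketch; the attribute names `blip_model`, `clip_model`, and `device` are illustrative, not the exact fields in the code):

```python
import torch

def _swap_blip_for_clip(self):
    # 'Unload' BLIP: move its weights to system RAM, freeing VRAM
    self.blip_model = self.blip_model.to("cpu")
    torch.cuda.empty_cache()  # release the freed VRAM back to the driver
    # Now that there is room, move CLIP into VRAM
    self.clip_model = self.clip_model.to(self.device)
```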

Using this, I can run classic/fast interrogation on 4 GB of VRAM; 'best' is still a little too big, however.

Swapping models on every call could cause performance issues when processing a large number of images, so I also added `interrogate_batch` and corresponding `interrogate_{classic,fast}_batch`, which run the BLIP pass over the whole batch before swapping to CLIP (sketched below). There was no need to add a batch function for negative interrogation, since it only uses CLIP.
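A sketch of the batch flow (the helper names `_blip_caption`, `_clip_rank`, and `_swap_blip_for_clip` are made up for illustration):

```python
def interrogate_batch(self, images):
    # Single BLIP pass over the whole batch while BLIP is in VRAM
    captions = [self._blip_caption(image) for image in images]
    # One model swap total, instead of one per image
    self._swap_blip_for_clip()
    # Single CLIP pass over the whole batch
    return [self._clip_rank(image, caption) for image, caption in zip(images, captions)]
```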

This also includes automatic `black` formatting and extra type hints, which can be removed if you want.

I'd like to contribute my changes, since this makes clip-interrogator more accessible.

Since the swapping adds a slight performance cost for code that doesn't use one of the batch functions, it could be disabled by default and enabled through `Config`.
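For example (a hypothetical opt-in flag; the name `swap_models` is made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Config:
    # hypothetical flag: when True, keep only one model in VRAM at a time
    swap_models: bool = False
    # ...existing fields...
```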

@pharmapsychotic (Owner)

Thanks bolshoytoster! That's a really cool idea to do a full pass over the batch with BLIP and then a second pass over it with CLIP! I'm looking at VRAM usage right now.

@bolshoytoster (Author) commented Feb 19, 2023

I also found another way to reduce VRAM usage to <4 GB on the standard interrogation, but it reduces accuracy and involves editing the open_clip package. I'll try to find a way to avoid that, or open a PR there to allow it.

Edit: Never mind, my latest commit fixes a bug that meant it was using float32 instead of float16 on the GPU. It now runs on my 4 GB GPU, but takes ~10 minutes.

bolshoytoster and others added 2 commits February 19, 2023 12:28
In `load_clip_model`, it used to check whether a GPU is being used by checking
if `config.device == "cuda"`. This is fine, assuming all users pass a str for
the device. Unfortunately, many users (including the `run_{cli,gradio}.py`
scripts) instead pass a `torch.device`, and `torch.device("cuda") != "cuda"`.

This commit makes it compare `device.type` instead, which is a string, making
the condition pass, and uses float16 when possible.
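The comparison in question, illustrated (a minimal sketch of the behaviour the commit message describes):

```python
import torch

device = torch.device("cuda")
print(device == "cuda")       # False: a torch.device does not compare equal to a str
print(device.type == "cuda")  # True: .type is the plain string "cuda"

# With the fixed check, half precision can be selected on GPU:
dtype = torch.float16 if device.type == "cuda" else torch.float32
```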