
Lower VRAM usage by only having one model loaded at a time #46

Open · wants to merge 4 commits into main
Conversation

bolshoytoster

This fork changes `Interrogator` to load only BLIP into VRAM on init, leaving CLIP in RAM until it's needed.

When `interrogate` is first called, it runs BLIP inference, unloads BLIP, loads CLIP, then runs CLIP inference. 'Unloaded' here just means 'moved to RAM'.
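Roughly, the swap looks like this (a simplified sketch; the attribute names `blip_model`, `clip_model`, and `device` are illustrative, not the exact fields in the code):

```python
import torch

def _swap_blip_for_clip(self):
    # 'Unload' BLIP: move its weights to system RAM, freeing VRAM
    self.blip_model = self.blip_model.to("cpu")
    torch.cuda.empty_cache()  # release the freed VRAM back to the driver
    # Now that there is room, move CLIP into VRAM
    self.clip_model = self.clip_model.to(self.device)
```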

Using this, I can run classic/fast interrogation on 4 GB of VRAM; 'best' is still a little too big, however.

Swapping models on every call could cause performance issues when processing a large number of images, so I also added `interrogate_batch` and corresponding `interrogate_{classic,fast}_batch`, which run the BLIP pass over the whole batch before swapping to CLIP (sketched below). There was no need to add a batch function for negative interrogation, since it only uses CLIP.
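A sketch of the batch flow (the helper names `_blip_caption`, `_clip_rank`, and `_swap_blip_for_clip` are made up for illustration):

```python
def interrogate_batch(self, images):
    # Single BLIP pass over the whole batch while BLIP is in VRAM
    captions = [self._blip_caption(image) for image in images]
    # One model swap total, instead of one per image
    self._swap_blip_for_clip()
    # Single CLIP pass over the whole batch
    return [self._clip_rank(image, caption) for image, caption in zip(images, captions)]
```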

This also includes automatic `black` formatting and extra type hints, which can be removed if you want.

I'd like to contribute my changes, since this makes clip-interrogator more accessible.

Since the swapping adds a slight performance cost for code that doesn't use one of the batch functions, it could be disabled by default and enabled through `Config`.
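For example (a hypothetical opt-in flag; the name `swap_models` is made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Config:
    # hypothetical flag: when True, keep only one model in VRAM at a time
    swap_models: bool = False
    # ...existing fields...
```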

@pharmapsychotic (Owner)

Thanks bolshoytoster! That's a really cool idea to do a full pass over the batch with BLIP and then a second pass over it with CLIP! I'm looking at VRAM usage right now.

@bolshoytoster (Author) commented Feb 19, 2023

I also found another way to reduce VRAM usage to <4 GB on the standard interrogation, but it reduces accuracy and involves editing the open_clip package. I'll try to find a way to avoid that, or open a PR there to allow it.

Edit: Never mind, my latest commit fixes a bug that meant it was using float32 instead of float16 on the GPU. It now runs on my 4 GB GPU, but takes ~10 minutes.

bolshoytoster and others added 2 commits February 19, 2023 12:28
In `load_clip_model`, it used to check whether a GPU is being used by checking
if `config.device == "cuda"`. This is fine, assuming all users pass a str for
the device. Unfortunately, many users (including the `run_{cli,gradio}.py`
scripts) instead pass a `torch.device`, and `torch.device("cuda") != "cuda"`.

This commit makes it compare `device.type` instead, which is a string, making
the condition pass, and uses float16 when possible.
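The comparison in question, illustrated (a minimal sketch of the behaviour the commit message describes):

```python
import torch

device = torch.device("cuda")
print(device == "cuda")       # False: a torch.device does not compare equal to a str
print(device.type == "cuda")  # True: .type is the plain string "cuda"

# With the fixed check, half precision can be selected on GPU:
dtype = torch.float16 if device.type == "cuda" else torch.float32
```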