You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
So, now that Google has released Paligemma (which is SigLip, as opposed to CLIP-based) what would it take to support it similarly to Gemma, and LLaVA? I will be benching it against both gemma-2b (on text tasks) and 7b llava (on vision tasks) soon enough to get some idea where it sits, but God it's annoying to get transformers working on macOS, and reliably, too. Llama.cpp support would bring additional advantage of bringing it to Android/iOS where Llava is quite prohibitive.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
https://huggingface.co/blog/paligemma
So, now that Google has released Paligemma (which is SigLip, as opposed to CLIP-based) what would it take to support it similarly to Gemma, and LLaVA? I will be benching it against both gemma-2b (on text tasks) and 7b llava (on vision tasks) soon enough to get some idea where it sits, but God it's annoying to get transformers working on macOS, and reliably, too. Llama.cpp support would bring additional advantage of bringing it to Android/iOS where Llava is quite prohibitive.
Beta Was this translation helpful? Give feedback.
All reactions