Implementation of Qformer from BLIP2 in Zeta Lego blocks.
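For orientation, the Q-Former's core mechanism is a small set of learned query tokens that cross-attend to frozen image features. A minimal PyTorch sketch of that idea (module and dimension choices are illustrative, not this repo's API):

```python
import torch
from torch import nn

class QFormerBlock(nn.Module):
    """Minimal Q-Former-style block: learned queries cross-attend to image features."""
    def __init__(self, dim: int = 768, num_queries: int = 32, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, dim) from a frozen vision encoder
        q = self.queries.expand(image_feats.size(0), -1, -1)
        attended, _ = self.cross_attn(q, image_feats, image_feats)
        x = self.norm(q + attended)
        return x + self.mlp(x)

queries = QFormerBlock()(torch.randn(2, 196, 768))  # -> (2, 32, 768)
```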
The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"
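Two of that paper's key architectural changes are running attention and MLP in parallel and applying LayerNorm to queries and keys for stability at scale. A rough PyTorch sketch of such a block (dimensions are illustrative):

```python
import torch
from torch import nn
import torch.nn.functional as F

class ParallelViTBlock(nn.Module):
    """Sketch of a ViT-22B-style block: attention and MLP applied in parallel,
    with LayerNorm on queries/keys to keep attention logits bounded."""
    def __init__(self, dim: int = 1024, heads: int = 16):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.q_norm = nn.LayerNorm(self.head_dim)
        self.k_norm = nn.LayerNorm(self.head_dim)
        self.proj = nn.Linear(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, x):
        b, n, d = x.shape
        h = self.norm(x)
        q, k, v = self.qkv(h).view(b, n, 3, self.heads, self.head_dim).unbind(2)
        q, k = self.q_norm(q), self.k_norm(k)  # QK-norm
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, n, d)
        # parallel formulation: both branches read the same normalized input
        return x + self.proj(attn) + self.mlp(h)
```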
Limits the use of end-to-end data for Speech Translation by leveraging Automatic Speech Recognition and Machine Translation data instead, using zero-shot multilingual text translation techniques.
This repository provides a Streamlit application that lets a user upload a screenshot, which is then queried against a database of PDF documents. Both the image structure and any included text are used to find matching documents within a self-defined set.
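A minimal sketch of that flow, assuming pytesseract for the OCR step; `search_pdf_index` is a hypothetical stand-in for the embedding-based document search, not this repo's actual function:

```python
import streamlit as st
from PIL import Image
import pytesseract

st.title("Screenshot -> PDF matcher (sketch)")
uploaded = st.file_uploader("Upload a screenshot", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Query screenshot")
    text = pytesseract.image_to_string(image)  # OCR the visible text
    # search_pdf_index is hypothetical: embed `text` (and image features)
    # and rank the pre-indexed PDF pages by similarity.
    results = search_pdf_index(text, image)
    st.write(results)
```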
Code for ASGEA: Exploiting Logic Rules from Align-Subgraphs for Entity Alignment
Repository for an air, water, and land surveillance robot developed as part of the DRDO Robotics and Unmanned Systems Exposition.
Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta
Omni-Modality Processing, Understanding, and Generation
This library provides Python packages on DoubleML / causal machine learning and neural networks for simulations and case studies.
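For orientation, double/debiased ML for a partially linear model boils down to residual-on-residual regression with cross-fitting. A self-contained sklearn sketch of that idea (not this library's API):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def double_ml_plr(y, d, X, seed=0):
    """Partially linear double ML: residual-on-residual with cross-fitting."""
    ml_y = RandomForestRegressor(random_state=seed)
    ml_d = RandomForestRegressor(random_state=seed)
    # out-of-fold predictions provide the cross-fitting step
    y_res = y - cross_val_predict(ml_y, X, y, cv=5)
    d_res = d - cross_val_predict(ml_d, X, d, cv=5)
    return np.dot(d_res, y_res) / np.dot(d_res, d_res)  # OLS of y_res on d_res

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
d = X[:, 0] + rng.normal(size=500)
y = 0.5 * d + X[:, 0] ** 2 + rng.normal(size=500)
print(double_ml_plr(y, d, X))  # should recover a value near 0.5
```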
Kedro pipelines for multimodal ML in TensorFlow.
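A multimodal Kedro pipeline typically wires per-modality feature nodes into a fusion node. A minimal sketch with illustrative node and dataset names (not necessarily this repo's layout):

```python
from kedro.pipeline import node, pipeline

# function bodies and dataset names ("raw_images", "captions", ...) are
# illustrative placeholders, not this repo's catalog
def extract_image_features(images):
    return images  # placeholder: e.g. run a TensorFlow vision encoder

def extract_text_features(texts):
    return texts  # placeholder: e.g. run a TensorFlow text encoder

def fuse_and_train(image_features, text_features):
    return {"image": image_features, "text": text_features}  # placeholder fusion + fit

multimodal = pipeline([
    node(extract_image_features, inputs="raw_images", outputs="image_features"),
    node(extract_text_features, inputs="captions", outputs="text_features"),
    node(fuse_and_train, inputs=["image_features", "text_features"], outputs="model"),
])
```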
An implementation of Knowledge Networks built from Knowledge Graphs.
Implementation of M2PT in PyTorch from the paper: "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities"
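The core mechanism in M2PT, cross-modal re-parameterization, augments each linear layer with frozen weights from a counterpart trained on another modality. A minimal sketch from my reading of the paper (class and parameter names are illustrative, not this repo's API):

```python
import torch
from torch import nn

class CrossModalLinear(nn.Module):
    """Sketch of M2PT-style cross-modal re-parameterization:
    y = x @ (W + lam * W_aux)^T + b, with W_aux frozen weights from a model
    trained on another modality and lam learnable."""
    def __init__(self, target: nn.Linear, auxiliary: nn.Linear):
        super().__init__()
        self.target = target
        self.register_buffer("aux_weight", auxiliary.weight.detach().clone())
        self.lam = nn.Parameter(torch.zeros(1))  # starts at 0: pure target model

    def forward(self, x):
        weight = self.target.weight + self.lam * self.aux_weight
        return nn.functional.linear(x, weight, self.target.bias)

layer = CrossModalLinear(nn.Linear(512, 512), nn.Linear(512, 512))
out = layer(torch.randn(4, 512))
```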
Multi-Modal Image Generation for News Stories
This is a repository for the CoCa-pytorch-model, which can be used to train on your own dataset.
This repo builds on the paper "MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations", comparing the accuracy of shallow machine learning models with deep LSTM models using a bi-modal (text + audio) approach.
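A minimal sketch of the deep side of such a comparison: a bidirectional LSTM over per-utterance fused text + audio features (feature dimensions are assumptions; MELD has seven emotion classes):

```python
import torch
from torch import nn

class BiModalLSTM(nn.Module):
    """Late-fusion sketch: concatenate per-utterance text and audio features,
    run a BiLSTM over the conversation, classify each utterance's emotion."""
    def __init__(self, text_dim=100, audio_dim=64, hidden=128, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(text_dim + audio_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, text_feats, audio_feats):
        # both inputs: (batch, utterances, dim)
        fused = torch.cat([text_feats, audio_feats], dim=-1)
        out, _ = self.lstm(fused)
        return self.head(out)  # per-utterance emotion logits

logits = BiModalLSTM()(torch.randn(2, 10, 100), torch.randn(2, 10, 64))
```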
Fast Tensors Packaging library for text, image, video, and audio data compatible with PyTorch, TensorFlow, & NumPy 🖼️🎵🎥 ➡️ 🧠
An image retrieval system that can find the matching image among thousands of images given a short description.
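One common way to build such a system is a CLIP-style dual encoder: embed the description and all candidate images in a shared space and return the nearest image. A sketch using Hugging Face's CLIP (not necessarily this repo's approach):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve(description: str, image_paths: list[str]) -> str:
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[description], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # logits_per_text: similarity of the description to every candidate image
    best = out.logits_per_text.argmax(dim=-1).item()
    return image_paths[best]
```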
Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"
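The paper's idea is to replace softmax attention with O(n) linear attention whose feature map is trained to mimic softmax weights. A loose sketch (the feature map here is a simplification of the paper's design):

```python
import torch
from torch import nn

class HedgehogStyleLinearAttention(nn.Module):
    """Sketch of linear attention with a trainable softmax-mimicking feature map
    (a linear layer + elementwise exp, loosely following the Hedgehog paper)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.feature = nn.Linear(dim, dim)

    def phi(self, x):
        # "spiky" positive feature map; in the paper it is trained to
        # match softmax attention weights via a distillation loss
        return torch.exp(self.feature(x))

    def forward(self, q, k, v):
        q, k = self.phi(q), self.phi(k)
        # O(n) attention: associativity lets us compute (phi(K)^T V) first
        kv = torch.einsum("bnd,bne->bde", k, v)
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

attn = HedgehogStyleLinearAttention()
out = attn(*(torch.randn(2, 128, 64) for _ in range(3)))
```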
Selected code from my M.Sc. thesis: Cyber-resilient multi-modal sensor fusion for autonomous ship navigation
A practice project for handling multi-modal datasets in a unified way.
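One simple way to unify multi-modal datasets is to make every sample a modality-keyed dict, so downstream code never branches on dataset-specific layouts. A PyTorch sketch (not necessarily this repo's design):

```python
import torch
from torch.utils.data import Dataset

class MultiModalDataset(Dataset):
    """Sketch of a unified interface: every sample is a dict keyed by modality."""
    def __init__(self, records):
        # records: list of dicts, e.g. {"image": tensor, "text": str, "label": int}
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        return dict(self.records[idx])

ds = MultiModalDataset([{"image": torch.zeros(3, 224, 224), "text": "a cat", "label": 0}])
```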