Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
Updated Jun 11, 2024 - Rust
A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listings for personal reference.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Build real-time multimodal AI applications 🤖🎙️📹
NSMusicS: multi-platform, multi-mode music software built with Electron (Vue3 + Vite + TypeScript), .NET Core, and AI.
This repository is used to collect papers and code in the field of AI.
Employee Productivity GenAI Assistant Example is a code sample and architecture pattern designed to improve the efficiency of writing tasks using AWS serverless technologies and Amazon Bedrock's generative AI models.
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
Images to inference with no labeling (use foundation models to train supervised models).
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥
This repository is part of the GSoC '24 project and demonstrates video annotation capabilities through the integration of a multimodal vision and language model with spatiotemporal analysis.
Auto-updating paper list.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Potential of 2D Priors for Improving Robustness of Ill-Posed 3D Reconstruction
Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance