multimodal

Here are 684 public repositories matching this topic...

rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.

visualization python rust computer-vision cpp robotics multimodal

Updated Jun 11, 2024
Rust

Yangyi-Chen / Multimodal-AND-Large-Language-Models

A paper list on multimodal and large language models, kept for personal use to record papers I read from the daily arXiv listings.

machine-learning multimodal large-language-models general-purpose-model

Updated Jun 11, 2024

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Jun 11, 2024
Python

livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹

real-time video ai voice agents voice-assistant multimodal

Updated Jun 11, 2024
Python

Super-Badmen-Viper / NSMusicS

NSMusicS: multi-platform, multi-mode music software. Electron (Vue3 + Vite + TypeScript) + .NET Core + AI

electron audio javascript python music player machine-learning typescript music-player deep-learning csharp natural-language pytorch audio-player netcore net librosa multimodal vue3

Updated Jun 11, 2024
Vue

songqiang321 / Awesome-AI-Papers

This repository is used to collect papers and code in the field of AI.

Updated Jun 11, 2024

aws-samples / improve-employee-productivity-using-genai

Employee Productivity GenAI Assistant Example is a code sample and architecture pattern designed to make writing tasks more efficient using AWS serverless technologies and Amazon Bedrock's generative AI models.

aws aws-lambda aws-s3 aws-apigateway aws-serverless aws-dynamodb aws-sam multimodal servereless aws-cloud9 generative-ai anthropic-claude genai aws-bedrock

Updated Jun 11, 2024
JavaScript

RLHF-V / RLAIF-V

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

chatbot multimodal llava vision-language-learning gpt-4v llava-next rlaif-v minicpm-v

Updated Jun 11, 2024
Python

modelscope / swift

ms-swift: Use PEFT or full-parameter training to fine-tune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, Deepseek, Baichuan2...)

Updated Jun 11, 2024
Python

dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

speech multimodal rag edge-ai vector-database vision-transformer llm-inference

Updated Jun 11, 2024
Python

autodistill / autodistill

Images to inference with no labeling (use foundation models to train supervised models).

machine-learning computer-vision deep-learning image-annotation pytorch image-classification object-detection instance-segmentation labeling-tool multimodal yolov5 model-distillation foundation-models auto-labeling yolov8 segment-anything grounding-dino

Updated Jun 11, 2024
Python

Yuan-ManX / ai-multimodal-timeline

Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D content. 🔥

ai multi-modal ai-agents deeplearning-ai multimodal multimodal-deep-learning llm

Updated Jun 11, 2024

manishkumart / Super-Rapid-Annotator-Multimodal-Annotation-Tool

This repository is part of the GSoC '24 project and demonstrates video annotation capabilities through the integration of a multimodal vision and language model with spatiotemporal analysis.

nlp vlm multimodal llm

Updated Jun 11, 2024

isLinXu / paper-list

Auto-updating paper list

reinforcement-learning classification image-generation object-detection transfer-learning optical-flow object-tracking semantic-segmentation action-recognition audio-processing pose-estimation depth-estimation anomaly-detection multimodal scene-understanding graph-neural-networks llm

Updated Jun 11, 2024
Python

bentoml / BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Jun 11, 2024
Python

EliaFantini / 2D-Priors-for-3D-human-reconstruction

Potential of 2D Priors for Improving Robustness of Ill-Posed 3D Reconstruction

python deep-learning pytorch clip 3d-reconstruction robustness multimodal self-supervised-learning single-image pifu robustness-analysis human-mesh-reconstruction

Updated Jun 11, 2024
Python

enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.