RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
An orchestration platform for the development, production, and observation of data assets.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
The ultimate open-source RAG framework
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Cloud native open-source end-to-end data / AI / ML platform
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Apache DolphinScheduler is a modern data orchestration platform for agile creation of high-performance, low-code workflows.
Move your data with ease.
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Bruin is a data pipeline tool designed to be easy to use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
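The projects above all orchestrate some variant of the extract-transform-load pattern. A minimal, library-free sketch of that pattern in plain Python (all names are illustrative and not tied to any specific project listed here):

```python
# Minimal sketch of the extract-transform-load (ETL) pattern that the
# pipeline tools listed above orchestrate at scale. Names are illustrative.

def extract():
    # Stand-in for reading raw records from a source system (API, database, file).
    return [{"id": 1, "value": " 42 "}, {"id": 2, "value": "7"}]

def transform(rows):
    # Clean and type-cast each record; a real tool would add
    # data-quality checks and schema validation at this stage.
    return [{"id": r["id"], "value": int(r["value"].strip())} for r in rows]

def load(rows, sink):
    # Stand-in for writing to a warehouse or data lake; returns rows written.
    sink.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded)        # → 2
print(warehouse[0])  # → {'id': 1, 'value': 42}
```

Real tools differ mainly in how they schedule, parallelize, and monitor these three stages, not in the shape of the stages themselves.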