Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,335 public repositories matching this topic...
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated Jun 12, 2024 - C++
The MongoDB Spark Connector
Updated Jun 12, 2024 - Java
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
Updated Jun 12, 2024 - Scala
A large-scale entity and relation database supporting aggregation of properties
Updated Jun 12, 2024 - Java
REST API for Apache Spark on K8S or YARN
Updated Jun 12, 2024 - Java
Simple and Distributed Machine Learning
Updated Jun 12, 2024 - Scala
High Performance HTTP Sidecar Load Balancer
Updated Jun 12, 2024 - Go
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Updated Jun 12, 2024 - Scala
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Jun 12, 2024 - Scala
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Updated Jun 12, 2024 - JavaScript
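cube studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models. It supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop pipeline orchestration of task flows, multi-node multi-GPU distributed training, hyperparameter search, inference serving with VGPU, edge computing, serverless, a labeling platform, automated labeling, dataset management, large-model fine-tuning, vllm large-model inference, llmops, private knowledge bases, and an AI model app store. It also supports one-click model development, inference, and fine-tuning; domestic (Chinese) cpu/gpu/npu chips; RDMA; and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.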
Updated Jun 12, 2024 - Jupyter Notebook
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia