Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,335 public repositories matching this topic...

minio / sidekick

High Performance HTTP Sidecar Load Balancer

kubernetes spark proxy bigdata load-balancer sidecar sidekick minio-servers

Updated Jun 12, 2024
Go

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Jun 12, 2024
Java

apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

rss spark mapreduce shuffle tez remote-shuffle-service

Updated Jun 12, 2024
Java

apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

kubernetes sql spark hive hadoop jdbc thrift data-lake hacktoberfest spark-sql

Updated Jun 12, 2024
Scala

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated Jun 12, 2024
Scala

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Jun 12, 2024
Python

xuwenyihust / PawMark

PawMark is a platform for developers to build, schedule and monitor data pipelines.

kubernetes workflow spark jupyter-notebook gcp orchestration data-engineering data-platform mlflow delta-lake

Updated Jun 12, 2024
JavaScript

tencentmusic / cube-studio

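Cube Studio is an open-source, cloud-native, one-stop AI platform for machine learning, deep learning, and large models. It supports SSO login, multi-tenancy, integration with big data platforms, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, VGPU inference serving, edge computing, serverless, a labeling platform, automated labeling, dataset management, large-model fine-tuning, vllm large-model inference, llmops, private knowledge bases, and an AI model app store. It also supports one-click model development/inference/fine-tuning, Chinese domestic cpu/gpu/npu chips, RDMA, and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.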

kubernetes workflow ai spark pipeline notebook inference pytorch argo gpt automl kubeflow mlops vgpu aihub llmops

Updated Jun 12, 2024
Jupyter Notebook

apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated Jun 12, 2024
Java

NVIDIA / spark-rapids

Spark RAPIDS plugin - accelerate Apache Spark with GPUs

big-data spark gpu rapids

Updated Jun 12, 2024
Scala

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated Jun 12, 2024
Scala

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated Jun 12, 2024
Java

LLM-Red-Team / spark-free-api

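🚀 Reverse-engineered API for the iFLYTEK Spark large model, for free-access testing [specialty: office assistant]. Supports high-speed streaming output, agent conversations, web search, AI image generation, long-document interpretation, image analysis, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.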

spark chatbot chat-api iflytek spark-ai llm chatgpt-api

Updated Jun 12, 2024
TypeScript

apache / incubator-graphar

An open source, standard data file format for graph data storage and retrieval.

big-data spark etl graph pyspark graph-analysis data-orchestration graph-storage

Updated Jun 12, 2024
C++

apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

spark bigdata shuffle

Updated Jun 12, 2024
Java

AlexRogalskiy / spark-patterns

🏆 Spark4You Design patterns

patterns spark ebook spark-streaming spark-sql spark-structured-streaming patterns-design

Updated Jun 12, 2024
Shell

asdf2014 / yuzhouwan

Code Library for My Blog

python java go elasticsearch clojure algorithm scala ai spark hadoop tensorflow bigdata hbase zookeeper druid yuzhouwan

Updated Jun 12, 2024
Java

uni-openai / uniai-maas

An open-source AI and model-as-a-service platform.

ai spark gpt moonshot midjourney chatgpt stability-ai chatglm uniai kimichat

Updated Jun 12, 2024
TypeScript

flyteorg / flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.

python data-science data automation sdk spark pypi extensible workflows hacktoberfest flyte mlops flyte-tasks

Updated Jun 12, 2024
Python

HsiehShuJeng / cdk-emrserverless-with-delta-lake

This construct builds some elements for you to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome from the EMR Serverless application.

python java golang aws spark serverless dotnet javacript aws-cloudformation emr-notebooks delta-lake aws-service-catalog cdk-constructs projen emr-studio emr-serverless

Updated Jun 12, 2024
TypeScript

Created by Matei Zaharia

Released May 26, 2014

Followers: 417
Repository: apache/spark
Website: spark.apache.org
Wikipedia: Wikipedia
