big-data

hazelcast / hazelcast

Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.

java caching distributed-systems real-time big-data hazelcast scalability distributed-computing distributed stream-processing in-memory low-latency hacktoberfest data-insights data-in-motion

Updated May 20, 2024
Java

IvanildoBarauna / IvanildoBarauna

Personal Repository

python bigquery airflow big-data ai gcp data-engineering

Updated May 20, 2024

apache / datafusion

Apache DataFusion SQL Query Engine

python rust sql big-data arrow olap query-engine dataframe datafusion

Updated May 20, 2024
Rust

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated May 20, 2024
Java

paradedb / paradedb

Postgres for Search and Analytics

Updated May 20, 2024
Rust

vespa-engine / vespa

AI + Data, online. https://vespa.ai

java search-engine machine-learning big-data ai server cpp tensorflow vespa serving serving-recommendation vector-search

Updated May 20, 2024
Java

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated May 20, 2024
C++

ClickHouse / ClickHouse

ClickHouse® is a free analytics DBMS for big data

sql big-data analytics clickhouse dbms olap distributed-database mpp hacktoberfest

Updated May 20, 2024
C++

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated May 20, 2024
Scala

microsoft / SynapseML

Simple and Distributed Machine Learning

Updated May 20, 2024
Scala

gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.

enrichment big-data xml xslt visualisation data-analytics dashboards lucene pipeline-processor

Updated May 20, 2024
Java

pachyderm / pachyderm

Data-Centric Pipelines and Data Versioning

go docker kubernetes distributed-systems data-science big-data analytics containers data-analysis pachyderm

Updated May 20, 2024
Go

hortonworks / cloudbreak

CDP Public Cloud is an integrated analytics and data management platform deployed on cloud services. It offers broad data analytics and artificial intelligence functionality along with secure user access and data governance features.

java cloud big-data hadoop deployment cloudera hacktoberfest

Updated May 20, 2024
Java

delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

big-data spark analytics acid delta-lake

Updated May 20, 2024
Scala

Here are 3,995 public repositories matching this topic...

drisskhattabi6 / Real-Time-Twitter-Sentiment-Analysis

intel / scikit-learn-intelex

smooks / smooks

prestodb / presto

geoHeil / awesome-tools

arkime / arkime

hazelcast / hazelcast

IvanildoBarauna / IvanildoBarauna

apache / datafusion

trinodb / trino

paradedb / paradedb

vespa-engine / vespa

ytsaurus / ytsaurus

ClickHouse / ClickHouse

apache / spark

microsoft / SynapseML

gchq / stroom

pachyderm / pachyderm

hortonworks / cloudbreak

delta-io / delta
