
Releases: bacalhau-project/bacalhau

v1.1.0

25 Sep 11:14
970e1a0

v1.1.0 Release Notes

📢 Introducing Bacalhau v1.1.0 - Unleash the Power!

We are thrilled to announce the release of Bacalhau v1.1.0, a significant milestone in our quest for unparalleled computing capabilities. Packed with exciting new features such as Full Fleet Targeting, Configurable Compute Timeouts, persistent storage, private IPFS cluster integration, and TLS support for the public APIs, this release is sure to take your computational experience to new heights! 🚀

But that's not all! We invite you to explore the experimental features of this release, such as Long-Running Jobs, as we continue to push the boundaries of computational possibilities.

So, what are you waiting for? Upgrade to Bacalhau v1.1.0 and unlock a world of infinite possibilities in distributed computing! 🌟

curl https://get.bacalhau.org/install.sh | bash

New features

Full Fleet Targeting

Jobs can now target all nodes in a network simultaneously, allowing efficient, parallel operation of jobs that need to query or modify an entire fleet.

Full fleet jobs are perfect for fleet management, allowing an operator to quickly understand the state of all of their nodes at once with a single command.

Full fleet jobs will only succeed if all known nodes in a network can be reached and can execute the job successfully. Jobs can still be targeted at a subset of the fleet by using labels or resource requirements.

Pass the --target=all parameter to any Bacalhau job command or set Deal.TargetAll: true in an existing Bacalhau job spec.
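For example, a minimal sketch (the Ubuntu image and the uname command are illustrative stand-ins for your own workload):

# Run a command once on every node in the network
bacalhau docker run --target=all ubuntu -- uname -a

Or, in a job spec, set the Deal field (the exact nesting around the Deal block depends on your spec layout):

Deal:
  TargetAll: true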

New node CLI and APIs

New CLI commands and APIs have been introduced, allowing users to easily list the nodes in a network and see what compute resources are available.

Use the new command bacalhau node list to get a tabular output of all known nodes:

[Screenshot: tabular output of bacalhau node list]

You can then use bacalhau node describe to get in-depth output about a specific node.
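For example (the node ID is a placeholder; copy a real ID from the list output):

bacalhau node list
bacalhau node describe <node-id>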

Configurable Timeouts

Jobs can now run for days or weeks, enabling large computations that need longer processing times.

By default, compute nodes no longer enforce an execution timeout, and jobs fall back to the longest allowed timeout. Job submitters can still request a timeout using the --timeout flag or the Timeout field in their job spec.

Node operators can still choose to limit the maximum timeout allowed by passing the --max-timeout flag to the serve command or by specifying the new Node.Compute.Capacity.JobTimeouts.MaxJobExecutionTimeout property in their config file.
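As a hedged sketch (the values are illustrative and assumed to be in seconds; check the flag help on your version):

# Job submitter: request a 48-hour execution timeout
bacalhau docker run --timeout=172800 ubuntu -- /path/to/long-batch-job

# Node operator: cap how long any execution may run on this node (7 days here)
bacalhau serve --max-timeout=604800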

Richer Node Configuration

We're excited to unveil enhanced configuration options in Bacalhau v1.1.0! With a heightened focus on flexibility, we've expanded the ways you can configure Bacalhau, whether it be via a configuration file, command-line flags, or environment variables.

The new release introduces a persistent configuration file that provides more flexibility and control over node configurations. Read the documentation for how to get started with configuration files.

Key Changes from v1.0.3 to v1.1.0:

  • The enriched config.yaml now ships with a full set of default configuration values, an improvement over the empty file generated in v1.0.3.
  • Event and Libp2p tracing is no longer activated by default. Enable it by specifying paths via EventTracerPath and Libp2PTracerPath in config.yaml.
  • The node’s private key is no longer called private_key.1235 and is now named libp2p_private_key by default. Configure its path with Libp2PKeyPath in config.yaml.
  • user_id.pem remains consistent. Direct its location using KeyPath in config.yaml.
  • The directory name has changed from execution-state-<NODE_ID> to <NODE_ID>-compute, and, apart from jobStats.json, it now also includes executions.db (BoltDB) when using persistent storage mode. Define its path using ExecutionStore.Path in config.yaml.
  • New directories include <NODE_ID>-requester (stores the state for the requester node using BoltDB), executor_storages (hosts data for Bacalhau storage types), and plugins (houses executor plugin binaries). Configure their paths respectively via JobStore.Path, ComputeStoragePath, and ExecutorPluginPath in config.yaml (see the example config sketch after this list).
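The fragment below is a minimal config.yaml sketch pulling several of these paths together. The property names come from this release, but the exact key nesting and the /home/user/.bacalhau paths are assumptions; compare against the default config.yaml generated by your node.

# Illustrative only: paths and nesting are assumptions, property names are from this release
Libp2PKeyPath: /home/user/.bacalhau/libp2p_private_key
KeyPath: /home/user/.bacalhau/user_id.pem
ExecutionStore:
  Path: /home/user/.bacalhau/<NODE_ID>-compute/executions.db
ComputeStoragePath: /home/user/.bacalhau/executor_storages
ExecutorPluginPath: /home/user/.bacalhau/plugins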

⚠️ Note: there are optional migration steps for existing Bacalhau users who want to keep their previous configuration. See the end of this note for how to migrate.

Support for TLS on public APIs

TLS certificates for serving client-facing APIs are now supported, ensuring secure and encrypted communication between Bacalhau clients and requester nodes.

To use a TLS certificate to encrypt communication, you can:

  • Configure automatic certificates from Let’s Encrypt by passing --autocert=<your-hostname> and ensuring the Bacalhau binary can respond to challenges by running sudo setcap CAP_NET_BIND_SERVICE+ep $(which bacalhau).
  • Pass a certificate to --tlscert and the corresponding private key to --tlskey.

By default, if none of the above options are used, the server will continue to serve its API endpoints over HTTP.
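For example, a sketch assuming the flags are passed to bacalhau serve (the hostname and file paths are illustrative):

# Option 1: automatic certificates from Let's Encrypt
sudo setcap CAP_NET_BIND_SERVICE+ep $(which bacalhau)
bacalhau serve --autocert=requester.example.com

# Option 2: bring your own certificate and private key
bacalhau serve --tlscert=/etc/bacalhau/server.crt --tlskey=/etc/bacalhau/server.key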

Persistent Storage of Jobs and Executions

Compute and requester nodes now support persistent storage, ensuring data integrity and allowing for long-term job and execution audit records. This feature is now switched on by default and records are persisted to the Bacalhau repository.

See the documentation for how to configure persistence.

Improved Error Messages

Clearer error messages are now displayed when no node is available to run a job, making troubleshooting easier and more efficient.

Instead of receiving ‘not enough nodes to run the job’, users will now get more specific help messages, such as ‘Docker image does not exist or repo is inaccessible’ or ‘job timeout exceeds the maximum allowed’.

Fine-Grained Control Over Image Entrypoint and Parameters

Users now have finer control over the entrypoint and parameters passed to a Docker image. Previously, Bacalhau would ignore the default entrypoint to the image and replace it with the first argument after bacalhau docker run <image>. Now, the default entrypoint in the image is used and all of the positional arguments are passed as the command to that entrypoint.

The entrypoint can still be explicitly overridden by using the --entrypoint flag or by setting the Entrypoint field in a Docker job spec.
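As a sketch of the difference (the image and commands are illustrative):

# v1.1.0: positional arguments are passed as the command to the image's default entrypoint
bacalhau docker run ubuntu echo "hello"

# Explicitly replace the image's entrypoint instead
bacalhau docker run --entrypoint=/bin/echo ubuntu hello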

GPU Support Inside Docker Containers

Bacalhau can now automatically utilize GPUs when the Bacalhau node is itself running inside a Docker container. Start the node container with GPU access by passing --gpus=all to docker run, and the node will automatically detect the GPUs available on the host machine.

Submit a job to a node running inside Docker using bacalhau docker run --gpu=1 to run the job in a new GPU-enabled container on the host.
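For example, a sketch assuming the published Bacalhau container image and an illustrative CUDA image tag:

# On the host: start the Bacalhau node container with access to all GPUs
# (other serve flags and mounts omitted)
docker run --gpus=all ghcr.io/bacalhau-project/bacalhau:v1.1.0 serve

# From a client: request one GPU for the job
bacalhau docker run --gpu=1 nvidia/cuda:11.8.0-base-ubuntu22.04 -- nvidia-smi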

Support for Private IPFS Clusters

Integration with private IPFS clusters has been added, providing enhanced security and control over data storage and retrieval.

To connect to a private swarm, pass the path to a swarm key to --ipfs-swarm-key, set the BACALHAU_IPFS_SWARM_KEY environment variable or configure the Node.IPFS.SwarmKeyPath configuration property.

When connecting to a private swarm, Bacalhau will no longer bootstrap using or connect to public peers and will rely on the swarm for all data retrieval.

These steps are also necessary on clients who use bacalhau get to download from a private IPFS swarm.

Note that these steps are not necessary if using the --ipfs-connect flag, which already can connect to IPFS nodes running a private swarm.
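For example (the swarm key path is illustrative):

# On the node, point Bacalhau at your swarm key
bacalhau serve --ipfs-swarm-key=/etc/bacalhau/swarm.key

# Or set the equivalent property in config.yaml:
Node:
  IPFS:
    SwarmKeyPath: /etc/bacalhau/swarm.key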

New Experimental Features

All of these features are experimental, meaning that their APIs are liable to change in an upcoming release. You are encouraged to try out these features and provide feedback or bug reports on Bacalhau Slack.

Long-Running Jobs

Bacalhau jobs can now run indefinitely and will automatically restart when nodes come back online, allowing for continuous and uninterrupted processing.

Long-running jobs allow compute workloads to process data that arrives continuously, making them perfect for tasks such as pre-filtering logs, processing real-time analytics, or working with edge sensors.

With the introduction of long-running jobs, ML inference tasks can now operate in a "warm-boot" environment. This means that the necessary resources and dependencies are already loaded, significantly reducing the time taken to run an inference job.

With this experimental feature, you can now unleash the power of Bacalhau to handle dynamic and ever-changing data streams, ensuring continuous and uninterrupted processing of your computational workloads.

Deprecated Features

Estuary

The Estuary publisher is no longer supported in this release. Compute nodes will now reject jobs that require the Estuary publisher.

Verification

The Verifiers feature is no longer supported in this release. Compute nodes will silently ignore verification requirements on jobs.

⚠️ Migration steps

Users who wish to continue using their previous Bacalhau private key or their previous Bacalhau Client ID as their identity will need to either:

  • Rename private_key.1235 to libp2p_private_key (an example command follows this list), or
  • Modify the config.yaml to use the previous key by editing the value of Libp2PKeyPath to point to its path.
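For example, assuming the default repository location of ~/.bacalhau (adjust the path if your repo lives elsewhere):

mv ~/.bacalhau/private_key.1235 ~/.bacalhau/libp2p_private_key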

Up Next

These upcoming features aim to provide users with increased flexibility and convenience in their computational workflows while maintaining a ...


v1.0.3

01 Jun 15:56

What's Changed

New Contributors

Full Changelog: v1.0.0...v1.0.3

v1.0.0

09 May 13:49
c62c8ab

Bacalhau 1.0 Release: Featuring Private Clusters, Octostore, and Federated Learning

Today marks the launch of Bacalhau 1.0, the general availability (GA) release of the open source distributed compute platform. The project’s mission is to revolutionize the way organizations and developers harness the power of collaborative computing, and the GA release marks an important milestone towards that goal. Since launching our beta release in November, the project has seen more than 3,000 commits from more than 30 contributors and a release every two weeks. Additionally, customers like the New Atlantis Foundation, the City of Las Vegas, and the University of Maryland are executing hundreds of thousands of jobs every month on the public network. To read more about Bacalhau, and try it out for yourself, go to https://bacalhau.org/.

Background

Distributed computing has long been recognized as a powerful approach for tackling large-scale, complex problems by harnessing the collective power of devices everywhere. However, developers face significant challenges in adopting it, including inefficient resource allocation, communication bottlenecks, and high barriers to entry for non-expert users.

But the time to address the issues is now. By 2025, IDC believes that we will have generated more than 175 zettabytes of data, 50 times more data than we do today. Yet critical insights to make better decisions are hidden behind distributed devices and storage.

(Re-)Introducing the Bacalhau Project

Bacalhau was created to address these challenges head-on through a platform designed from the ground up for the distributed world. Built by core members of the Kubernetes, Kubeflow, and Amazon Kinesis communities and employees from Google, AWS, and Microsoft, Bacalhau provides a new way to build and use globally deployed applications and data that is familiar, high scale, and efficient. Further, because Bacalhau is open source and Apache2/MIT licensed, the community is built to foster collaboration and innovation, allowing developers from around the world to contribute their expertise and continually improve upon the platform.

General Availability Release of Bacalhau

The GA release of Bacalhau includes the following features:

Long Term Mission

Our long term goal is to transform the way that developers can interact with the breadth of computing and data resources out there. Some of the features we have on the horizon include:
  • A fully distributed computation platform that can run on any device, anywhere
  • A declarative pipeline that can both run the data processing and also record the lineage of the data
  • A highly resilient system that can schedule across latency boundaries and deliver the reliability a global deployment needs, even over spotty network connectivity
  • Secure and verifiable results that can be used to confirm the integrity and reproducibility of the results forever

But you tell us! We'd love to hear about new directions we may need to include.

How to Get Involved

We're looking for help in several areas. If you're interested in helping out, please reach out to us at any of the following locations:

v0.3.29

03 May 01:55
0393e4f

What's Changed

New Contributors

Full Changelog: v0.3.28...v0.3.29

v0.3.28

14 Apr 23:54

What's Changed

New Contributors

Full Changelog: v0.3.25...v0.3.28

v0.3.27

13 Apr 16:13
1ec52c4
Pre-release

What's Changed

Full Changelog: v0.3.26...v0.3.27

v0.3.26

13 Apr 13:05
9e8adff
Pre-release

What's Changed

Full Changelog: v0.3.25...v0.3.26

v0.3.25

21 Mar 22:18
12e6fea

What's Changed

New Contributors

Full Changelog: v0.3.24...v0.3.25

v0.3.23

28 Feb 17:23
a8e7cb2

What's Changed

Full Changelog: v0.3.22...v0.3.23

v0.3.22

22 Feb 15:22
2e444a7

What's Changed

Full Changelog: v0.3.21...v0.3.22