---
title: 🏗️ Improved Architecture
publishDate: "2024-05-11T21:02:43.903Z"
layout: "article"
authorName: "Bret Comnes"
authorUrl: "https://bret.io"
authorImgUrl: "/static/bret-ava.png"
---

Breadcrum has finally shipped a new and improved backend architecture!
These changes bring immediate performance and reliability improvements, and they also lay the foundation for the next round of really exciting features.
## The original setup

One of the primary design goals of Breadcrum was to build on unremarkable, reliable, proven technology and patterns, while still taking advantage of as many of the time-saving features modern technologies offer.

Breadcrum launched with roughly the following architecture:

DIAGRAM GOES HERE..
- A Fastify "API first" JSON API server.
  - Provides 100% of the APIs needed to implement a client against.
  - Fully validated input and output schemas, doubling as a public OpenAPI 3.0 schema.
  - Static file hosting for the frontend client app.
- A Postgres relational database.
- A Redis cache for rate limiting (but only a single consumer initially).
- A dedicated private yt-dlp-api.
  - Provides a long-running Python process, since loading yt-dlp per request was way too slow.
- A top-bun, multipage JAMStack web client, with static prerendering.
- Hosted on fly.io using Alpine Linux Docker containers.
- Continuous integration and deployment run on GitHub Actions.
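As a sketch of what that schema validation looks like in practice, here is a hypothetical Fastify-style route schema for creating a bookmark. The endpoint and field names are illustrative assumptions, not Breadcrum's actual API:

```javascript
// Sketch of the kind of route schema an "API first" Fastify server declares:
// every endpoint gets validated input and output, which can also feed the
// public OpenAPI document. Field names here are hypothetical.
const createBookmarkSchema = {
  body: {
    type: 'object',
    required: ['url'],
    properties: {
      url: { type: 'string', format: 'uri' },
      title: { type: 'string' }
    },
    additionalProperties: false
  },
  response: {
    201: {
      type: 'object',
      properties: {
        id: { type: 'string' },
        url: { type: 'string' },
        title: { type: 'string' }
      }
    }
  }
}

// In Fastify a schema like this is attached per route, e.g.:
// fastify.post('/api/bookmarks', { schema: createBookmarkSchema }, handler)
```

Because the schemas live next to the routes, the same objects double as runtime validation and as public API documentation.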

This implementation served its purpose well.
It was simple to set up, it was reliable, and despite omitting a number of common conventions (SSR, CDN, TypeScript) it never felt like it prioritized short-term productivity at the cost of long-term flexibility.
Building and maturing on top of this base has been a pure pleasure.

There were a few pain points, though. If the server ever crashed, all in-flight jobs and work were lost, and they would not recover after a reboot. Server-resolved bookmark details, website archive extraction, and episode resolution would be dropped, and none of the resources would finish resolving. Users would have to manually retry creating them, if the failure was even noticed.

Resource scaling was also super uneven.
Requests that set off work-heavy jobs, like processing a large website, could slow down other requests or crash the server if they tripped the out-of-memory governor.
The server resources needed to cover 99% of requests safely and quickly were much smaller than the resources needed to cover the last 1% of heavy-workload requests.
To avoid the problems above, provisioning for the 1% was necessary.

## The new setup

Here is a diagram of the new architecture:

DIAGRAM GOES HERE: ...

The architecture is mostly the same, except for a few additions and changes:

- A second Fastify-wrapped BullMQ worker service.
- Two Redis instances: a volatile shared cache and a `noeviction` instance for the queue.
- A high-availability, single-region Postgres server.

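The two Redis instances play opposite roles, so they need opposite eviction policies. A sketch of the relevant `redis.conf` directives (illustrative values, not Breadcrum's actual config):

```
# Shared cache instance: fine to evict keys under memory pressure
maxmemory 256mb
maxmemory-policy allkeys-lru

# Queue instance: BullMQ requires that job data never be evicted
maxmemory-policy noeviction
```

Splitting the instances means cache pressure can never silently drop queued jobs.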
### The worker

The worker process is a second Fastify service that spins up a few BullMQ queue workers.
These connect to a new, second Redis instance that is dedicated to tracking queued jobs and letting workers coordinate which process takes which job.

Async task queues are nothing new, but they offer many wonderful features that make processing tasks and jobs a lot easier to reason about:

- Durable: If a worker dies during a job, the job queue notices and can retry the job on the next available worker, according to simple rules.
- Scalable: Spinning up new workers is easy and increases your throughput and parallelism. If you are falling behind, just scale up by adding new workers.
- Schedulable: You can also schedule work based on dates and time offsets. Implementing things like "retry when this video has gone live" becomes a trivial and reliable scheduling task in the work queue.
- Asynchronous: API endpoints can support long-running tasks without holding HTTP connections open for long periods of time, and the job IDs used to track work are a perfect match for asynchronous API endpoint patterns.
- Progress: Job workers can report progress at the individual job level.
- Observable: The queue system ships with built-in observability tools that let you watch the progress of your work queues as they run. Additionally, it keeps bookkeeping on completed and failed jobs for later investigation.
- Highly elastic: Workers can scale to zero when there is no work to be done, and wake up when there is. Large jobs can wake up more expensive resources on demand and shut them off when they are no longer needed.
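The durability and retry rules in the first bullet can be sketched with a toy in-memory queue. This is purely illustrative of the pattern, not BullMQ's API: BullMQ persists jobs in Redis so they survive process crashes, and its workers run asynchronously across processes.

```javascript
// Toy in-memory queue illustrating the durable-retry rule described above.
// A job that throws is put back on the queue until it has used up all of
// its attempts, at which point it lands in the failed list for inspection.
class TinyQueue {
  constructor ({ maxAttempts = 3 } = {}) {
    this.maxAttempts = maxAttempts
    this.jobs = []
    this.completed = []
    this.failed = []
  }

  add (name, data) {
    this.jobs.push({ name, data, attempts: 0 })
  }

  // Drain the queue: a worker function processes each job in turn.
  process (worker) {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()
      job.attempts += 1
      try {
        this.completed.push({ ...job, result: worker(job) })
      } catch (err) {
        if (job.attempts < this.maxAttempts) {
          this.jobs.push(job) // retry on the "next available worker"
        } else {
          this.failed.push({ ...job, error: err.message })
        }
      }
    }
  }
}
```

A job that fails once with a transient error simply runs again, and the bookkeeping (completed vs. failed) mirrors what the real queue keeps around for later investigation.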

You can take on almost any workload type exposed through an API with distributed queues. Now that Breadcrum has them, we can start taking on some of the more challenging features, like inserting scheduled videos into your feeds when they become available, instead of making the user create an episode once the video finally goes live.
Downloading large chunked HLS video playlists, re-muxing them into podcast-friendly formats, and uploading them to affordable cloud hosting also becomes possible with a work queue.

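The scheduled-video feature reduces to computing a delay and handing it to the queue when the job is enqueued (BullMQ supports delayed jobs), instead of polling for the video to go live. A minimal sketch, with a hypothetical helper name:

```javascript
// Hypothetical helper: turn a video's scheduled publish time into a job
// delay in milliseconds, suitable for a delayed-job option at enqueue time.
function delayUntil (scheduledAt, now = new Date()) {
  const delayMs = new Date(scheduledAt).getTime() - now.getTime()
  return Math.max(0, delayMs) // already live? run the job immediately
}
```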

And more importantly, the API server is doing less of the heavy lifting and is in less danger of running out of memory on large jobs.