WIP blogpost
bcomnes committed May 13, 2024
1 parent 24f3ea3 commit a906a14
Showing 3 changed files with 75 additions and 0 deletions.
74 changes: 74 additions & 0 deletions packages/web/client/blog/2024/improved-architecture/README.md
@@ -0,0 +1,74 @@
---
title: 🏗️ Improved Architecture
publishDate: "2024-05-11T21:02:43.903Z"
layout: "article"
authorName: "Bret Comnes"
authorUrl: "https://bret.io"
authorImgUrl: "/static/bret-ava.png"
---

Breadcrum has finally shipped a new and improved backend architecture!
These changes bring immediate performance and reliability improvements and also unlock the foundation for the next round of really exciting features.

## The original setup

One of the primary design goals of Breadcrum was to build with unremarkable, reliable, proven technology and patterns, while still taking advantage of as many of the time-saving features that modern technologies offer.

Breadcrum launched with roughly the following architecture:

DIAGRAM GOES HERE..

- A Fastify "API First" JSON API server.
  - Provides 100% of the API surface needed to implement a client.
  - Fully validated input and output schemas, doubling as a public OpenAPI 3.0 schema.
  - Static file hosting for the frontend client app.
- A Postgres relational database.
- A Redis cache for rate limiting (but only a single consumer initially).
- A dedicated, private yt-dlp API service.
  - Provides a long-running Python process, since loading yt-dlp per request was far too slow.
- A top-bun, multipage JAMStack web client, with static prerendering.
- Hosted on fly.io using Alpine Linux Docker containers.
- Continuous integration and deployment run on GitHub Actions.
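The "API First" pattern with validated schemas can be sketched with a minimal Fastify-style route schema, where one JSON Schema both validates requests and feeds the generated OpenAPI document. This is an illustrative example, not Breadcrum's actual code; the `createBookmarkSchema` name and its fields are hypothetical.

```javascript
// Hypothetical route schema in the Fastify style: the same JSON Schema
// validates request bodies and response payloads, and doubles as the
// public OpenAPI documentation. Field names here are illustrative only.
const createBookmarkSchema = {
  body: {
    type: 'object',
    required: ['url'],
    properties: {
      url: { type: 'string', format: 'uri' },
      title: { type: 'string' },
      tags: { type: 'array', items: { type: 'string' } }
    },
    additionalProperties: false
  },
  response: {
    201: {
      type: 'object',
      properties: {
        id: { type: 'string' },
        url: { type: 'string' }
      }
    }
  }
}

// Wiring it up (assumes a Fastify app instance named `fastify`):
// fastify.post('/api/bookmarks', { schema: createBookmarkSchema }, handler)
```

Because the schema is plain data, tools like `@fastify/swagger` can emit the OpenAPI 3.0 document directly from the registered routes.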

This implementation served its purpose well.
It was simple to set up. It was reliable, and despite omitting a number of common conventions (SSR, a CDN, TypeScript), it never felt like it prioritized short-term productivity at the cost of long-term flexibility.
Building and maturing on top of this base has been a pure pleasure.

There were a few pain points, however. If the server ever crashed, all in-flight jobs and work were lost and would not recover after a reboot. Server-resolved bookmark details, website archive extraction, and episode resolution would be dropped, and none of those resources would finish resolving. Users had to manually retry creation, if they even noticed the failure.

Resource scaling was also very uneven.
Requests that set off work-heavy jobs, like processing a large website, could slow down other requests or crash the server if they tripped the out-of-memory governor.
The server resources needed to cover 99% of requests safely and quickly were much smaller than the resources needed to cover the last 1% of heavy workload requests.
To avoid the problems above, provisioning for the 1% was necessary.

## The new setup

Here is a diagram of the new architecture:

DIAGRAM GOES HERE: ...

The architecture is mostly the same except for a few additions and changes:

- A second Fastify-wrapped BullMQ worker service.
- Two Redis instances: a volatile shared cache and a NOEVICT instance for the queue.
- A high-availability, single-region Postgres server.

### The worker

The worker process is a second Fastify service that spins up a few BullMQ queue workers.
These connect to a new second Redis instance that is dedicated to tracking queued jobs and allowing workers to coordinate which process takes which job.
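The shape of such a worker can be sketched as follows. This is a hedged illustration, not Breadcrum's actual code: the `resolve-bookmark` queue name, the processor logic, and the `REDIS_QUEUE_URL` variable are all assumptions. The pure job logic is separated from the BullMQ wiring so it can run and be tested without a Redis connection.

```javascript
// Hypothetical job processor. Keeping the processing logic as a plain
// async function (rather than inline in the Worker constructor) makes it
// testable without Redis.
async function processBookmark (job) {
  const { url } = job.data
  if (!url) throw new Error('job is missing a url')
  // ...resolve page title, archive the page, create episodes, etc...
  return { url, status: 'resolved' }
}

// Queue wiring (assumes bullmq is installed and REDIS_QUEUE_URL points at
// the dedicated NOEVICT Redis instance):
//
// const { Worker } = require('bullmq')
// const worker = new Worker('resolve-bookmark', processBookmark, {
//   connection: { url: process.env.REDIS_QUEUE_URL }
// })
```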

Async task queues are nothing new, but they offer many wonderful features that make processing tasks and jobs a lot easier to reason about:

- Durable: if a worker dies mid-job, the queue notices and can retry the job on the next available worker according to simple rules.
- Scalable: spinning up new workers is easy and increases throughput and parallelism. If you are falling behind, just scale up by adding workers.
- Schedulable: you can also schedule work based on dates and time offsets. Implementing things like "retry when this video has gone live" becomes a trivial and reliable scheduling task in the work queue.
- Asynchronous: API endpoints can support long-running tasks without holding HTTP connections open for long periods, and the job IDs used to track work are a perfect match for asynchronous API endpoint patterns.
- Progress: workers can report progress at the individual-job level.
- Observable: the queue system ships with built-in observability tools that let you watch your work queues as they run. It also includes bookkeeping of completed and failed jobs for later investigation.
- Highly elastic: workers can scale to zero when there is no work to be done, and wake up when work arrives. Large jobs can wake more expensive resources on demand and shut them off when not needed.
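The "retry according to simple rules" behavior usually means exponential backoff: each failed attempt waits twice as long as the last, up to a cap. BullMQ offers this built in via the `backoff` job option; the sketch below just shows the arithmetic, with illustrative values.

```javascript
// Illustrative exponential backoff: the delay doubles with each attempt
// and is capped at a maximum. BullMQ expresses the same idea with the
// { attempts, backoff: { type: 'exponential', delay } } job options.
function backoffDelay (attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs)
}

// attempt 1 → 1000 ms, attempt 2 → 2000 ms, attempt 3 → 4000 ms, ...
// and from attempt 7 onward the cap of 60000 ms applies.
```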

You can take on almost any workload type exposed through an API with distributed queues. Now that Breadcrum has one, we can start taking on some of the more challenging features, like inserting scheduled videos into your feeds when they become available, instead of making the user create an episode manually once the video is finally live.
Downloading large, chunked HLS video playlists, re-muxing them into podcast-friendly formats, and uploading them to affordable cloud hosting also becomes possible with a work queue.

And more importantly, the API server now does less of the heavy lifting and is in less danger of running out of memory on large jobs.
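The asynchronous endpoint pattern mentioned above boils down to: the enqueue endpoint returns a job ID immediately, and a status endpoint reports progress later. Here is a minimal in-memory sketch of that shape; in the real system the `Map` would be replaced by BullMQ jobs in Redis, and all function names are hypothetical.

```javascript
// Minimal in-memory sketch of the async job-ID pattern. A Map stands in
// for the Redis-backed queue so the shape of the API is visible.
let nextId = 0
const jobs = new Map()

// The POST endpoint calls this and returns the ID without waiting.
function enqueue (data) {
  const id = String(++nextId)
  jobs.set(id, { id, status: 'queued', data, result: null })
  return id
}

// A worker calls this when the job finishes.
function complete (id, result) {
  const job = jobs.get(id)
  job.status = 'completed'
  job.result = result
}

// The GET /jobs/:id endpoint calls this to report progress.
function status (id) {
  const { status, result } = jobs.get(id)
  return { id, status, result }
}
```

The client polls (or subscribes to) the status endpoint instead of holding an HTTP connection open for the duration of the work.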
Binary file modified packages/web/client/static/bret-ava.png
1 change: 1 addition & 0 deletions packages/web/package.json
@@ -93,6 +93,7 @@
"watch": "run-s clean && run-p watch:*",
"watch:server": "fastify start -w --ignore-watch='node_modules .git ./client ./public' -l info -P -p 3000 --options --address localhost app.js",
"watch:tob-bun": "npm run build:top-bun -- --watch-only",
"top-bun-watch": "npm run build:top-bun -- --watch",
"print-routes": "fastify print-routes app.js",
"print-plugins": "fastify print-plugins app.js",
"generate-dbml": "npx pg-to-dbml --c postgres://postgres@localhost/breadcrum"
