feat(ds): add cache #12453
base: master
Conversation
Force-pushed from 5a15d41 to 26fd62b
```diff
@@ -763,7 +763,7 @@ enqueue_batch(IsReplay, BatchSize, Srs0, Session = #{inflight := Inflight0}, Cli
         true -> ItBegin0;
         false -> ItEnd0
     end,
-    case emqx_ds:next(?PERSISTENT_MESSAGE_DB, ItBegin, BatchSize) of
+    case emqx_ds:next(?PERSISTENT_MESSAGE_DB, ItBegin, BatchSize, #{use_cache => true}) of
```
Why do we need a new option for the cache? Does it change the semantics of `next`?
Semantics, no. But we need a way to call `next` without going through the cache. One example is the cache worker: when it tries to update the cache, it shouldn't itself go through the cache.
This could be solved by having an internal API in the replication layer that always goes directly to the RPC/RocksDB, and exposing the cached version to the `emqx_ds` callback module (similar to how egress workers intercept calls).
Exposing this option directly in the API only makes sense when we want to give API consumers a way to bypass the cache, but I can't come up with a situation where that would be needed.
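A rough sketch of that split (function and helper names hypothetical, not the actual implementation): the public entry point consults the cache, while an internal one always goes directly to storage, so the cache worker uses the latter and no extra option is needed in the public API:

```erlang
%% Hypothetical sketch, assuming an emqx_ds_cache:fetch/3 helper exists.
%% Public entry point: consult the cache first, fall back on a miss.
next(DB, It, BatchSize) ->
    case emqx_ds_cache:fetch(DB, It, BatchSize) of
        {ok, Batch, ItNext} -> {ok, Batch, ItNext};
        miss -> do_next(DB, It, BatchSize)
    end.

%% Internal entry point: always goes directly to RPC/RocksDB;
%% this is what the cache worker itself would call.
do_next(DB, It, BatchSize) ->
    emqx_ds_replication_layer:next(DB, It, BatchSize).
```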
If the cache is to be used by other backends, then shouldn't it be controllable from `emqx_ds`, like egress is, instead of going directly to `replication_layer`?
Egress also has its options exposed:

`emqx/apps/emqx_durable_storage/src/emqx_ds.erl`, lines 251 to 257 in ed8660c:

```erlang
-spec store_batch(db(), [emqx_types:message()], message_store_opts()) -> store_batch_result().
store_batch(DB, Msgs, Opts) ->
    ?module(DB):store_batch(DB, Msgs, Opts).

-spec store_batch(db(), [emqx_types:message()]) -> store_batch_result().
store_batch(DB, Msgs) ->
    store_batch(DB, Msgs, #{}).
```
Inverted the default behavior to use the cache by default in the replication layer.
Options exposed by egress actually change the semantics of the call: for example, `async => true` means the message can be lost before being persisted.
Ok. Still, if I understand correctly, the cache could be used by other backends. If so, the way to call `next` without attempting to use the cache should live in `emqx_ds`, right?
Force-pushed from 599b503 to 4c04bc3
```erlang
-spec start_link(emqx_ds:db(), emqx_ds:stream(), emqx_ds:topic_filter(), emqx_ds:time()) ->
    supervisor:startchild_ret().
start_link(DB, Stream, TopicFilter, StartTime) ->
```
I think in the long term the cache workers shouldn't care about topic filters, and should simply cache the entire contents of the stream. This will allow sharing the contents of the cache between different clients that subscribe to different topics which happen to be mapped to the same stream.
The consumer of the messages can use the cached contents as a substitute for the RocksDB table (given the key order is the same). It can then post-process the cached messages using the topic filter and start time stored in the iterator.
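For illustration (assuming messages expose their topic and timestamp via `emqx_message`), that post-processing step could look roughly like:

```erlang
%% Hypothetical sketch: the whole stream is cached once; each consumer
%% narrows the shared cached batch down using its own iterator's
%% topic filter and start time.
filter_for_iterator(CachedMsgs, TopicFilter, StartTime) ->
    [Msg || Msg <- CachedMsgs,
            emqx_message:timestamp(Msg) >= StartTime,
            emqx_topic:match(emqx_message:topic(Msg), TopicFilter)].
```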
The current API to create iterators requires the topic filter to be provided, so I guess it'll require new APIs to iterate over the whole stream?
Or do you suggest to track `#` for the stream?
```erlang
    gen_server:start_link(
        ?via(#?cache_worker{db = DB, stream = Stream}),
        ?MODULE,
        {DB, Stream, TopicFilter, StartTime},
```
Why do we need to supply the topic filter and start time to the cache worker? It seems inconsistent with the process ID, which only allows one cache worker per DB and stream.
If different topic filters happen to map to the same stream, does it mean they will compete for the process registration?
The current API to create iterators requires the topic filter to be provided, so I don't think there's currently a way around it?
> If different topic filters happen to map to the same stream, does it mean they will compete for the process registration?

Indeed, if one misconfigures the topic filters, as is possible here, that would happen.
Also, we've discussed that the first version of the cache could use statically configured topic filters, and later make it automatically track streams based on usage or other heuristics. So a future version might not need a specific topic filter and could use `#`.
I've just found a fundamental problem with the cache right now: it currently relies solely on seqnos to detect gaps, without checking that the message topic actually matches the iterator's topic filter... It'll require some rethinking. 🙈
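For illustration (the `message` record field and helper name below are hypothetical), the missing check would be something like matching each cached entry's topic against the iterator's filter, on top of seqno continuity:

```erlang
%% Hypothetical sketch: seqno continuity alone cannot prove that cached
%% entries are complete for *this* iterator, since the cache may have
%% been filled under a different (narrower) topic filter; each entry's
%% topic must also be matched against the iterator's filter.
matching_entries(Entries, TopicFilter) ->
    [E || E = #cache_entry{message = Msg} <- Entries,
          emqx_topic:match(emqx_message:topic(Msg), TopicFilter)].
```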
Force-pushed from e31d32e to e9cafd6
```erlang
}).
-type cache_entry() :: #cache_entry{}.

-type seqno() :: non_neg_integer().
```
Nit: It's better to avoid defining types in the headers, since they'll end up duplicated in each module using the hrl.
```erlang
-ifndef(EMQX_DS_CACHE_HRL).
-define(EMQX_DS_CACHE_HRL, true).

-define(CACHE_KEY(LASTKEY), {LASTKEY}).
```
I see why the key is wrapped in a tuple (Erlang term order), but it will lead to extra memory allocations and de-allocations in the happy path. For performance reasons, we should try to store as little data as possible. Perhaps a better solution would be to add an "end_of_stream" flag to some external data structure associated with the stream cache, and check it when the next iterator returns an empty list, something like:

```erlang
case next_cache(StreamCache, ...) of
    Ret = {ok, Messages = [], ItNext} ->
        case ets:lookup(StreamCache, ?EOF_KEY) of
            [_] ->
                {ok, end_of_stream};
            [] ->
                Ret
        end;
    ...
```
```erlang
-record(cache_entry, {
    key :: ?CACHE_KEY(emqx_ds:message_key()) | ?EOS_KEY,
    seqno :: seqno(),
    inserted_at :: timestamp(),
```
Can we use the timestamp from the key? We don't have to be very precise in the cache eviction flow.
From the key inside the message, you mean? If we assume only more or less recent keys are added to the cache (i.e., we don't start caching from the distant past), I guess so.
If it's the `#cache_entry.key` you mean, I think there's no way to extract it from the `emqx_ds:message_key()` in the general case without yet another callback. 😅
Force-pushed from 0a58027 to 0b22461
Force-pushed from 4bc5ceb to 6a58997
I ran the test with 1k subscribers and 1k publishers (non-wildcard). Network usage drops a bit, but I guess that's because the received message rate also drops substantially... CPU usage is similar in both scenarios (very high, ~100%). With the cache, it stays at ~100% even at "rest"... 🙈 Curiously, loadgen (LG) CPU usage also seemed higher with the cache. 🤔 Left: without cache. The third image is LG.
Force-pushed from c86fd74 to 516c503
This is so that we may extract the key directly from an iterator without the need for constant RPCs.
Force-pushed from 516c503 to 1550f75
Fixes https://emqx.atlassian.net/browse/EMQX-10943
Release version: v/e5.6
Summary
PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
- Added `changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md` files

Checklist for CI (.github/workflows) changes
- `changes/` dir for user-facing artifacts update