bank: rewrite status cache #1790

mmcgee-jump · 2024-05-08T19:47:13Z

The status cache has two main issues:

(1) It is not particularly concurrent, which causes slowdowns when running
with highly parallel bank execution.

(2) Its memory consumption is unbounded, so it can cause out-of-memory
conditions when it needs to store large numbers of transactions.

It is rewritten in Firedancer for performance. The general design is that two operations, insert and query, need to be highly concurrent; everything else is rare, happens between slots, and can largely just lock the whole thing.

The base case for insert is optimized to two always-uncontended compare-and-swaps and one very lightly contended compare-and-swap. Query is fully lockless.

The transaction result storage is shared between the snapshot service cache and the query lookup cache, which more than halves the memory usage. The memory use is also fixed up front.
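For readers unfamiliar with the pattern, here is a minimal sketch of a compare-and-swap insert with a lockless query over a fixed, preallocated pool. It is not the code in this PR: it uses C11 atomics rather than Firedancer's primitives, uses two compare-and-swaps instead of three, and every name in it (cache_t, cache_insert, cache_query, POOL_MAX, BUCKET_CNT) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define POOL_MAX   8U          /* hypothetical capacity, fixed up front */
#define BUCKET_CNT 4U          /* hypothetical number of hash buckets */
#define IDX_NULL   UINT32_MAX  /* end-of-chain marker */

static_assert( (BUCKET_CNT & (BUCKET_CNT-1U))==0U, "BUCKET_CNT must be a power of two" );

typedef struct {
  uint64_t key;
  uint32_t next;  /* index of the next entry in the same bucket */
} entry_t;

typedef struct {
  entry_t          pool[ POOL_MAX ];     /* all storage lives here, no allocation on insert */
  _Atomic uint32_t used;                 /* number of pool entries handed out so far */
  _Atomic uint32_t heads[ BUCKET_CNT ];  /* head index of each bucket chain */
} cache_t;

static void cache_init( cache_t * c ) {
  atomic_init( &c->used, 0U );
  for( uint32_t i=0U; i<BUCKET_CNT; i++ ) atomic_init( &c->heads[ i ], IDX_NULL );
}

/* Returns 1 on success and 0 if the fixed pool is exhausted. */
static int cache_insert( cache_t * c, uint64_t key ) {
  /* First CAS: reserve a pool entry without ever exceeding POOL_MAX. */
  uint32_t idx = atomic_load( &c->used );
  do {
    if( idx>=POOL_MAX ) return 0;
  } while( !atomic_compare_exchange_weak( &c->used, &idx, idx+1U ) );

  entry_t * e = &c->pool[ idx ];
  e->key = key;

  /* Second CAS: publish the entry by swinging the bucket head to it.
     On failure, old is refreshed with the current head and we retry. */
  _Atomic uint32_t * head = &c->heads[ key % BUCKET_CNT ];
  uint32_t old = atomic_load( head );
  do {
    e->next = old;
  } while( !atomic_compare_exchange_weak( head, &old, idx ) );
  return 1;
}

/* Lockless query: walks the chain with no locks and no writes.
   Safe here because entries are never modified after being published. */
static int cache_query( cache_t * c, uint64_t key ) {
  for( uint32_t i=atomic_load( &c->heads[ key % BUCKET_CNT ] ); i!=IDX_NULL; i=c->pool[ i ].next ) {
    if( c->pool[ i ].key==key ) return 1;
  }
  return 0;
}
```

Because entries are only ever added in this sketch, it sidesteps removal and ABA concerns; the rare operations that do reclaim storage would run under a lock between slots, as described above.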

Related to #1770


ptaffet-jump · 2024-05-10T21:11:27Z

src/disco/bank/fd_txncache.c

+  if( FD_UNLIKELY( max_rooted_slots<1UL || max_live_slots<1UL ) ) return 0UL;
+  if( FD_UNLIKELY( max_live_slots<max_rooted_slots ) ) return 0UL;
+  if( FD_UNLIKELY( max_txn_per_slot<1UL ) ) return 0UL;
+  if( FD_UNLIKELY( !fd_ulong_is_pow2( max_live_slots || !fd_ulong_is_pow2( max_txn_per_slot ) ) ) ) return 0UL;


Suggested change

if( FD_UNLIKELY( !fd_ulong_is_pow2( max_live_slots || !fd_ulong_is_pow2( max_txn_per_slot ) ) ) ) return 0UL;

if( FD_UNLIKELY( !fd_ulong_is_pow2( max_live_slots ) || !fd_ulong_is_pow2( max_txn_per_slot ) ) ) return 0UL;
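To illustrate why the parenthesization matters, the sketch below compares the two conditions using a stand-in for fd_ulong_is_pow2. The stand-in is_pow2 and the two wrapper function names are hypothetical, and the stand-in assumes the usual semantics (1 if and only if the argument is a positive power of two). As written, the logical OR is evaluated inside the call, so the argument is only ever 0 or 1; once the earlier checks guarantee max_live_slots>=1, the argument is always 1 and the check can never fire.

```c
#include <assert.h>

/* Stand-in for fd_ulong_is_pow2 (assumed semantics: 1 iff x is a positive power of two). */
static int is_pow2( unsigned long x ) { return (x!=0UL) & ((x & (x-1UL))==0UL); }

/* Condition as originally written: is_pow2 receives the result of the ||,
   which is 0 or 1, not max_live_slots itself. */
static int rejects_as_written( unsigned long max_live_slots, unsigned long max_txn_per_slot ) {
  return !is_pow2( max_live_slots || !is_pow2( max_txn_per_slot ) );
}

/* Condition as suggested: each argument is tested separately. */
static int rejects_as_suggested( unsigned long max_live_slots, unsigned long max_txn_per_slot ) {
  return !is_pow2( max_live_slots ) || !is_pow2( max_txn_per_slot );
}
```

With max_live_slots=3 and max_txn_per_slot=5, rejects_as_written returns 0 (the bad arguments are accepted) while rejects_as_suggested returns 1.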

ptaffet-jump · 2024-05-10T21:13:09Z

src/disco/bank/fd_txncache.c

+
+    ulong txnhash_offset = blockcache->txnhash_offset;
+    ulong head_hash = FD_LOAD( ulong, query->txnhash+txnhash_offset ) % FD_TXNCACHE_BLOCKCACHE_MAP_CNT;
+    for( uint head=blockcache->heads[ head_hash ]; head!=UINT_MAX; head=tc->txnpages[ head/FD_TXNCACHE_TXNS_PER_PAGE ].txns[ head%FD_TXNCACHE_TXNS_PER_PAGE ]->blockcache_next ) {


You could consider a macro for tc->txnpages[ head/FD_TXNCACHE_TXNS_PER_PAGE ].txns[ head%FD_TXNCACHE_TXNS_PER_PAGE ]. It shows up several times, in places where it would be nice if it weren't so long.
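Something along these lines might work. This is only a sketch: the macro name TXN_AT is hypothetical, the struct layouts are simplified guesses that keep just the fields needed here, and FD_TXNCACHE_TXNS_PER_PAGE is given a small made-up value.

```c
#include <assert.h>
#include <limits.h>

#define FD_TXNCACHE_TXNS_PER_PAGE 4U  /* made-up value, kept small for illustration */

/* Simplified, assumed layouts: only what the lookup needs. */
typedef struct { unsigned int blockcache_next; } txn_t;
typedef struct { txn_t txns[ FD_TXNCACHE_TXNS_PER_PAGE ][ 1 ]; } txnpage_t;
typedef struct { txnpage_t * txnpages; } txncache_t;

/* Hypothetical helper: maps a global txn index to a pointer to its entry. */
#define TXN_AT( tc, idx ) \
  ( (tc)->txnpages[ (idx)/FD_TXNCACHE_TXNS_PER_PAGE ].txns[ (idx)%FD_TXNCACHE_TXNS_PER_PAGE ] )

/* Example use: the chain walk from the loop above, reduced to counting entries. */
static unsigned int chain_len( txncache_t const * tc, unsigned int head ) {
  assert( tc );
  unsigned int n = 0U;
  for( ; head!=UINT_MAX; head=TXN_AT( tc, head )->blockcache_next ) n++;
  return n;
}
```

The for loop header then shrinks to head=TXN_AT( tc, head )->blockcache_next, which keeps the -> access used in the original.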

ptaffet-jump · 2024-05-10T21:19:41Z

src/disco/bank/fd_txncache.c

+
+typedef struct fd_txncache_private_txn fd_txncache_private_txn_t;
+
+struct fd_txncache_private_txnpage {


You should mention explicitly here that all the txns refer to the same blockhash

ptaffet-jump · 2024-05-10T21:20:54Z

src/disco/bank/fd_txncache.c

+typedef struct fd_txncache_private_txn fd_txncache_private_txn_t;
+
+struct fd_txncache_private_txnpage {
+  ushort                    free; /* The number of free txn entries in this page. */


Alignment of txns is 4, so you have two bytes of padding here FYI

I haven't finished understanding it, so I'm not sure this is possible, but what about using these bytes to make a linked list of the pages for a blockhash? Then you could get rid of the pages pointer, and freeing a list of pages would be O(1).
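A rough sketch of the idea, to make the padding concrete. The types, field names, and free-list scheme here are all hypothetical and single-threaded (it assumes the free happens under the lock), and the static assertions assume a platform with 2-byte unsigned short and 4-byte aligned unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TXNS_PER_PAGE 4U         /* made-up value for illustration */
#define PAGE_NULL     USHRT_MAX  /* end-of-list marker */

typedef struct { unsigned int blockcache_next; } txn_t;  /* assumed 4-byte aligned */

/* Layout as in the diff: 2 bytes of padding follow free. */
typedef struct {
  unsigned short free;
  txn_t          txns[ TXNS_PER_PAGE ];
} page_as_is_t;

/* Hypothetical layout: the padding holds the index of the next page in the list. */
typedef struct {
  unsigned short free;
  unsigned short next;
  txn_t          txns[ TXNS_PER_PAGE ];
} page_linked_t;

static_assert( offsetof( page_as_is_t, txns )==4UL, "expected 2 bytes of padding after free" );
static_assert( sizeof( page_linked_t )==sizeof( page_as_is_t ), "next should fit in the padding" );

typedef struct {
  page_linked_t * pages;
  unsigned short  free_head;  /* head of the list of unused pages */
} pool_t;

/* Returns the whole chain head..tail to the free list in O(1) by splicing,
   given that the caller tracks both ends of the chain. */
static void free_chain( pool_t * p, unsigned short head, unsigned short tail ) {
  p->pages[ tail ].next = p->free_head;
  p->free_head          = head;
}
```

The O(1) free depends on tracking the tail of each blockhash's list as well as its head; without the tail, the splice needs a walk to find it.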

ptaffet-jump · 2024-05-10T21:42:00Z

src/disco/bank/fd_txncache.c

+
+  ulong idx;
+  for( idx=0UL; idx<tc->root_slots_cnt; idx++ ) {
+    if( FD_UNLIKELY( tc->root_slots[ idx ]==slot ) ) goto unlock;


Mention in the function comment that re-registering a root slot is a no-op.

ptaffet-jump · 2024-05-10T21:46:08Z

src/disco/bank/fd_txncache.c

+  ushort txnpages_per_blockhash_max;
+
+  ulong   root_slots_cnt; /* The number of root slots being tracked in the below array. */
+  ulong * root_slots; /* The highest N slots that have been rooted.  These slots are


Could you add a comment about the way this is used? I was definitely expecting something more like a ring buffer with far fewer memmoves, and that initially confused me.
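For context, this is roughly the pattern I think I'm looking at, written as a guess and not copied from the PR: a small ascending array of the highest N rooted slots, where registering a slot shifts elements with memmove and re-registering is a no-op. The names roots_t, register_root_slot, and ROOT_SLOTS_MAX are hypothetical, and the eviction policy (drop the lowest slot when full) is an assumption.

```c
#include <assert.h>
#include <string.h>

#define ROOT_SLOTS_MAX 4UL  /* made-up capacity for illustration */

typedef struct {
  unsigned long root_slots_cnt;                /* number of slots tracked below */
  unsigned long root_slots[ ROOT_SLOTS_MAX ];  /* kept in ascending order */
} roots_t;

static void register_root_slot( roots_t * r, unsigned long slot ) {
  assert( r );

  /* Find the insertion position; bail out if the slot is already tracked. */
  unsigned long idx;
  for( idx=0UL; idx<r->root_slots_cnt; idx++ ) {
    if( r->root_slots[ idx ]==slot ) return;  /* re-registering is a no-op */
    if( r->root_slots[ idx ]>slot  ) break;
  }

  if( r->root_slots_cnt==ROOT_SLOTS_MAX ) {
    /* Full: a slot below everything tracked is not among the highest N. */
    if( idx==0UL ) return;
    /* Evict the lowest slot by shifting [1,idx) down one place. */
    memmove( r->root_slots, r->root_slots+1, (idx-1UL)*sizeof(unsigned long) );
    r->root_slots[ idx-1UL ] = slot;
  } else {
    /* Not full: shift [idx,cnt) up one place to open a gap. */
    memmove( r->root_slots+idx+1, r->root_slots+idx, (r->root_slots_cnt-idx)*sizeof(unsigned long) );
    r->root_slots[ idx ] = slot;
    r->root_slots_cnt++;
  }
}
```

A ring buffer would avoid the shifting if slots always arrive in increasing order, which is the expectation that confused me; a comment saying whether out-of-order registration is possible would settle which structure fits.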

mmcgee-jump added this to the Frankendancer milestone May 8, 2024

mmcgee-jump requested a review from ptaffet-jump May 8, 2024 19:47

ptaffet-jump reviewed May 10, 2024
