Add a OnManualFlushScheduled callback in event listener #12631
Conversation
@@ -2426,6 +2443,8 @@ Status DBImpl::FlushMemTable(ColumnFamilyData* cfd,
       }
     }
   }
+
+  NotifyOnManualFlushScheduled({cfd}, flush_reason);
The FlushRequest generated by FlushMemTable() specifies std::numeric_limits<uint64_t>::max() as the max memtable ID to persist. Is it possible the background flush includes memtables newer than the one in memtable_ids_to_wait? That would mean increasing the cutoff TS here would not guarantee the flush can happen.
That's a good question. Yes, if another memtable is sealed between the IncreaseFullHistoryTsLow here and the time the background flush checks whether timestamps have expired, the cutoff TS here may not be sufficient to guarantee the flush can happen.
If that memtable is sealed by another manual-flush type of event, this callback should also have been invoked to increase the cutoff TS to a higher point. If that memtable is sealed by regular writes filling up a memtable, this would be an issue, for example when the write rate is very high.
I think updating the memtable ID in FlushRequest to be GetLatestMemTableID instead of std::numeric_limits<uint64_t>::max() can help make sure the flush can still proceed in this case. Do you have any concerns about making this change?
I think updating the memtable ID in FlushRequest to be GetLatestMemTableID instead of std::numeric_limits<uint64_t>::max() can help make sure the flush can still proceed in this case. Do you have any concerns about making this change?
Sorry for the delay. It's hard to say. I think that bounding the flushed memtable ID was introduced for atomic_flush. We didn't use it everywhere because it's more efficient (for write-amp, at least) to greedily pick as many memtables as possible at flush-time. That can make a difference when the flush queue is long, which is rare so it's a minor optimization. Also, one could argue that foreground flush latency is more important than write-amp in case of manual flush. So, introducing the limit is fine with me.
Still, would it be enough? There could be a case where manual flush does not generate any flush request because there is already one queued for automatic flush. If that one fails or is postponed, do we add a new flush request with unbounded memtable ID?
Thanks for the detailed context on this optimization. At flush-job creation time, it checks for the latest memtable ID again and uses that to pick memtables to flush:

    uint64_t max_memtable_id =

So I think even if we update this manual FlushRequest to use the latest memtable ID, we will still have the optimization you mentioned.
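To illustrate the point about the optimization being preserved, here is a minimal, self-contained sketch (not RocksDB's actual code) of greedy memtable picking bounded by a max ID, as re-evaluated at flush-job creation time. The function name and shape are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch: greedily pick every immutable memtable whose ID is
// at or below the bound carried by the flush request. Because the bound is
// recomputed at flush-job creation time, a bounded FlushRequest still flushes
// everything that is immutable by then.
std::vector<uint64_t> PickMemtablesToFlush(
    const std::vector<uint64_t>& immutable_memtable_ids,
    uint64_t max_memtable_id) {
  std::vector<uint64_t> picked;
  for (uint64_t id : immutable_memtable_ids) {
    if (id <= max_memtable_id) {
      picked.push_back(id);
    }
  }
  return picked;
}
```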
If this manual flush's enqueuing effort didn't succeed because another automatic flush request is already enqueued, those requests are generated with GenerateFlushRequest (rocksdb/db/db_impl/db_impl_compaction_flush.cc, line 2227 in 390fc55):

    void DBImpl::GenerateFlushRequest(const autovector<ColumnFamilyData*>& cfds,

If the automatic flush fails, presumably that would trigger the error-recovery flush, which goes through this manual flush path and enqueues another request. The RetryFlushesForErrorRecovery path does not go through this path, though; I think I should add invoking this callback in that path too.
It might be hard to enforce that the memtable picking is non-greedy. Alternatively you could add a post-wait callback. Then the user can:
- In the post-schedule, pre-wait callback:
  - Call IncreaseFullHistoryTsLow()
  - Bump an "in manual flush" counter
- In the post-wait callback:
  - Decrement the "in manual flush" counter
- In the seal callback:
  - If the "in manual flush" counter is nonzero, call IncreaseFullHistoryTsLow()
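The counter-based flow above can be sketched as follows. This is a hedged, self-contained model of the user-side logic, not RocksDB's actual listener API: the class name, callback names, and the timestamp-cutoff bookkeeping are all illustrative stand-ins.

```cpp
#include <atomic>
#include <cstdint>

// Sketch of the proposed coordination between the three callbacks. The
// cutoff here stands in for full_history_ts_low; in real usage the user
// would call DB::IncreaseFullHistoryTsLow() instead of bumping a local value.
class ManualFlushCoordinator {
 public:
  // Post-schedule, pre-wait callback: raise the cutoff past the timestamps
  // that must be flushed, and record that a manual flush is in progress.
  void OnManualFlushScheduled(uint64_t new_cutoff_ts) {
    BumpCutoff(new_cutoff_ts);
    in_manual_flush_.fetch_add(1, std::memory_order_relaxed);
  }

  // Post-wait callback: the manual flush has finished waiting.
  void OnManualFlushDone() {
    in_manual_flush_.fetch_sub(1, std::memory_order_relaxed);
  }

  // Seal callback: if a memtable is sealed while a manual flush is pending
  // (e.g. heavy writes filled a memtable), keep pushing the cutoff forward
  // so the pending flush is not blocked by unexpired timestamps.
  void OnMemTableSealed(uint64_t newest_ts_in_memtable) {
    if (in_manual_flush_.load(std::memory_order_relaxed) > 0) {
      BumpCutoff(newest_ts_in_memtable + 1);
    }
  }

  uint64_t cutoff() const {
    return cutoff_ts_.load(std::memory_order_relaxed);
  }

 private:
  // Monotonically raise the cutoff; never move it backwards.
  void BumpCutoff(uint64_t ts) {
    uint64_t cur = cutoff_ts_.load(std::memory_order_relaxed);
    while (cur < ts && !cutoff_ts_.compare_exchange_weak(cur, ts)) {
    }
  }

  std::atomic<uint64_t> in_manual_flush_{0};
  std::atomic<uint64_t> cutoff_ts_{0};
};
```

The counter (rather than a flag) lets concurrent manual flushes overlap without one flush's post-wait callback turning off the seal-callback handling another flush still needs.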
Thanks for this idea combining these callbacks. Let me implement such a flow in a follow-up as an example of handling this edge case.
LGTM!
I realized the concern we're discussing is more about the callback usage than the callback itself (this PR). I think the callback itself looks good. If we can add stronger guarantees to make it more useful, that's even better.
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@jowlyzhang merged this pull request in 44aceb8.
As titled. Also added the newest user-defined timestamp into the MemTableInfo; this can be useful info in the callback.

Added some unit tests as examples of how users can use two separate approaches to allow manual flushes / manual compactions to go through when the user-defined timestamps in memtable only feature is enabled. One approach relies on selectively increasing the cutoff timestamp in the OnMemtableSeal callback when it's initiated by a manual flush. The other approach is to increase the cutoff timestamp in the OnManualFlushScheduled callback. The caveats of the approaches are also documented in the unit tests.
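The second approach can be sketched as a small, self-contained model. The struct and function here are illustrative stand-ins, not RocksDB's actual types: in real usage the new cutoff would be passed to DB::IncreaseFullHistoryTsLow() from inside the OnManualFlushScheduled callback.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative stand-in for the info a manual-flush callback might see,
// including the newest user-defined timestamp (UDT) in the memtable.
struct ManualFlushInfoStub {
  uint64_t newest_udt;
};

// Compute the cutoff to install when a manual flush is scheduled: it must be
// strictly greater than every UDT that needs to be flushed, and it must
// never move backwards past the current cutoff.
uint64_t CutoffForManualFlush(uint64_t current_cutoff,
                              const ManualFlushInfoStub& info) {
  return std::max(current_cutoff, info.newest_udt + 1);
}
```

As the discussion above notes, this guarantees progress only if no newer memtable with unexpired timestamps is sealed before the background flush runs; the counter-based combination of callbacks is the suggested way to cover that gap.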