client-go: data consistency checker for list requests #124963
/assign @wojtek-t
//
// if ResourceVersion = "" and ConsistentListFromCache is disabled or RequestWatchProgress isn't supported,
// then the request will be served from the storage.
func wasListRequestServedFromStorage(opts metav1.ListOptions) bool {
Copied from cacher.go. After clarifying rows 1 and 3 from the KEP, it looks like it does what we want, except for the case mentioned in the comment.
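To make the rules in the comment above concrete, here is a minimal, self-contained sketch of how a list request could be classified as served from storage rather than from the watch cache. It approximates the delegation logic in cacher.go as we understand it for this release; listOptions and servedFromStorage are hypothetical names, and the two boolean parameters stand in for the ConsistentListFromCache feature gate and RequestWatchProgress support. The exact server-side rules may differ between Kubernetes versions.

```go
package main

import "fmt"

// listOptions is a hypothetical, trimmed-down stand-in for metav1.ListOptions
// that keeps only the fields the delegation rules look at.
type listOptions struct {
	ResourceVersion      string
	ResourceVersionMatch string // "", "NotOlderThan" or "Exact"
	Limit                int64
	Continue             string
}

// servedFromStorage approximates the watch-cache delegation rules.
// consistentListFromCache models the feature gate; watchProgressSupported
// models whether RequestWatchProgress is available.
func servedFromStorage(opts listOptions, consistentListFromCache, watchProgressSupported bool) bool {
	// A consistent read (RV="") goes to etcd unless the cache can serve it.
	consistentReadFromStorage := opts.ResourceVersion == "" &&
		!(consistentListFromCache && watchProgressSupported)
	// The watch cache does not support continuation tokens.
	hasContinuation := opts.Continue != ""
	// Paginated requests go to etcd, except those for RV="0".
	hasLimit := opts.Limit > 0 && opts.ResourceVersion != "0"
	// The watch cache only understands the default NotOlderThan semantics.
	unsupportedMatch := opts.ResourceVersionMatch != "" && opts.ResourceVersionMatch != "NotOlderThan"
	return consistentReadFromStorage || hasContinuation || hasLimit || unsupportedMatch
}

func main() {
	fmt.Println(servedFromStorage(listOptions{ResourceVersion: "0"}, false, false))              // false
	fmt.Println(servedFromStorage(listOptions{}, false, false))                                    // true
	fmt.Println(servedFromStorage(listOptions{}, true, true))                                      // false
	fmt.Println(servedFromStorage(listOptions{ResourceVersion: "0", Limit: 500}, false, false))    // false
}
```

Keeping the rules as separate named booleans mirrors how each condition maps to one row of the behavior table, which makes it easier to compare against the server when the rules change.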
//
// Note that this function will panic when data inconsistency is detected.
// This is intentional because we want to catch it in the CI.
func CheckListAgainstCacheDataConsistencyIfRequested[T runtime.Object](ctx context.Context, identity string, listItemsFn listItemsFunc[T], optionsUsedToReceiveList metav1.ListOptions, receivedList runtime.Object) {
We need to find a better home for this function. Depending on where it is defined and whether it will be public, we could consider adding more validation for opts (e.g., checking if it was actually a list request).
@@ -64,7 +115,7 @@ func checkWatchListDataConsistencyIfRequested(ctx context.Context, identity stri
// it is guarded by an environmental variable.
// we cannot manipulate the environmental variable because
// it will affect other tests in this package.
-func checkDataConsistency(ctx context.Context, identity string, lastSyncedResourceVersion string, listItemsFn listItemsFunc, retrieveCollectedItemsFn retrieveCollectedItemsFunc) {
+func checkDataConsistency[T runtime.Object, U any](ctx context.Context, identity string, lastSyncedResourceVersion string, listItemsFn listItemsFunc[T], retrieveCollectedItemsFn retrieveCollectedItemsFunc[U]) {
We need to find a better home for this function. It is shared by checkWatchListDataConsistencyIfRequested and CheckListAgainstCacheDataConsistencyIfRequested.
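The generic signature in the diff lets one checker serve both the watch-list path and the list path. A minimal sketch of that idea, assuming a hypothetical object type in place of runtime.Object and a plain func() ([]T, error) in place of listItemsFunc[T]: fetch the reference items, sort both sides by key, and panic on any difference, as the doc comment says the real checker does. This is an illustration, not the client-go implementation.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// object is a hypothetical minimal item type standing in for runtime.Object.
type object struct {
	Namespace, Name, ResourceVersion string
}

// checkConsistency fetches the reference items (a hypothetical analogue of
// listItemsFunc[T]), sorts both sides with less and panics if they differ.
// The panic is deliberate: the check is meant for CI, not production.
func checkConsistency[T any](identity string, reference func() ([]T, error), received []T, less func(a, b T) bool) {
	expected, err := reference()
	if err != nil {
		// Assumption: if the reference list cannot be fetched, skip the check.
		return
	}
	got := append([]T(nil), received...) // copy so the caller's slice is not reordered
	sort.Slice(expected, func(i, j int) bool { return less(expected[i], expected[j]) })
	sort.Slice(got, func(i, j int) bool { return less(got[i], got[j]) })
	consistent := len(expected) == len(got)
	for i := 0; consistent && i < len(expected); i++ {
		consistent = reflect.DeepEqual(expected[i], got[i])
	}
	if !consistent {
		panic(fmt.Sprintf("data inconsistency detected for %s: expected %v, got %v", identity, expected, got))
	}
}

// byKey orders objects by namespace, then by name.
func byKey(a, b object) bool {
	if a.Namespace != b.Namespace {
		return a.Namespace < b.Namespace
	}
	return a.Name < b.Name
}

func main() {
	fromEtcd := func() ([]object, error) {
		return []object{{"ns", "b", "2"}, {"ns", "a", "1"}}, nil
	}
	checkConsistency("example", fromEtcd, []object{{"ns", "a", "1"}, {"ns", "b", "2"}}, byKey)
	fmt.Println("lists are consistent")
}
```

Sorting both sides first means the comparison does not depend on the order in which the cache and etcd return items.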
Force-pushed from 6fe1b36 to 6583cc6
@@ -461,6 +462,11 @@ func new$.type|publicPlural$(c *$.GroupGoName$$.Version$Client) *$.type|privateP
var listTemplate = `
// List takes label and field selectors, and returns the list of $.resultType|publicPlural$ that match those selectors.
func (c *$.type|privatePlural$) List(ctx context.Context, opts $.ListOptions|raw$) (result *$.resultType|raw$List, err error) {
defer func() {
if err != nil {
Is this really what you want? We'd rather call it only if the error was nil, right?
Oops, yes, it should have been if err == nil. Thanks.
@@ -697,7 +697,7 @@ func (r *Reflector) watchList(stopCh <-chan struct{}) (watch.Interface, error) {
	// we utilize the temporaryStore to ensure independence from the current store implementation.
	// as of today, the store is implemented as a queue and will be drained by the higher-level
	// component as soon as it finishes replacing the content.
-	checkWatchListConsistencyIfRequested(stopCh, r.name, resourceVersion, r.listerWatcher, temporaryStore)
+	checkWatchListDataConsistencyIfRequested(wait.ContextForChannel(stopCh), fmt.Sprintf("watch-list reflector with name: %q", r.name), resourceVersion, wrapListFuncWithContext(r.listerWatcher.List), temporaryStore.List)
Actually, seeing how you want to use it now and where the problems are, let's first merge this commit as #124446.
staging/src/k8s.io/code-generator/cmd/client-gen/generators/generator_for_type.go
checkDataConsistency(ctx, identity, lastSyncedResourceVersion, listItemsFn, func() []runtime.Object { return rawListItems })
}

// wasListRequestServedFromStorage based on the passed ListOptions determines
We should rather evolve it into something like:
"if a continuation wasn't set, then call LIST with RV= and ResourceVersionMatch=Exact, keeping the rest of the parameters the same"
It seems simpler. It potentially performs this check unnecessarily in some cases, but that's not a big deal.
Force-pushed from 6583cc6 to f7457cb
/retest
/test pull-kubernetes-node-e2e-containerd
Force-pushed from 57c4f77 to 472043e
Force-pushed from 472043e to 42da3cf
I approve of the approach, though I suspect further refactoring will be necessary for streaming list.
Thanks, yes, support for streaming list will require more changes to the code generation.
Force-pushed from c7d63a2 to 06b7d97
OK, so this PR has a commit that always enables the detector for list requests. Here is the summary of test failures:
- Unit Tests (3 failures)
- Integration Tests (2 failures)
I think that we could initially enable the detector for an e2e job like …
Force-pushed from 06b7d97 to 448180d
/triage accepted
Yes, the most important thing is that we understand the failures in unit/integration tests and that those are test issues, not real issues. So that's fine.
/lgtm
LGTM label has been added. Git tree hash: dbdb31c5b4c899b9617fe73668262c8f5c915e79
/kind feature
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: p0lyn0mial, wojtek-t
What type of PR is this?
/kind feature
What this PR does / why we need it:
Adds a data consistency checker to client-go that compares data retrieved via list requests from the watch cache with data retrieved directly from etcd. The detector can be enabled by setting the KUBE_LIST_FROM_CACHE_INCONSISTENCY_DETECTOR env var.

// CheckListFromCacheDataConsistencyIfRequested performs a data consistency check only when
// the KUBE_LIST_FROM_CACHE_INCONSISTENCY_DETECTOR environment variable was set during binary startup
// for requests that have a high chance of being served from the watch-cache.
//
// The consistency check is meant to be enforced only in the CI, not in production.
// The check ensures that data retrieved by a list api call from the watch-cache
// is exactly the same as data received by the list api call from etcd.
//
// Note that this function will panic when data inconsistency is detected.
// This is intentional because we want to catch it in the CI.
//
// Note that this function doesn't examine the ListOptions to determine
// if the original request has hit the cache because it would be challenging
// to maintain consistency with the server-side implementation.
// For simplicity, we assume that the first request retrieved data from
// the cache (even though this might not be true for some requests)
// and issue the second call to get data from etcd for comparison.
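The doc comment describes two behaviors: the check is gated by an environment variable read at startup, and it assumes the first request was served from the watch cache, issuing a second call for comparison. A sketch of that gating under stated assumptions; parseEnabled, checkListFromCacheIfRequested and the relist callback are hypothetical, and client-go's actual wiring may differ.

```go
package main

import (
	"fmt"
	"os"
	"slices"
	"strconv"
)

// detectorEnabled is computed once at startup; changing the variable later has no effect.
var detectorEnabled = parseEnabled(os.Getenv("KUBE_LIST_FROM_CACHE_INCONSISTENCY_DETECTOR"))

// parseEnabled treats any value strconv.ParseBool rejects as "disabled".
func parseEnabled(v string) bool {
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

// checkListFromCacheIfRequested assumes received came from the watch cache and
// calls relist, a hypothetical stand-in for a second request served from etcd.
func checkListFromCacheIfRequested(received []string, relist func() ([]string, error)) {
	if !detectorEnabled {
		return
	}
	fromEtcd, err := relist()
	if err != nil {
		return // assumption: a failed verification call is not treated as an inconsistency
	}
	if !slices.Equal(received, fromEtcd) {
		panic(fmt.Sprintf("list from cache %v differs from list from etcd %v", received, fromEtcd))
	}
}

func main() {
	fmt.Println("detector enabled:", detectorEnabled)
	checkListFromCacheIfRequested([]string{"a", "b"}, func() ([]string, error) {
		return []string{"a", "b"}, nil
	})
}
```

Reading the variable once keeps the hot path to a single boolean check when the detector is off, which is the expected state outside CI.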
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: