Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keep track of unresolved value of symbolic-ref in ref iterators #1712

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

john-cai
Copy link
Contributor

@john-cai john-cai commented Apr 21, 2024

For reftable development, it's useful to be able to print out the direct value of a symbolic reference before resolution. This is currently possible with git-for-each-ref, but since the iterators do not keep track of the value of the symbolic ref, a separate call needs to be made each time making it inefficient.

Address this inefficiency by keeping track of the value of the symbolic reference in the ref iterator. This patch series also ends with a commit to use this value in the iterator through callbacks in ref-filter.c.

This series started with [1] but I decided to send a separate patch series since it is substantially different.

Benchmarking shows that with these changes, we experience a significant speedup in git-for-each-ref(1):

$ hyperfine --warmup 5 "git for-each-ref --format='%(refname) %(objectname) %(symref)'" "~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'"

Benchmark 1: git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 210.0 ms ± 9.1 ms [User: 5.8 ms, System: 11.8 ms]
Range (min … max): 203.4 ms … 228.8 ms 12 runs

Benchmark 2: ~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 7.4 ms ± 0.7 ms [User: 2.6 ms, System: 3.9 ms]
Range (min … max): 5.9 ms … 11.7 ms 273 runs

Summary
~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)' ran
28.47 ± 2.86 times faster than git for-each-ref --format='%(refname) %(objectname) %(symref)'

  1. https://lore.kernel.org/git/pull.1684.git.git.1709592718743.gitgitgadget@gmail.com/

cc: Phillip Wood phillip.wood123@gmail.com
cc: "Kristoffer Haugsbakk" code@khaugsbakk.name
cc: Jeff King peff@peff.net
cc: Patrick Steinhardt ps@pks.im
cc: Jean-Noël Avila avila.jn@gmail.com

@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 0ce74be to 59acda5 Compare May 24, 2024 14:11
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch 4 times, most recently from ea475ae to e573f45 Compare June 5, 2024 21:20
refs_resolve_ref_unsafe retrieves the referent, the unresolved value of
a reference. Add a parameter to allow refs_resolve_ref_unsafe to pass up
the value of referent to the caller so it can save this value in ref
iterators for more efficient access.

Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from e573f45 to 3f599da Compare June 6, 2024 12:53
@john-cai john-cai changed the title keep track of unresolved value of symbolic-ref in iterator keep track of unresolved value of symbolic-ref in ref iterators Jun 6, 2024
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 3f599da to fe591ce Compare June 6, 2024 15:00
@john-cai
Copy link
Contributor Author

john-cai commented Jun 6, 2024

/preview

Copy link

Preview email sent as pull.1712.git.git.1717687732.gitgitgadget@gmail.com

Copy link

There are issues in commit 9a89508:
use const referent in ref cache
Commit checks stopped - the message is too short
Commit not signed off

Since ref iterators do not hold onto the direct value of a reference
without resolving it, the only way to get ahold of a direct value of a
symbolic ref is to make a separate call to refs_read_symbolic_ref.

To make accessing the direct value of a symbolic ref more efficient,
let's save the direct value of the ref in the iterators for both the
files backend and the reftable backend.

Signed-off-by: John Cai <johncai86@gmail.com>
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
With a previous commit, the reference the symbolic ref points is saved
in the ref iterator records. Instead of making a separate call to
resolve_refdup() each time, we can just populate the ref_array_item with
the value from the iterator.

Signed-off-by: John Cai <johncai86@gmail.com>
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 9a89508 to 6d5b1b6 Compare June 6, 2024 16:04
@john-cai
Copy link
Contributor Author

john-cai commented Jun 6, 2024

/submit

Copy link

Submitted as pull.1712.git.git.1717694800.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1

To fetch this version to local tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1

@@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

ADMINISTRIVIA.  Check the address you place on the CC: line.  What
we can see for this message at

https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw

looks like this.

    Cc: "Phillip Wood [ ]" <phillip.wood123@gmail.com>,
        Kristoffer Haugsbakk <[code@khaugsbakk.name]>,
        "Jeff King [ ]" <peff@peff.net>,
        "Patrick Steinhardt [ ]" <ps@pks.im>,
        "=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <avila.jn@gmail.com>,
        John Cai <johncai86@gmail.com>,
        John Cai <johncai86@gmail.com>

I fixed them manually, but it wasn't pleasant.  I think we saw a
similar breakage earlier coming via GGG, but I do not recall the
details of how to cause such breakages (iow, what to avoid repeating
this).

Anyway.

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  29 files changed, 64 insertions(+), 52 deletions(-)

Wow, the blast radius of this thing is rather big.  Among these
existing callers of refs_resolve_ref_unsafe(), how many of them
will benefit from being able to pass a non NULL parameter at the end
of the series, and more importantly, in the future to take advantage
of the new feature possibly with a separate series?

I am assuming that this will benefit only a selected few and the
callers that would want to take advantage of the new feature will
remain low.  Have you considered renaming refs_resolve_ref_unsafe()
to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
implement the new feature (which is only triggered when the new
parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
very thin wrapper that passes NULL to the new thing?

That way, you do not have to touch those existing callers that will
never benefit from the new capability in the future.  You won't risk
conflicting with in flight topics semantically, either.

Or will they also benefit from the new feature in the future?

Offhand, I do not know how a caller like this ...

> diff --git a/add-interactive.c b/add-interactive.c
> index b5d6cd689a1..041d30cf2b3 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
>  {
>  	struct object_id head_oid;
>  	int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
> -						  "HEAD", RESOLVE_REF_READING,
> +						  "HEAD", NULL, RESOLVE_REF_READING,
>  						  &head_oid, NULL);
>  	struct collection_status s = { 0 };
>  	int i;

... would be helped.

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <gitster@pobox.com> writes:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big.  Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
> ...
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future.  You won't risk
> conflicting with in flight topics semantically, either.

The same comment applies to [3/4], but I do not want to fix the Cc: header
again, so I am replying to this message.

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 6 Jun 2024, at 14:21, Junio C Hamano wrote:

> ADMINISTRIVIA.  Check the address you place on the CC: line.  What
> we can see for this message at
>
> https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw
>
> looks like this.
>
>     Cc: "Phillip Wood [ ]" <phillip.wood123@gmail.com>,
>         Kristoffer Haugsbakk <[code@khaugsbakk.name]>,
>         "Jeff King [ ]" <peff@peff.net>,
>         "Patrick Steinhardt [ ]" <ps@pks.im>,
>         "=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <avila.jn@gmail.com>,
>         John Cai <johncai86@gmail.com>,
>         John Cai <johncai86@gmail.com>
>
> I fixed them manually, but it wasn't pleasant.  I think we saw a
> similar breakage earlier coming via GGG, but I do not recall the
> details of how to cause such breakages (iow, what to avoid repeating
> this).

oof, apologies. Didn't notice that. I'll be more mindful about the cc line.

>
> Anyway.
>
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big.  Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
>
> I am assuming that this will benefit only a selected few and the
> callers that would want to take advantage of the new feature will
> remain low.  Have you considered renaming refs_resolve_ref_unsafe()
> to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
> implement the new feature (which is only triggered when the new
> parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
> very thin wrapper that passes NULL to the new thing?
>
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future.  You won't risk
> conflicting with in flight topics semantically, either.
>
> Or will they also benefit from the new feature in the future?
>
> Offhand, I do not know how a caller like this ...
>
>> diff --git a/add-interactive.c b/add-interactive.c
>> index b5d6cd689a1..041d30cf2b3 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
>>  {
>>  	struct object_id head_oid;
>>  	int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
>> -						  "HEAD", RESOLVE_REF_READING,
>> +						  "HEAD", NULL, RESOLVE_REF_READING,
>>  						  &head_oid, NULL);
>>  	struct collection_status s = { 0 };
>>  	int i;
>
> ... would be helped.

Good point. I was sensing the same thing as I made all the callsite changes so
this feedback makes it clear what we should do.

>
> Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

John Cai <johncai86@gmail.com> writes:

> On 6 Jun 2024, at 14:21, Junio C Hamano wrote:
>
>> ADMINISTRIVIA.  Check the address you place on the CC: line.  What
>> we can see for this message at
>> ...
>> I fixed them manually, but it wasn't pleasant.  I think we saw a
>> similar breakage earlier coming via GGG, but I do not recall the
>> details of how to cause such breakages (iow, what to avoid repeating
>> this).
>
> oof, apologies. Didn't notice that. I'll be more mindful about the cc line.

I found the previous occurrences of the same problem:

  https://lore.kernel.org/git/xmqqjzm3qumx.fsf@gitster.g/
  https://lore.kernel.org/git/xmqqh6hkxox6.fsf@gitster.g/

The last message in the thread

  https://lore.kernel.org/git/CAMbn=B7J4ODf9ybJQpL1bZZ7qdWSDGaLEyTmVv+ZBiSeC9T+yw@mail.gmail.com/

says that the original user of GGG found what was wrong in the way
the user was using GGG to send and fixed it, but unfortunately we
didn't hear exactly *what* the breakage was and *how* it was fixed.

Aryan, do you remember what the problem was and more importantly
what the fix was?

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 6 Jun 2024, at 14:23, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>  29 files changed, 64 insertions(+), 52 deletions(-)
>>
>> Wow, the blast radius of this thing is rather big.  Among these
>> existing callers of refs_resolve_ref_unsafe(), how many of them
>> will benefit from being able to pass a non NULL parameter at the end
>> of the series, and more importantly, in the future to take advantage
>> of the new feature possibly with a separate series?
>> ...
>> That way, you do not have to touch those existing callers that will
>> never benefit from the new capability in the future.  You won't risk
>> conflicting with in flight topics semantically, either.
>
> The same comment applies to [3/4], but I do not want to fix the Cc: header
> again, so I am replying to this message.

Yes, so for 3/4 I was exploring doing the same thing. However, each_repo_fn goes
pretty deep in the callstack and to provide an alternate set of functions that
use something like each_repo_referent_fn would still lead to a relatively large
blast radius, eg, something like:


diff --git a/ref-filter.c b/ref-filter.c
index f7fb0c7e0e..770d3715c2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2631,12 +2631,12 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
  * pattern match, so the callback still has to match each ref individually.
  */
 static int for_each_fullref_in_pattern(struct ref_filter *filter,
-				       each_ref_fn cb,
+				       each_ref_with_referent_fn cb,
 				       void *cb_data)
 {
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
+		return refs_for_each_include_root_refs_with_referent(get_main_ref_store(the_repository),
 						       cb, cb_data);
 	}

@@ -2646,7 +2646,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						"", NULL, cb, cb_data);
 	}

@@ -2656,17 +2656,17 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						"", NULL, cb, cb_data);
 	}

 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						 "", filter->exclude.v, cb, cb_data);
 	}

-	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
+	return refs_for_each_fullref_in_prefixes_with_referent(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
@@ -2781,7 +2781,7 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 	return ref_kind_from_refname(refname);
 }

-static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
+static struct ref_array_item *apply_ref_filter(const char *refname, const char *referent, const struct object_id *oid,
 			    int flag, struct ref_filter *filter)
 {
 	struct ref_array_item *ref;
@@ -2850,6 +2850,7 @@ static struct ref_array_item *apply_ref_filter(const char *refname, const struct
 	ref->commit = commit;
 	ref->flag = flag;
 	ref->kind = kind;
+	ref->symref = referent;

 	return ref;
 }
@@ -2863,12 +2864,12 @@ struct ref_filter_cbdata {
  * A call-back given to for_each_ref().  Filter refs and keep them for
  * later object processing.
  */
-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct ref_filter_cbdata *ref_cbdata = cb_data;
 	struct ref_array_item *ref;

-	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
 	if (ref)
 		ref_array_append(ref_cbdata->array, ref);

@@ -2898,13 +2899,13 @@ struct ref_filter_and_format_cbdata {
 	} internal;
 };

-static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_and_format_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
 	struct ref_array_item *ref;
 	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;

-	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
 	if (!ref)
 		return 0;

@@ -3050,7 +3051,7 @@ void filter_ahead_behind(struct repository *r,
 	free(commits);
 }

-static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
+static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_with_referent_fn fn, void *cb_data)
 {
 	int ret = 0;

@@ -3070,15 +3071,15 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/heads/", NULL,
 						       fn, cb_data);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/remotes/", NULL,
 						       fn, cb_data);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/tags/", NULL, fn,
 						       cb_data);
 		else if (filter->kind & FILTER_REFS_REGULAR)
@@ -3090,7 +3091,7 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 */
 		if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) &&
 		    (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			refs_head_ref(get_main_ref_store(the_repository), fn,
+			refs_head_ref_referent(get_main_ref_store(the_repository), fn,
 				      cb_data);
 	}

diff --git a/refs.c b/refs.c
index 6dcb0288cb..4366f35586 100644
--- a/refs.c
+++ b/refs.c
@@ -1529,6 +1529,19 @@ int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 	return 0;
 }

+int refs_head_ref_referent(struct ref_store *refs, each_ref_with_referent_fn fn, void *cb_data)
+{
+	struct object_id oid;
+	int flag;
+	const char *referent;
+
+	if (refs_resolve_ref_unsafe_with_referent(refs, "HEAD", referent, RESOLVE_REF_READING,
+				    &oid, &flag))
+		return fn("HEAD", referent, &oid, flag, cb_data);
+
+	return 0;
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix,
@@ -1560,6 +1573,22 @@ struct ref_iterator *refs_ref_iterator_begin(
 	return iter;
 }

+static int do_for_each_ref_with_referent(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
+			   each_ref_with_referent_fn fn, int trim,
+			   enum do_for_each_ref_flags flags, void *cb_data)
+{
+	struct ref_iterator *iter;
+
+	if (!refs)
+		return 0;
+
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
+
+	return do_for_each_ref_iterator_with_referent(iter, fn, cb_data);
+}
+
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
@@ -1594,6 +1623,13 @@ int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
 	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }

+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
+			     each_ref_with_referent_fn fn, void *cb_data)
+{
+	return do_for_each_ref_with_referent(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
+}
+
 int refs_for_each_replace_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
 	const char *git_replace_ref_base = ref_namespace[NAMESPACE_REPLACE].ref;
@@ -1627,6 +1663,13 @@ int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
 			       DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
 }

+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+				    void *cb_data)
+{
+	return do_for_each_ref_with_referent(refs, "", NULL, fn, 0,
+			       DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
+}
+
 static int qsort_strcmp(const void *va, const void *vb)
 {
 	const char *a = *(const char **)va;
@@ -1716,6 +1759,37 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 	return ret;
 }

+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *ref_store,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
+				      each_ref_with_referent_fn fn, void *cb_data)
+{
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
+	struct string_list_item *prefix;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, namespace_len;
+
+	find_longest_prefixes(&prefixes, patterns);
+
+	if (namespace)
+		strbuf_addstr(&buf, namespace);
+	namespace_len = buf.len;
+
+	for_each_string_list_item(prefix, &prefixes) {
+		strbuf_addstr(&buf, prefix->string);
+		ret = refs_for_each_fullref_in_with_referent(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
+		if (ret)
+			break;
+		strbuf_setlen(&buf, namespace_len);
+	}
+
+	string_list_clear(&prefixes, 0);
+	strbuf_release(&buf);
+	return ret;
+}
+
 static int refs_read_special_head(struct ref_store *ref_store,
 				  const char *refname, struct object_id *oid,
 				  struct strbuf *referent, unsigned int *type,
diff --git a/refs.h b/refs.h
index ba6d0e0753..d8387c6296 100644
--- a/refs.h
+++ b/refs.h
@@ -304,6 +304,9 @@ struct ref_transaction;
 typedef int each_ref_fn(const char *refname,
 			const struct object_id *oid, int flags, void *cb_data);

+typedef int each_ref_with_referent_fn(const char *refname, const char *referent,
+			const struct object_id *oid, int flags, void *cb_data);
+
 /*
  * The following functions invoke the specified callback function for
  * each reference indicated.  If the function ever returns a nonzero
@@ -315,6 +318,8 @@ typedef int each_ref_fn(const char *refname,
  */
 int refs_head_ref(struct ref_store *refs,
 		  each_ref_fn fn, void *cb_data);
+int refs_head_ref_referent(struct ref_store *refs,
+		  each_ref_with_referent_fn fn, void *cb_data);
 int refs_for_each_ref(struct ref_store *refs,
 		      each_ref_fn fn, void *cb_data);
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
@@ -351,6 +356,17 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
 				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);

+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *refs,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
+				      each_ref_with_referent_fn fn, void *cb_data);
+
+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
+			     each_ref_with_referent_fn fn, void *cb_data);
+
+
 /* iterates all refs that match the specified glob pattern. */
 int refs_for_each_glob_ref(struct ref_store *refs, each_ref_fn fn,
 			   const char *pattern, void *cb_data);
@@ -377,6 +393,10 @@ int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
 int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
 				    void *cb_data);

+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+				    void *cb_data);
+
+
 /*
  * Normalizes partial refs to their fully qualified form.
  * Will prepend <prefix> to the <pattern> if it doesn't start with 'refs/'.
diff --git a/refs/iterator.c b/refs/iterator.c
index 26acb6bd64..26c38ec6de 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -469,3 +469,30 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 		return -1;
 	return retval;
 }
+
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+			     each_ref_with_referent_fn fn, void *cb_data)
+{
+	int retval = 0, ok;
+	struct ref_iterator *old_ref_iter = current_ref_iter;
+
+	current_ref_iter = iter;
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
+		if (retval) {
+			/*
+			 * If ref_iterator_abort() returns ITER_ERROR,
+			 * we ignore that error in deference to the
+			 * callback function's return value.
+			 */
+			ref_iterator_abort(iter);
+			goto out;
+		}
+	}
+
+out:
+	current_ref_iter = old_ref_iter;
+	if (ok == ITER_ERROR)
+		return -1;
+	return retval;
+}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0775451435..ebec407468 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -486,6 +486,9 @@ extern struct ref_iterator *current_ref_iter;
  */
 int do_for_each_ref_iterator(struct ref_iterator *iter,
 			     each_ref_fn fn, void *cb_data);
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+			     each_ref_with_referent_fn fn, void *cb_data);
+

 struct ref_store;


Which is a little bit unseemly in my view...so between the two options I would be
inclined to go with modifying each_repo_fn than create a whole subset set of
helpers. wdyt?

thanks
John

>
> Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

John Cai <johncai86@gmail.com> writes:

> Yes, so for 3/4 I was exploring doing the same thing. However, each_repo_fn goes
> pretty deep in the callstack and to provide an alternate set of functions that
> use something like each_repo_referent_fn would still lead to a relatively large
> blast radius, eg, something like:

Hmph.  You only care about teaching referent to calls to
refs_for_each_ref_in() and friends in apply_ref_filter()
and nowhere else.  So for example, to preserve the current
calling pattern for this pair of caller/callback (and you have
dozens more exactly like this one you touch in [3/4]):

	static int register_ref(const char *refname,
                                const struct object_id *oid,
				int flags,
				void *cb_data);

	static int read_bisect_refs(void)
	{
		return refs_for_each_ref_in(get_main_ref_store(the_repository),
					    "refs/bisect", register_ref, NULL);
	}

you could arrange your _new_ API the way you want, e.g.,

	typedef int each_ref_with_referent_fn(const char *refname,
	                                      const char *referent,
                                              const struct object_id *oid,
					      int flags,
					      void *cb_data);

	int refs_for_each_ref_with_referent_in(struct ref_store *refs,
					       const char *prefix,
                                               each_ref_with_referent_fn fn,
					       void *cb_data);

to help the single caller, i.e., apply_ref_filter(), and rebuild the
existing API on top as a thin wrapper, e.g.

	/* This cannot change without butchering existing callers */
	typedef int each_ref_fn(const char *refname,
                                const struct object_id *oid,
				int flags,
				void *cb_data);

	/* Hence we introduce an adapter */
	struct each_ref_fn_adapter_cbdata {
		each_ref_fn user_each_ref_fn;
		void *user_cb_data;
	};

	/* This is designed to work as an each_ref_with_referent_fn */
	static int each_ref_adapter_fn(const char *refname,
				       const char *referent,
                                       const struct object_id *oid,
				       int flags,
				       void *cb_data)
	{
		struct each_ref_fn_adapter_cbdata *adapter_cbdata = cbdata;

		/* the callers have no need for referent */
                return adapter_cbdata->user_each_ref_fn(refname, oid, flags,
					                adapter_cbdata->user_cbdata);
	}

	/*
         * The function signature must stay the same to help existing,
         * callers, but the implementation is now a thin wrapper.
	 */
	int refs_for_each_ref_in(struct ref_store *refs,
				 const char *prefix,
                                 each_ref_fn fn,
				 void *cb_data)
        {
		struct each_ref_fn_adapter_cbdata adapter_cbdata = {
                	.user_each_ref_fn = fn,
			.user_cb_data = cb_data,
		};
		return refs_for_each_ref_with_referetnt_in(refs, prefix,
							   each_ref_adapter_fn,
                                                           &adapter_cbdata);
	}

no?

You'd need to pass through the new parameter "referent" through the
code paths that implement refs_for_each_ref_in() and friends to
update them to refs_for_each_ref_with_referent_in() and friends no
matter what, but there are limited number of the top-level
refs_for_each_ref_in() and friends that are unaware of the
"referent", and each of them would need the above ~20 lines (couting
the comment) adapter function that all can share the single
each_ref_adapter_fn() callback function.

Or am I missing some other intricacy in the existing for-each-* API?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, Jun 06, 2024 at 11:23:18AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >>  29 files changed, 64 insertions(+), 52 deletions(-)
> >
> > Wow, the blast radius of this thing is rather big.  Among these
> > existing callers of refs_resolve_ref_unsafe(), how many of them
> > will benefit from being able to pass a non NULL parameter at the end
> > of the series, and more importantly, in the future to take advantage
> > of the new feature possibly with a separate series?
> > ...
> > That way, you do not have to touch those existing callers that will
> > never benefit from the new capability in the future.  You won't risk
> > conflicting with in flight topics semantically, either.
> 
> The same comment applies to [3/4], but I do not want to fix the Cc: header
> again, so I am replying to this message.

I wonder whether we can future-proof the code a bit more by introducing
a struct that contains everything we want to pass to the callback
function. That wouldn't fix the blast radius of this patch, but would
mean that we can easily add additional fielids to that struct in the
future without having to adapt any callers at all anymore.

Something like:

    struct ref_data {
        const char *refname;
        const struct object_id *oid;
        const char *referent;
        int flags;
    };

This would also allow us to get rid of this awful `peel_iterated_oid()`
function that relies on global state in many places, as we can put the
peeled OID into that structure, as well.

Patrick

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant