Avoid unnecessary allocations while finding token matches in a file #73500

Closed
CyrusNajmabadi wants to merge 3 commits from the farAllocs branch

Conversation

CyrusNajmabadi
Member

Saw these allocs while doing a trace that included FAR in it. We have a fast path that says "the bloom filter found a hit in this file, and we know the identifier was not escaped in it". In that case, we do a textual search to find the spans to grab as tokens, so we don't have to walk the entire tree looking for the matches (we can instead dive down right to that span, only realizing the red nodes along that path).

However, finding those text locations was unnecessarily allocating for each match it looked for.
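For context, a minimal sketch of that fast path (the class and method names here are hypothetical, and it assumes access to the internal SourceText.IndexOf extension this PR changes): locate each occurrence of the identifier textually, then realize only the tokens at those positions rather than walking the whole tree.

using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

internal static class FindReferencesSketch
{
    // Hypothetical helper: textual search first, then realize only the tokens at the hits.
    public static IEnumerable<SyntaxToken> FindCandidateTokens(
        SyntaxNode root, SourceText text, string identifier, bool caseSensitive)
    {
        var index = text.IndexOf(identifier, 0, caseSensitive);
        while (index >= 0)
        {
            // FindToken only realizes the red nodes along the path down to this position.
            var token = root.FindToken(index);
            if (token.Span.Start == index && token.Span.Length == identifier.Length)
                yield return token;

            index = text.IndexOf(identifier, index + identifier.Length, caseSensitive);
        }
    }
}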

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner May 16, 2024 00:14
@dotnet-issue-labeler bot added the Area-IDE and untriaged (Issues and PRs which have not yet been triaged by a lead) labels May 16, 2024
for (var i = startIndex; i <= length; i++)

return caseSensitive
    ? IndexOfCaseSensitive()
    : IndexOfCaseInsensitive();
Member Author

I found this cleaner as just two separate find helpers: one for the common C# case, which needs no converting of chars and has much less branching, and one for VB.

Both no longer alloc. Only the VB one has some special complex logic around case insensitivity.
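Roughly, the case-sensitive helper can be sketched like this (an illustration, not the exact PR code; local names are assumed): it walks the SourceText indexer and compares chars ordinally, with no intermediate strings or arrays.

static int IndexOfCaseSensitive(SourceText text, string searchString, int startIndex)
{
    var length = text.Length - searchString.Length;
    for (var i = startIndex; i <= length; i++)
    {
        var match = true;
        for (var j = 0; j < searchString.Length; j++)
        {
            // Read straight through the SourceText indexer; nothing is substringed or copied.
            if (text[i + j] != searchString[j])
            {
                match = false;
                break;
            }
        }

        if (match)
            return i;
    }

    return -1;
}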

var match = true;
for (var j = 0; j < searchStringLength; j++)
{
    var matchChar = j == 0 ? normalizedFirstChar : CaseInsensitiveComparison.ToLower(searchString[j]);
Contributor

CaseInsensitiveComparison.ToLower

I'm not sure of the search characteristics, but there is a potential tradeoff in that we could be calling this ToLower(char) for quite a few more chars than are in searchString, right?

Member Author

Yes, but I didn't measure any problems with this, and I view allocations as much worse. Most code is ASCII, so we're going to fast-path all these ToLowers all the time.
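To make the tradeoff concrete, here is roughly what the case-insensitive (VB) inner comparison looks like (a sketch; the helper name is hypothetical): CaseInsensitiveComparison.ToLower runs on the document character at every candidate position, not just on the characters of searchString, which is the extra work being discussed; per the reply above, ASCII input takes a cheap fast path through ToLower.

static bool MatchesCaseInsensitive(SourceText text, int position, string searchString, char normalizedFirstChar)
{
    for (var j = 0; j < searchString.Length; j++)
    {
        var searchChar = j == 0 ? normalizedFirstChar : CaseInsensitiveComparison.ToLower(searchString[j]);

        // This is the ToLower that runs once per examined document character.
        if (CaseInsensitiveComparison.ToLower(text[position + j]) != searchChar)
            return false;
    }

    return true;
}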

@@ -70,35 +70,64 @@ public static TextChangeRange GetEncompassingTextChangeRange(this SourceText new
return TextChangeRange.Collapse(ranges);
}

public static int IndexOf(this SourceText text, string value, int startIndex, bool caseSensitive)
public static int IndexOf(this SourceText text, string searchString, int startIndex, bool caseSensitive)
Contributor

IndexOf

BTW, I wonder if something similar to what was done in https://devdiv.visualstudio.com/DevDiv/_git/VSUnitTesting/pullrequest/550572 might be useful in source text searching.

Member Author

I leave it to you to implement :)

Contributor

slacker! :) Did this method show up at all in the CPU side of your profile?

Member Author

I'll check again. I think it's dominated by compiler time. Will do tomorrow!

@ToddGrun ToddGrun left a comment (Contributor)

:shipit:

//
// The only implementation we have that could have bad indexer perf is CompositeText with heavily modified text
// at the compiler layer, but I believe that being used in find-all-references will be very rare, if not never.
if (!Match(normalized[j], text[i + j], caseSensitive))
Contributor

if (!Match(normalized[j], text[i + j], caseSensitive))

Already approved, but dumb question I just thought of:

Why not just use CaseInsensitiveComparison.Equals, passing in both as ReadOnlySpan (nothing normalized)?

Member Author

So, the catch is that it's not span-able. It's a SourceText. :-(
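In other words, the alternative would be copying each candidate window out of the (possibly non-contiguous) SourceText before comparing. A sketch of that approach, assuming a pooled buffer (ArrayPool<char> is System.Buffers.ArrayPool), shows the per-candidate work the indexer-based comparison avoids:

static bool WindowEqualsViaCopy(SourceText text, int position, string searchString)
{
    // Assumes position + searchString.Length <= text.Length.
    // Copy the candidate window out of the (possibly chunked) SourceText...
    var buffer = ArrayPool<char>.Shared.Rent(searchString.Length);
    try
    {
        text.CopyTo(position, buffer, 0, searchString.Length);

        // ...then compare. Even with a pooled buffer this copies per candidate, and
        // materializing a string here would allocate outright.
        return CaseInsensitiveComparison.Equals(new string(buffer, 0, searchString.Length), searchString);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(buffer);
    }
}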

Contributor

makes sense, thanks!

@CyrusNajmabadi
Member Author

Closing out; I haven't been able to see this show up again.

@CyrusNajmabadi CyrusNajmabadi deleted the farAllocs branch May 22, 2024 00:07