Avoid unnecessary allocations while finding token matches in a file #73500

Closed
CyrusNajmabadi wants to merge 3 commits from the farAllocs branch

Conversation

CyrusNajmabadi
Member

Saw these allocs while doing a trace that included FAR in it. We have a fast path that says "the bloom filter found a hit in this file, and we know the identifier was not escaped in it". In that case, we do a textual search to find the spans to grab as tokens, so we don't have to walk the entire tree looking for the matches (we can instead dive down right to that span, only realizing the red nodes along that path).

However, finding those text locations was unnecessarily allocating for each match it looked for.
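For context, a minimal sketch of that fast path (the class and method names here are hypothetical, and it assumes access to the internal SourceText.IndexOf extension this PR changes): locate each occurrence of the identifier textually, then realize only the tokens at those positions rather than walking the whole tree.

using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

internal static class FindReferencesSketch
{
    // Hypothetical helper: textual search first, then realize only the tokens at the hits.
    public static IEnumerable<SyntaxToken> FindCandidateTokens(
        SyntaxNode root, SourceText text, string identifier, bool caseSensitive)
    {
        var index = text.IndexOf(identifier, 0, caseSensitive);
        while (index >= 0)
        {
            // FindToken only realizes the red nodes along the path down to this position.
            var token = root.FindToken(index);
            if (token.Span.Start == index && token.Span.Length == identifier.Length)
                yield return token;

            index = text.IndexOf(identifier, index + identifier.Length, caseSensitive);
        }
    }
}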

@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner May 16, 2024 00:14
@dotnet-issue-labeler bot added the Area-IDE and untriaged (Issues and PRs which have not yet been triaged by a lead) labels May 16, 2024
for (var i = startIndex; i <= length; i++)

return caseSensitive
    ? IndexOfCaseSensitive()
    : IndexOfCaseInsensitive();
Member Author

I found this cleaner as just two separate find helpers: one for the common C# case, which needs no converting of chars and has much less branching, and one for VB.

Both no longer alloc. Only the VB one has some special complex logic around case insensitivity.
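Roughly, the case-sensitive helper can be sketched like this (an illustration, not the exact PR code; local names are assumed): it walks the SourceText indexer and compares chars ordinally, with no intermediate strings or arrays.

static int IndexOfCaseSensitive(SourceText text, string searchString, int startIndex)
{
    var length = text.Length - searchString.Length;
    for (var i = startIndex; i <= length; i++)
    {
        var match = true;
        for (var j = 0; j < searchString.Length; j++)
        {
            // Read straight through the SourceText indexer; nothing is substringed or copied.
            if (text[i + j] != searchString[j])
            {
                match = false;
                break;
            }
        }

        if (match)
            return i;
    }

    return -1;
}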

var match = true;
for (var j = 0; j < searchStringLength; j++)
{
    var matchChar = j == 0 ? normalizedFirstChar : CaseInsensitiveComparison.ToLower(searchString[j]);
Contributor

CaseInsensitiveComparison.ToLower

I'm not sure of the search characteristics, but there is a potential tradeoff in that we could be calling this ToLower(char) for quite a few more chars than are in searchString, right?

Member Author

Yes, but I didn't measure any problems with this, and I view allocations as much worse. Most code is ASCII, so we're going to fast-path all these ToLowers all the time.
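To make the tradeoff concrete, here is roughly what the case-insensitive (VB) inner comparison looks like (a sketch; the helper name is hypothetical): CaseInsensitiveComparison.ToLower runs on the document character at every candidate position, not just on the characters of searchString, which is the extra work being discussed; per the reply above, ASCII input takes a cheap fast path through ToLower.

static bool MatchesCaseInsensitive(SourceText text, int position, string searchString, char normalizedFirstChar)
{
    for (var j = 0; j < searchString.Length; j++)
    {
        var searchChar = j == 0 ? normalizedFirstChar : CaseInsensitiveComparison.ToLower(searchString[j]);

        // This is the ToLower that runs once per examined document character.
        if (CaseInsensitiveComparison.ToLower(text[position + j]) != searchChar)
            return false;
    }

    return true;
}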

@@ -70,35 +70,64 @@ public static TextChangeRange GetEncompassingTextChangeRange(this SourceText new
return TextChangeRange.Collapse(ranges);
}

public static int IndexOf(this SourceText text, string value, int startIndex, bool caseSensitive)
public static int IndexOf(this SourceText text, string searchString, int startIndex, bool caseSensitive)
Contributor

IndexOf

BTW, I wonder if something similar to what was done in https://devdiv.visualstudio.com/DevDiv/_git/VSUnitTesting/pullrequest/550572 might be useful in source text searching.

Member Author

I leave it to you to implement :)

Contributor

slacker! :) Did this method show up at all in the CPU side of your profile?

Member Author

I'll check again. I think it's dominated by compiler time. Will do tomorrow!

@ToddGrun ToddGrun left a comment (Contributor)

:shipit:

//
// The only implementation we have that could have bad indexer perf is CompositeText with heavily modified text
// at the compiler layer, but I believe that being used in find-all-references will be very rare, if not never.
if (!Match(normalized[j], text[i + j], caseSensitive))
Contributor

if (!Match(normalized[j], text[i + j], caseSensitive))

Already approved, but dumb question I just thought of:

Why not just use CaseInsensitiveComparison.Equals, passing in both as ReadOnlySpan (nothing normalized)?

Member Author

So, the catch is that it's not span-able. It's a SourceText. :-(
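In other words, the alternative would be copying each candidate window out of the (possibly non-contiguous) SourceText before comparing. A sketch of that approach, assuming a pooled buffer (ArrayPool<char> is System.Buffers.ArrayPool), shows the per-candidate work the indexer-based comparison avoids:

static bool WindowEqualsViaCopy(SourceText text, int position, string searchString)
{
    // Assumes position + searchString.Length <= text.Length.
    // Copy the candidate window out of the (possibly chunked) SourceText...
    var buffer = ArrayPool<char>.Shared.Rent(searchString.Length);
    try
    {
        text.CopyTo(position, buffer, 0, searchString.Length);

        // ...then compare. Even with a pooled buffer this copies per candidate, and
        // materializing a string here would allocate outright.
        return CaseInsensitiveComparison.Equals(new string(buffer, 0, searchString.Length), searchString);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(buffer);
    }
}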

Contributor

makes sense, thanks!

@CyrusNajmabadi
Member Author

Closing out; I haven't been able to see this show up again.

@CyrusNajmabadi CyrusNajmabadi deleted the farAllocs branch May 22, 2024 00:07