AccessViolationException in HttpClient.SendAsync() on macOS arm64 in .NET 9 preview.4 #102313

Open
martincostello opened this issue May 16, 2024 · 23 comments

Comments

@martincostello
Member

Description

I have an application I'm testing against .NET 9 daily builds. With the latest SDK and package versions, my tests fail with "System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt." when running on GitHub Actions' macOS arm64 runners.

I have 9 applications I'm doing this with, but I'm only observing this issue with one of them.

Reproduction Steps

  1. Clone martincostello/costellobot@18966b8
  2. Run build.ps1 on a macOS arm64 machine.

Expected behavior

The tests all pass.

Actual behavior

Lots of test failures with a stack trace similar to the below:

System.AccessViolationException : Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at MartinCostello.Costellobot.Infrastructure.IntegrationTests`1.PostWebhookAsync(String event, Object value, String webhookSecret, String delivery) in /_/tests/Costellobot.Tests/Infrastructure/IntegrationTests`1.cs:line 176
   at MartinCostello.Costellobot.Handlers.DeploymentStatusHandlerTests.PostWebhookAsync(DeploymentStatusDriver driver, String action) in /_/tests/Costellobot.Tests/Handlers/DeploymentStatusHandlerTests.cs:line 667
   at MartinCostello.Costellobot.Handlers.DeploymentStatusHandlerTests.Deployment_Is_Approved_For_Trusted_User_And_Dependency_When_Penultimate_Build_Skipped() in /_/tests/Costellobot.Tests/Handlers/DeploymentStatusHandlerTests.cs:line 128
--- End of stack trace from previous location ---

That corresponds to this code, but it doesn't appear to have been recently changed:

async Task<HttpResponseMessage> Core(
    HttpRequestMessage request, HttpCompletionOption completionOption,
    CancellationTokenSource cts, bool disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
{
    bool telemetryStarted = StartSend(request);
    bool responseContentTelemetryStarted = false;
    HttpResponseMessage? response = null;
    try
    {
        // Wait for the send request to complete, getting back the response.
        response = await base.SendAsync(request, cts.Token).ConfigureAwait(false);
        ThrowForNullResponse(response);
        // Buffer the response content if we've been asked to.
        if (ShouldBufferResponse(completionOption, request))
        {
            if (HttpTelemetry.Log.IsEnabled() && telemetryStarted)
            {
                HttpTelemetry.Log.ResponseContentStart();
                responseContentTelemetryStarted = true;
            }
            await response.Content.LoadIntoBufferAsync(_maxResponseContentBufferSize, cts.Token).ConfigureAwait(false);
        }
        return response;
    }
    catch (Exception e)
    {
        HandleFailure(e, telemetryStarted, response, cts, originalCancellationToken, pendingRequestsCts);
        throw;
    }
    finally
    {
        FinishSend(response, cts, disposeCts, telemetryStarted, responseContentTelemetryStarted);
    }
}

Regression?

Yes.

Known Workarounds

None.

Configuration

.NET SDK 9.0.100-preview.4.24265.4

Other information

No response

@dotnet-policy-service bot added the untriaged label (New issue has not been triaged by the area owner) May 16, 2024
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@antonfirsov removed the untriaged label (New issue has not been triaged by the area owner) May 16, 2024
@antonfirsov added this to the 9.0.0 milestone May 16, 2024
@antonfirsov
Member

@martincostello any chance you can help us by collecting traces? This could give us some preliminary ideas before someone manages to jump on the issue.

dotnet-trace collect --providers System.Net.Http,Private.InternalDiagnostics.System.Net.Http,System.Net.Security,System.Net.Sockets,System.Net.NameResolution,System.Threading.Tasks.TplEventSource:0x80:4 --process-id <PID>

@antonfirsov
Member

Could #101479 (comment) be the root cause?

@martincostello
Member Author

any chance you can help us by collecting traces?

Sure, I'll try and do this later today.

Could #101479 (comment) be the root cause?

I did spot that issue, but it seemed a bit too old to still be an issue in the nightlies?

@martincostello
Member Author

dotnet-trace collect --providers System.Net.Http,Private.InternalDiagnostics.System.Net.Http,System.Net.Security,System.Net.Sockets,System.Net.NameResolution,System.Threading.Tasks.TplEventSource:0x80:4 --process-id <PID>

Just looked at this - how do I do this exactly when I don't know what the test process' ID will be until after I start the tests?

@wfurt
Member

wfurt commented May 16, 2024

I know this is annoying. I often add Console.ReadLine() to the repro and hit enter after tracing has started. Another alternative is to add an EventListener to the repro itself.
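The EventListener option can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name `NetEventCollector` and its `Lines` queue are hypothetical, while the provider names and the `TplEventSource` keyword and level (`0x80:4`) are copied from the dotnet-trace command above. It captures events inside the test process itself, so no process ID is needed up front.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;
using System.Linq;

// Hypothetical helper: records networking events in-process instead of
// attaching dotnet-trace to a process whose ID is not known in advance.
public sealed class NetEventCollector : EventListener
{
    // Provider names taken from the dotnet-trace command in this thread.
    private static readonly string[] VerboseProviders =
    {
        "System.Net.Http",
        "Private.InternalDiagnostics.System.Net.Http",
        "System.Net.Security",
        "System.Net.Sockets",
        "System.Net.NameResolution",
    };

    // Formatted events, safe to enqueue from any thread.
    public ConcurrentQueue<string> Lines { get; } = new();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (VerboseProviders.Contains(source.Name))
        {
            EnableEvents(source, EventLevel.Verbose, EventKeywords.All);
        }
        else if (source.Name == "System.Threading.Tasks.TplEventSource")
        {
            // Mirrors "TplEventSource:0x80:4": keyword 0x80 at level 4 (Informational).
            EnableEvents(source, EventLevel.Informational, (EventKeywords)0x80);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        string payload = e.Payload is null ? string.Empty : string.Join(", ", e.Payload);
        Lines.Enqueue($"{e.TimeStamp:O} [{e.EventSource.Name}] {e.EventName}: {payload}");
    }
}
```

A test fixture could create one `NetEventCollector` before the first request and write `Lines` to the test output or a file on teardown. Whether the collected events survive a hard crash of the test host is a separate question; flushing to disk as events arrive may be needed in that case.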

@wfurt
Member

wfurt commented May 16, 2024

BTW the linked issue blames libgit2sharp. Do you see the same issue on any other platform @martincostello ?

@martincostello
Member Author

At the moment my repro is just "run dotnet test" - I don't have anything more specific.

I could maybe add something to the tests to have them monitor themselves (if that works?)

I'm only seeing this on macOS, not Windows or Linux, but only in this one project. I'm not specifically using libgit2sharp anywhere, and if it is being used, I would assume it's from an SDK or something and would/should be present in my other repos I'm testing nightlies with.

@martincostello
Member Author

If it helps narrow the search, this has started happening at some point since the 24th April, but all I had to go on was "dotnet test crashes" at that point and I figured it would stabilise at some point. It's only in the last day or so I've noticed it's progressed to not killing the whole process and giving me an exception message and stack trace.

@martincostello
Member Author

But that probably correlates more with GitHub Actions runners moving from macOS 12 x64 to macOS 14 arm64, so it was probably broken before then and my build was just on a different arch so I didn't see it before...

@martincostello
Member Author

martincostello commented May 16, 2024

I don't seem to be able to get hold of a .nettrace file as now dotnet test is just crashing again. I do have a dump file I collected a few days ago, but it's ~1.3GB zipped and will contain secrets from my repository that are configured in environment variables.

If you let me know what the right secure process to provide the dump file is, I can provide one tomorrow.

@martincostello
Member Author

I'm possibly seeing this issue in a different repository using the official preview.4 build as well: logs.

In that case, dotnet test is just outright crashing, rather than throwing an AccessViolationException.

@martincostello changed the title from "AccessViolationException in HttpClient.SendAsync() on macOS arm64 in .NET 9 preview.4 nightly builds" to "AccessViolationException in HttpClient.SendAsync() on macOS arm64 in .NET 9 preview.4" May 22, 2024
@martincostello
Member Author

Yep, definitely an issue with the official .NET 9 preview.4 build: crash

@martincostello
Member Author

I've also spotted that sometimes instead of crashing, I get this instead:

System.NullReferenceException : Object reference not set to an instance of an object.
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.InvokeAsync(MethodInfo method, Object instance, Object[] arguments)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.OnExecute(ConventionContext context, CancellationToken cancellationToken)
   at McMaster.Extensions.CommandLineUtils.Conventions.ExecuteMethodConvention.<>c__DisplayClass0_0.<<Apply>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at McMaster.Extensions.CommandLineUtils.CommandLineApplication.ExecuteAsync(String[] args, CancellationToken cancellationToken)
   at MartinCostello.DotNetBumper.Bumper.RunAsync(IAnsiConsole console, String[] args, Func`2 configureLogging, CancellationToken cancellationToken) in /_/src/DotNetBumper/Bumper.cs:line 98
   at MartinCostello.DotNetBumper.EndToEndTests.Application_Returns_Two_If_Cancelled_By_User() in /_/tests/DotNetBumper.Tests/EndToEndTests.cs:line 425
--- End of stack trace from previous location ---

That appears to come from here: ExecuteMethodConvention.InvokeAsync()

@antonfirsov
Member

antonfirsov commented May 22, 2024

I've also spotted that sometimes instead of crashing, I get this instead

We don't own McMaster CommandLine utilities. We should focus on the HttpClient AV/crash in this issue.

It would be very helpful for us to have crash dumps, and for you to run some experiments that help decide whether we can exclude libgit2sharp or other 3rd party components as sources of memory corruption. Ideally there should be a repro that only uses HttpClient.
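As a starting point for such an experiment, a repro limited to HttpClient might look like the sketch below. Everything in it is an assumption made for illustration: the class name `HttpClientOnlyRepro`, the port 5055, the JSON body, and the use of `HttpListener` as a stand-in server. Nothing in this thread confirms that this pattern triggers the crash; it only shows the shape of a test with no third-party dependencies.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical repro harness: only BCL types, no third-party packages.
public static class HttpClientOnlyRepro
{
    // Sends `requests` concurrent POSTs to a local listener and returns
    // how many of them completed with a success status code.
    public static async Task<int> RunAsync(int requests, string prefix = "http://localhost:5055/")
    {
        using var listener = new HttpListener();
        listener.Prefixes.Add(prefix);
        listener.Start();

        // Minimal server loop: drain each request body and reply 200.
        Task server = Task.Run(async () =>
        {
            for (int i = 0; i < requests; i++)
            {
                HttpListenerContext context = await listener.GetContextAsync();
                await context.Request.InputStream.CopyToAsync(Stream.Null);
                context.Response.StatusCode = 200;
                context.Response.Close();
            }
        });

        using var client = new HttpClient();

        // Issue all requests concurrently through HttpClient.
        var sends = Enumerable.Range(0, requests).Select(async i =>
        {
            using var content = new StringContent("{\"i\":" + i + "}", Encoding.UTF8, "application/json");
            using HttpResponseMessage response = await client.PostAsync(prefix, content);
            return response.IsSuccessStatusCode;
        });

        bool[] results = await Task.WhenAll(sends);
        await server;
        listener.Stop();
        return results.Count(ok => ok);
    }
}
```

Running `await HttpClientOnlyRepro.RunAsync(1000)` in a loop on a macOS arm64 machine would be one way to check whether the failure appears without the rest of the application.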

@martincostello
Member Author

@antonfirsov I have a crash dump - I just need to know how to securely give it to you: #102313 (comment)

@antonfirsov
Member

Sorry, I missed that. Can you send it over email? (anfirszo at companydomain).

@martincostello
Member Author

@antonfirsov The dump is 1.4GB compressed. Is there a secure file share to upload it to instead?

@danmoseley Can you point me to the appropriate feedback thing that's used for this? I forget where it is.

@martincostello
Member Author

This seems to have stopped happening as of the 9.0.100-preview.5.24279.9 SDK. I still have the crash dump from the earlier build.

@martincostello
Member Author

Scratch that - it's just gotten less frequent.

@antonfirsov
Member

antonfirsov commented May 31, 2024

Can you point me to the appropriate feedback thing that's used for this?

There should be a way to privately attach files to tickets opened on https://developercommunity.visualstudio.com.

Alternatively, if you can safely share it via a private link to OneDrive or similar, you can send me the link over email. I will let you know once I have downloaded it so you can delete the share.

@martincostello
Member Author

I tried to use the feedback site, but the webpage OOMs in Chrome when I try and attach the ZIP file 😅

I'm just setting up a throw-away OneDrive share now, and I'll share it with the email address you provided earlier.

@martincostello
Member Author

@antonfirsov Just emailed you the link.
