Ability to filter by "user-perceived" unhandled exceptions/fatal errors generated by the Kotlin SDK #3425

Open
arifken opened this issue May 10, 2024 · 4 comments

Comments

arifken commented May 10, 2024

Problem Statement

Because of how coroutine exception handling works, an uncaught exception can crash the app when the request is made from some contexts, and can be silently swallowed when it is made from others.

In both cases, Sentry labels the error as level:fatal with exception.handled:false. This can produce a spike in "crashes" that never actually crash the app for the user.

We recently encountered this issue with an android_getaddrinfo exception, where a user was trying to make a network connection while in airplane mode. We saw a huge spike in errors in Sentry, but when we went to Google Play and filtered by user-perceived errors, the actual count of crashes for this issue was very low.

It appears that Sentry's "Crash free user rate" already distinguishes between fatal errors that are "user perceived" and those that are not (the spike in errors did not impact our crash free user rate).

Solution Brainstorm

We'd love to be able to query by user_perceived:true in Discover, Dashboards, and Alerts (for error data sources) so that we can differentiate between user-perceived vs non-main thread crashes ourselves.

When specifying user_perceived:true, I would expect to see only crashes that resulted in the Android app closing and that contribute to the crash-free user rate shown by Sentry.

When specifying user_perceived:false level:fatal, I would expect to see only errors that are swallowed by the parent coroutine and do not actually crash the app for the user.
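
To make the expected semantics concrete, here is a minimal Kotlin sketch that models the two queries as predicates over a hypothetical `ErrorEvent` type. The type, its `userPerceived` field, and the sample data are assumptions for illustration only; no such field exists in Sentry today.

```kotlin
// Hypothetical event model for illustration; not Sentry's actual schema.
data class ErrorEvent(
    val title: String,
    val level: String,
    val handled: Boolean,
    val userPerceived: Boolean, // the proposed field; assumed, does not exist today
)

// Equivalent of the proposed query `user_perceived:true`.
fun userPerceived(events: List<ErrorEvent>): List<ErrorEvent> =
    events.filter { it.userPerceived }

// Equivalent of the proposed query `user_perceived:false level:fatal`.
fun swallowedFatals(events: List<ErrorEvent>): List<ErrorEvent> =
    events.filter { !it.userPerceived && it.level == "fatal" }

fun main() {
    val events = listOf(
        ErrorEvent("NullPointerException on main thread", "fatal", handled = false, userPerceived = true),
        ErrorEvent("android_getaddrinfo failed", "fatal", handled = false, userPerceived = false),
        ErrorEvent("Timeout, retried", "warning", handled = true, userPerceived = false),
    )
    println(userPerceived(events).map { it.title })
    println(swallowedFatals(events).map { it.title })
}
```

Both of the first two sample events are level:fatal with handled = false, which is why today's filters cannot tell them apart.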

Product Area

Issues

getsantry bot commented May 10, 2024

Assigning to @getsentry/support for routing ⏲️

getsantry bot commented May 13, 2024

Routing to @getsentry/product-owners-issues for triage ⏲️

@anthonycr

I want to provide more context on this, as I was wrong when I initially suggested that the coroutines exception handler was the primary cause of this observability hole. I had more time to debug this week and fully identified the cause of the crash. Here's the scenario:

Let's say we have Activity A and Activity B. Activity A is the app's main activity, and Activity B is a secondary activity. A user action triggers a network request inside a coroutine in Activity B. That network request fails, and the coroutine crashes. That crash DOES get routed to the global coroutines exception handler. Where I was wrong is that the coroutine exception wasn't crashing the app, yet it was still getting routed to Sentry.

What is actually happening is that, because no local coroutine exception handler exists, the global coroutines exception handler re-throws the exception and crashes Activity B. This is how Sentry receives the crash. However, the app itself does not terminate, so the Sentry session does not close and remains open. The Android OS sees that this crash happened on Activity B and was user triggered, and tries to recover by restarting Activity A. Since Activity A does not have a bug, the app is able to recover. In our case, since the error was sporadic and triggered by a flaky network, the user usually did not re-trigger the crash, so the session never ended and the session crash free rate never went down.

This leads me to the question: would it be possible to have two crash free rates? One would be based on the number of sessions that experienced a crash, and a second one (the same as the current crash free rate) on the number of sessions that ended with a crash. This would allow Sentry to provide the more nuanced context that the Play Store console shows.
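
A minimal Kotlin sketch of how the two rates would diverge, assuming a hypothetical `Session` record with a crash count and an "ended with crash" flag. These names and the sample data are illustrative assumptions, not Sentry's session schema or its actual rate calculation.

```kotlin
// Hypothetical session record for illustration; not Sentry's actual schema.
data class Session(
    val id: String,
    val crashCount: Int,         // crashes reported while the session was open
    val endedWithCrash: Boolean, // true only if the crash terminated the session
)

// Share of sessions for which `isCrashed` is false; 1.0 for an empty list.
fun crashFreeRate(sessions: List<Session>, isCrashed: (Session) -> Boolean): Double {
    if (sessions.isEmpty()) return 1.0
    return 1.0 - sessions.count(isCrashed).toDouble() / sessions.size
}

fun main() {
    val sessions = listOf(
        Session("a", crashCount = 0, endedWithCrash = false),
        Session("b", crashCount = 1, endedWithCrash = false), // Activity B crashed, app recovered
        Session("c", crashCount = 1, endedWithCrash = true),  // app terminated
        Session("d", crashCount = 0, endedWithCrash = false),
    )
    // Proposed additional rate: sessions that experienced any crash.
    val experienced = crashFreeRate(sessions) { it.crashCount > 0 }
    // Current-style rate: sessions that ended with a crash.
    val ended = crashFreeRate(sessions) { it.endedWithCrash }
    println("crash free (experienced a crash): $experienced") // 0.5
    println("crash free (ended with a crash): $ended")        // 0.75
}
```

Session "b" is the Activity B scenario described above: it lowers the first rate but leaves the second one untouched.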

vartec (Member) commented May 16, 2024

I believe that to get this working in Issues we must first get it as a tag from the SDK, so I'm transferring this issue.

Projects
Status: Waiting for: Product Owner
Status: Needs Discussion