ConnectionImpl task.Result blocking on heavy load #2679

Open
ryno1234 opened this issue Mar 19, 2024 · 12 comments

Comments

@ryno1234

ryno1234 commented Mar 19, 2024

We're having issues, particularly when our server (IIS / ASP.NET 8.0) is under heavy load, where creation of the ConnectionMultiplexer blocks many other threads. Below is a screenshot from a memory dump of our production server showing that this line of code is blocking 1,700+ other threads.

(screenshot: memory dump showing the blocking line of code)

We can seemingly work around this issue by blocking traffic to the server, starting our IIS site, requesting a page that interacts with Redis to "warm up" the Redis connection, and then re-enabling traffic. After that, everything works fine. But if traffic is hitting the server from the moment the site starts (such as after a new deployment), creation of the multiplexer blocks about 50% of the time; sometimes it blocks, sometimes it doesn't, depending on the traffic pattern.

I'm not sure if this is thread starvation or something else. We're not using ConfigureAwait(false), since we're not on .NET Framework.

Here's our connection string configuration = {redactedhost:6379,syncTimeout=5000,allowAdmin=False,connectTimeout=5000,ssl=False,abortConnect=False,connectRetry=5}

We create a Singleton Multiplexer through this startup code

    public static IServiceCollection AddRedisCacheService(this IServiceCollection services)
    {
        services.TryAddSingleton<ICache, RedisCache>();

        // From AddStackExchangeRedisExtensions
        services.TryAddSingleton<IRedisClientFactory, RedisClientFactory>();
        services.TryAddSingleton<ISerializer>(new NewtonsoftSerializer(new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.Auto }));
       
        services.TryAddSingleton<IRedisConnectionPoolManager, RedisConnectionPoolManager>();

        services.TryAddSingleton((provider) => provider
            .GetRequiredService<IRedisClientFactory>()
            .GetDefaultRedisClient());

        services.TryAddSingleton((provider) => provider
             // Block path is through these calls
            .GetRequiredService<IRedisClientFactory>()  
            .GetDefaultRedisClient()
            .GetDefaultDatabase());

        // Register a 'Configuration Factory' for extensions
        services.AddSingleton<IEnumerable<RedisConfiguration>>(sp => [sp.GetRequiredService<RedisConfiguration>()]);

        return services;
    }
@mgravell
Collaborator

mgravell commented Mar 19, 2024

First thought: if this is becoming a blockage, how many multiplexers are you creating? Usually they're shared and aggressively reused (concurrently), with a long lifetime, often process-long. You shouldn't usually need or want multiple.

However, indeed; looking closely, it looks to me like we're missing an else here; if we don't connect within the timeout, we shouldn't be touching task.Result - I wonder if this should simply be:

- if (!task.Result) throw ...
+ else if (!task.Result) throw ...

That make sense folks? /cc @NickCraver @philon-msft
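The pattern under discussion can be sketched roughly as follows (assumed names; this is NOT the library's actual source, just a minimal illustration of why the `else` matters):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of a sync-over-async connect path. Reading
// task.Result before the task has completed blocks the calling thread
// until the connection attempt finishes; the 'else' keeps .Result behind
// the successful-wait branch, so it is only read once the task is
// known to be complete.
public static class ConnectSketch
{
    public static bool ConnectWithTimeout(Task<bool> connectTask, TimeSpan timeout)
    {
        if (!connectTask.Wait(timeout))
        {
            // Timed out: do NOT touch connectTask.Result here, or this
            // thread would block until the connection attempt finishes.
            throw new TimeoutException("Connect timed out");
        }
        else if (!connectTask.Result) // safe: the task is complete here
        {
            throw new InvalidOperationException("Connect failed");
        }
        return true;
    }
}
```

Without the `else`, a timed-out caller falls through to `task.Result` and blocks anyway, which matches the pile-up seen in the memory dump.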

@ryno1234
Author

ryno1234 commented Mar 19, 2024

@mgravell, I updated my question to include context as to how we're creating our Multiplexer. We should only be creating 1.

@mgravell
Collaborator

Hard to say, really, since none of those types are from "this library", but the mention of RedisConnectionPoolManager fills me with dread; in most scenarios you don't need a pool. That at least explains multiple blocks here, though; it strongly suggests a bunch of multiplexers (again, not my type, so I can't comment). But: working on a PR; you've called my attention to something that smells wrong.

@ryno1234
Author

@mgravell, you're right! Looks like we're creating 5 multiplexers... This is part of an extension library https://github.com/imperugo/StackExchange.Redis.Extensions ... I will look further into this part.

(screenshot: connection pool showing five multiplexer instances)

@mgravell
Collaborator

OK; I will withhold comment on the external library as it is nothing to do with me ;)

PR in progress - I agree there's a problem there

@mgravell
Collaborator

What I will say: the architecture of the current IO loop means there is an unnecessary bottleneck in some of the processing; that might today be helped by using a pool. But I'm actively working on a significant overhaul of the IO loop which completely removes this blockage, so it could be that a scenario that benefits from pooling today will not in v3 (ETA: 1 month).

@ryno1234 ryno1234 changed the title ConnectionImp task.Result blocking on heavy load ConnectionImpl task.Result blocking on heavy load Mar 19, 2024
@ryno1234
Author

First off, I really appreciate your quick response earlier. That really helped us refocus. Thank you.

For what it's worth, we updated the connection pool to just 1 multiplexer (again, I realize this isn't your type) and still experienced the blocking on that task.Result.

If we assumed that the other library that is facilitating this connection isn't the problem, what would cause our code to deadlock like this?

Put another way: although you did in fact find an issue in the code, it evidently isn't widespread, otherwise you would have heard a LOT of people talking about it. So what could we be doing that is so different from everyone else that it causes the deadlock?

@mgravell
Collaborator

Hard to say without being able to repro it on demand; however, this scenario applies specifically to the situation where we're trying to connect to a server and it hasn't replied promptly, so it could be something related to server unavailability or network instability. There is an accidental sync-over-async there which we are working on resolving (but in reviewing it yesterday, we identified a few knock-on things that we want to understand fully before changing it). The fact that the other library is using Connect rather than ConnectAsync is a contributory factor here.

IMO the "real" solution: use ConnectAsync instead of Connect.
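That suggestion could look something like this (a minimal sketch assuming a modern ASP.NET Core host; the connection string mirrors the one in the original post and `builder`/`app` are the standard minimal-hosting names, not anything from this thread):

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using StackExchange.Redis;

var builder = WebApplication.CreateBuilder(args);

// Connect asynchronously, once, at startup and before the app serves
// traffic, so no thread pool thread is ever blocked inside Connect().
IConnectionMultiplexer muxer = await ConnectionMultiplexer.ConnectAsync(
    "redactedhost:6379,syncTimeout=5000,connectTimeout=5000,abortConnect=False,connectRetry=5");

// Register the already-connected multiplexer as a process-long singleton.
builder.Services.AddSingleton(muxer);

var app = builder.Build();
app.Run();
```

Because the multiplexer is fully connected before any request is handled, requests arriving "out of the gate" after a deployment never stack up behind a synchronous connect.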

@mmriis

mmriis commented May 8, 2024

We encountered the same issue today, also using the mentioned extensions NuGet packages.

@ryno1234 What solution did you end up with ?

@ryno1234
Author

ryno1234 commented May 8, 2024

@mmriis, we removed the usage of https://github.com/imperugo/StackExchange.Redis.Extensions and went directly to using this library instead. We utilized aspects of the extensions library in our own code base (copy / paste / modify), namely the serialization aspects.

Our main issue was race conditions around the connection, so we aggressively put a lock(){} statement around it, knowing darn well we were blocking. We also did this within the factory that resolves the Multiplexer, because we use the Multiplexer in our dependency injection.

@mmriis

mmriis commented May 13, 2024

@ryno1234 Thank you. I was thinking we would do the same. Serialization/deserialization is pretty much the only thing we use from that library, so easy to recreate.

I don't understand why you would need the lock around the factory. Wouldn't ConnectionMultiplexer be a singleton with connections established on app start up ?

@ryno1234
Author

@mmriis, whoops, you're right. I think the lock concept was something I was playing around with initially. Forgot that we went away from that for the reason you mentioned.
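The lock-free singleton pattern the last few comments converge on can be sketched with `Lazy<T>` (an illustrative sketch, not code from this thread; the class and connection string are assumptions):

```csharp
using System;
using StackExchange.Redis;

// Lazy<T> is thread-safe by default (ExecutionAndPublication), so
// Connect runs exactly once even if many threads hit the factory
// concurrently on first access; no explicit lock(){} is needed.
public static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> _lazy =
        new(() => ConnectionMultiplexer.Connect(
            "redactedhost:6379,abortConnect=False")); // illustrative config

    public static ConnectionMultiplexer Instance => _lazy.Value;
}
```

Registering `RedisConnection.Instance` (or the `Lazy<T>` factory itself) as a DI singleton gives the establish-once-at-startup behavior described above without any blocking lock in the hot path.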



3 participants