Graceful shutdown #528

ivan770 · 2020-05-22T14:38:39Z

Tide could store the JoinHandle of every request it handles and await them all when the server is shutting down.

Lines 300 to 311 in 7d0b848

    
    task::spawn(async move {
        let res = async_h1::accept(&addr, stream, |req| async {
            let res = this.respond(req).await;
            let res = res.map_err(|_| io::Error::from(io::ErrorKind::Other))?;
            Ok(res)
        })
        .await;
        if let Err(err) = res {
            log::error!("async-h1 error", { error: err.to_string() });
        }
    });

async-std book
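A rough illustration of the bookkeeping this proposal implies, written with plain `std::thread` handles instead of async-std tasks so it runs without any dependencies. `RequestTracker` and its methods are hypothetical names, not part of Tide; the point is only that every spawned request leaves a handle behind and shutdown waits on all of them.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a server that remembers every request it started.
#[derive(Default)]
struct RequestTracker {
    handles: Mutex<Vec<JoinHandle<()>>>,
}

impl RequestTracker {
    /// Run `work` on its own thread and keep the handle for later.
    fn spawn<F>(&self, work: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let handle = thread::spawn(work);
        self.handles.lock().unwrap().push(handle);
    }

    /// Wait for every tracked request; returns how many handles were joined.
    fn shutdown(&self) -> usize {
        let handles = std::mem::take(&mut *self.handles.lock().unwrap());
        let count = handles.len();
        for handle in handles {
            // A panicking request should not abort the shutdown of the others.
            let _ = handle.join();
        }
        count
    }
}

/// Start three slow "requests", shut down, and report how many completed.
fn demo_tracked_shutdown() -> usize {
    let tracker = RequestTracker::default();
    let completed = Arc::new(Mutex::new(0usize));
    for _ in 0..3 {
        let completed = Arc::clone(&completed);
        tracker.spawn(move || {
            thread::sleep(Duration::from_millis(20));
            *completed.lock().unwrap() += 1;
        });
    }
    tracker.shutdown();
    let finished = *completed.lock().unwrap();
    finished
}
```

In an async server the same shape would hold task handles instead of thread handles, and a long-running server would also need to prune finished handles so the list does not grow without bound.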


Fishrock123 · 2020-05-22T17:11:21Z

I think the server only shuts down in a panic! or due to an external signal, such as SIGINT, or whatever the Windows equivalents are.

I suppose this could possibly be desirable for SIGTERM which allows for such a thing. If you panic I'm not sure it is ever really safe to continue.

Fishrock123 · 2020-05-22T18:51:48Z

Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

ivan770 · 2020-05-22T18:54:10Z

> Oh I think I see what you are saying - essentially we should have a way to await all tasks which are still alive, I think?

Yes. I mean, new requests are not handled, but at least those that are still awaiting response should be finished

mehcode · 2020-05-27T07:22:44Z

Was just coming to make an issue for this. It's critical from a production standpoint that a load balancer or swarm manager can initiate a controlled or graceful shutdown.

Any requests that came in before the "shut down please" request should still finish. The icing on the cake would be a way to shut down with a timeout, but I think we can compose that if there is an API for shutdown.

The way I envision usage would be a shutdown method on Server. As Server is clone-able we could clone, wait for SIGTERM, and call shutdown on it.
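To make the shape of that API concrete, here is a small dependency-free sketch. The `Server` type below is a hypothetical stand-in, not Tide's: clones share one flag, `listen` is a simulated accept loop, and a second thread plays the role of the code that waits for SIGTERM.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical server type; all clones share one shutdown flag.
#[derive(Clone, Default)]
struct Server {
    shutting_down: Arc<AtomicBool>,
}

impl Server {
    /// Ask every clone of this server to stop accepting new work.
    fn shutdown(&self) {
        self.shutting_down.store(true, Ordering::SeqCst);
    }

    /// Simulated accept loop: serves one "request" per tick until shutdown
    /// is requested, then returns the number of requests it served.
    fn listen(&self) -> u32 {
        let mut served = 0;
        while !self.shutting_down.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5)); // pretend to serve a request
            served += 1;
        }
        served
    }
}

/// Clone the server, let another thread stand in for the signal handler,
/// and confirm that `listen` returns only after `shutdown` was called.
fn demo_clone_and_shutdown() -> bool {
    let server = Server::default();
    let handle = server.clone();
    let signal = thread::spawn(move || {
        thread::sleep(Duration::from_millis(30)); // stand-in for "wait for SIGTERM"
        handle.shutdown();
    });
    let _served = server.listen();
    signal.join().unwrap();
    server.shutting_down.load(Ordering::SeqCst)
}
```

A timeout could then be composed on top by the caller, for example by racing the wait for in-flight requests against a timer.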

repnop · 2020-06-11T14:50:50Z

Definitely agree with @mehcode. Having the ability to gracefully shut down the server is pretty important for me as well. 👍

ririsoft · 2020-06-11T15:48:59Z

FWIW here is how I do it using the async-ctrlc crate:

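use async_ctrlc::CtrlC;
use async_std::prelude::FutureExt;
// Assumed alias: the snippet does not show which `Result` it uses.
use std::io::Result;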

async fn run() -> Result<()> {
    let ctrlc = async {
        CtrlC::new().expect("Cannot use CTRL-C handler").await;
        println!("termination signal received, stopping server...");
        Ok(())
    };

    let app = async {
        let app = tide::new();
        app.listen("localhost:8080").await
    };

    app.race(ctrlc).await?;
    println!("server stopped.");
    Ok(())
}

Please note that this will not wait for running requests to complete; it is a "brutal graceful shutdown", if I dare. But at least it handles SIGTERM and CTRL-C shutdown (cross-platform), which is better than nothing.

repnop · 2020-06-12T13:15:46Z

I did try messing around with adding graceful shutdown to tide yesterday evening and even got tests working, but ran into a bit of a problem that I'll outline in hopes someone better at async code can figure out a better solution.

The actual waiting for requests to finish is pretty simple with the design I used: spawn a task that owns a Receiver (I used the futures-channel channel) and treats it as a stream of job handles, awaiting each one. When the channel is closed from the sender side, the task finishes the remaining jobs and breaks out of its loop, which can be used to signal that all in-flight requests have finished.
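A dependency-free approximation of that drain task, using `std::sync::mpsc` and thread handles in place of futures-channel and task handles. The function names are made up for the illustration; the behavior it relies on is that `recv` only fails once every sender is gone and the queue is empty.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawn the "drainer": it joins every handle it receives and exits once the
/// sending side of the channel has been dropped and the queue is empty.
fn spawn_drainer(jobs: mpsc::Receiver<JoinHandle<u32>>) -> JoinHandle<Vec<u32>> {
    thread::spawn(move || {
        let mut results = Vec::new();
        // `recv` returns `Err` only after all senders are gone and the
        // channel is empty, which doubles as the "everything finished" signal.
        while let Ok(job) = jobs.recv() {
            if let Ok(value) = job.join() {
                results.push(value);
            }
        }
        results
    })
}

/// Register three jobs, close the channel, and collect what the drainer saw.
fn demo_drain_on_close() -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let drainer = spawn_drainer(rx);
    for id in 1..=3u32 {
        let job = thread::spawn(move || {
            thread::sleep(Duration::from_millis(10 * id as u64));
            id
        });
        tx.send(job).unwrap();
    }
    drop(tx); // shutdown: no further requests will be registered
    drainer.join().unwrap()
}
```

Joining the drainer is the point at which the caller knows that all registered requests are done.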

The other part seems just as simple: if someone has initiated a shutdown (I used an Arc<AtomicBool> here and set it to true in a shutdown method), stop listening for new connections, close the sender portion of the channel, wait for the existing requests to finish (I also had it serve up 503s during this time), then return. This worked, except that using stream combinators to decide when to stop polling meant it waited for one more request to be served after shutdown was initiated. I think that is because the TcpListener only notified the waker when there was new data on the socket, at which point the next poll cycle would see that shutdown was true and proceed as expected.

To get around this I was able to use a select! and check the value of shutdown if the incoming.next() didn't resolve, but to make it stop infinitely looping there needed to be a task::yield_now().await on the false case, which seemed less than ideal, and that is where I ended my attempt.

If anyone has thoughts on how to avoid the issue, do let me know and I can give them a try and report back. Or if anyone else wants to take a stab at implementing it themselves, by all means 👍

yoshuawuyts · 2020-06-12T16:30:25Z

I've thought about this a little in the past, and gathered some notes on this. What I think we need to do is use stop-token here:

tide/src/server.rs

Lines 334 to 335 in 293610b

    
    let mut incoming = listener.incoming();
    while let Some(stream) = incoming.next().await {

and here

tide/src/server.rs

Lines 292 to 293 in 293610b

    
    let mut incoming = listener.incoming();
    while let Some(stream) = incoming.next().await {

This allows us to stop receiving incoming requests and allow the server to gracefully exit once a token has been received.

However stop-token is not quite the end of the road for us. What I think we need is a tighter integration with Stream and Future. I've also gathered some notes on a design for async-std we could use to make cancellation easier. If we had that we could probably use it to enable cancellation of Tide as well:

let src = StopSource::new();
let token = src.token();

// Send a token to each app we initialize
task::spawn(async move {
    let mut app = tide::new();
    app.stop_on(token.clone());
    app.listen("localhost:8080").await?;
});

// The app will stop receiving requests once we drop the `StopSource`.
// This can be done in response to e.g. a signal handler.

It's a bit of a question how we'd combine this today. We could probably define our own StopToken for the time being, and implement the API outlined above. But once we have this in async-std we could probably rely on that. Hope this is helpful!
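For a sense of how small a home-grown token could be, here is one possible sketch. It is not the stop-token crate's API: `StopSource`, `StopToken`, and `accept_until_stopped` are hypothetical, and the token is a simple synchronous check built on `Arc`/`Weak` rather than a future.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a stop source: dropping it signals "stop".
struct StopSource {
    alive: Arc<()>,
}

/// Cheap, clonable view that can ask whether the source still exists.
#[derive(Clone)]
struct StopToken {
    alive: Weak<()>,
}

impl StopSource {
    fn new() -> Self {
        StopSource { alive: Arc::new(()) }
    }

    fn token(&self) -> StopToken {
        StopToken { alive: Arc::downgrade(&self.alive) }
    }
}

impl StopToken {
    /// True once the `StopSource` has been dropped.
    fn is_stopped(&self) -> bool {
        self.alive.strong_count() == 0
    }
}

/// Simulated accept loop: take connections from `incoming` until the token
/// reports that the source is gone, and return the ones that were accepted.
fn accept_until_stopped<I: Iterator<Item = u32>>(incoming: I, token: &StopToken) -> Vec<u32> {
    let mut accepted = Vec::new();
    for conn in incoming {
        if token.is_stopped() {
            break;
        }
        accepted.push(conn);
    }
    accepted
}

/// Accept a batch while the source is alive, drop it, then try another batch.
fn demo_stop_token() -> (Vec<u32>, Vec<u32>) {
    let src = StopSource::new();
    let token = src.token();
    let before = accept_until_stopped(1..=3, &token);
    drop(src); // e.g. in response to a signal handler
    let after = accept_until_stopped(4..=6, &token);
    (before, after)
}
```

Because this version only checks the token when the next connection arrives, it has the same wake-up limitation described earlier in the thread; a token that is itself a future (or wraps the stream) would be needed to interrupt an idle listener.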

GWBasic · 2020-11-08T16:00:06Z

I'm learning Rust, so I tried to see if I could fix this: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1

Specifically, I solved @repnop 's problem by using future::select. It allows awaiting on two futures and returns when the first completes: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-1f14b8b6953e2388b19efaf814862a89c0b57c45a6814f79ed373fde05d864d0R82

I then created a "CancelationToken" future that returns once it is canceled.
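The idea can be illustrated without any crates. The following is not the code from the branch: it is a std-only sketch with a hypothetical `CancellationToken` future, a hand-rolled two-way select built on `std::future::poll_fn` in place of `future::select`, and a tiny `block_on` so the example runs on its own.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;
use std::time::Duration;

#[derive(Default)]
struct State {
    cancelled: bool,
    waker: Option<Waker>,
}

/// Hypothetical token: a future that resolves once `cancel` has been called.
#[derive(Clone, Default)]
struct CancellationToken(Arc<Mutex<State>>);

impl CancellationToken {
    fn cancel(&self) {
        let mut state = self.0.lock().unwrap();
        state.cancelled = true;
        if let Some(waker) = state.waker.take() {
            waker.wake(); // wake whoever is awaiting the token
        }
    }
}

impl Future for CancellationToken {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut state = self.0.lock().unwrap();
        if state.cancelled {
            Poll::Ready(())
        } else {
            state.waker = Some(cx.waker().clone());
            Poll::Pending
        }
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Accepted,
    Cancelled,
}

/// Resolve with whichever happens first: a connection or a cancellation.
async fn accept_or_cancel<A>(accept: A, mut token: CancellationToken) -> Outcome
where
    A: Future<Output = ()>,
{
    let mut accept = Box::pin(accept);
    std::future::poll_fn(move |cx| {
        if Pin::new(&mut token).poll(cx).is_ready() {
            return Poll::Ready(Outcome::Cancelled);
        }
        if accept.as_mut().poll(cx).is_ready() {
            return Poll::Ready(Outcome::Accepted);
        }
        Poll::Pending
    })
    .await
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(thread::Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor, only here to keep the sketch self-contained.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Cancel a listener on which no connection ever arrives.
fn demo_cancel_idle_listener() -> Outcome {
    let token = CancellationToken::default();
    let canceller = token.clone();
    let worker = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        canceller.cancel();
    });
    // `pending()` models `incoming.next()` with no incoming connections.
    let outcome = block_on(accept_or_cancel(std::future::pending::<()>(), token));
    worker.join().unwrap();
    outcome
}
```

Because the token registers the task's waker, cancellation wakes the accept loop even when the socket is idle, so no `yield_now` busy loop is needed.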

So far, I only know how to test tcp_listener, and I adjusted its tests to work by canceling gracefully: https://github.com/http-rs/tide/compare/main...GWBasic:GracefulShutdown2?w=1#diff-0f530b03216a301ca1004179731df8510170be421ff7abf545db673eca3bc4beR12

I haven't done anything with waiting for ongoing requests to complete, or testing other listeners. I must admit that I've tried to return futures from the listen function; but that leads to lots of fighting with the borrow checker. That's a bit more than I can handle as a Rust novice. ;)

Anyway, if you believe my branch is in line with tide's overall design philosophy, I'm happy to make a pull request and change whatever you want. (Especially for testing the rest of the listeners.) Otherwise, this bug was a great learning exercise for me.

Fishrock123 · 2020-11-18T01:49:37Z

@GWBasic Can you make at least a draft PR?

GWBasic · 2020-11-18T14:23:59Z

@Fishrock123 : See #746

(Note that some merge conflicts crept in.)

Looking forward to your feedback!

GWBasic · 2020-12-16T18:50:43Z

@Fishrock123 : I had too many merge conflicts in #746, so I created #766

It uses futures-lite.

Thoughts? Again, I'm mostly interested in learning Rust, so if you'd rather do this in a different way, that's fine with me.

pbzweihander · 2021-06-25T04:02:13Z

I've also made #836. Please give any thoughts or suggestions!

KoltesDigital · 2022-01-21T13:40:29Z

Any news?

yoshuawuyts · 2022-01-21T14:13:26Z

I think it's important to break this down into two parts:

1. What would the design be if async Rust had all primitives implemented?
2. What design can we implement now?

A design for Future Rust.

I've recently written about async cancellation. For Tide, the change we need to make is that our calls to task::spawn (e.g. tcp_listener, unix_listener) implement "cancel on drop" semantics. So that when the Server instance is dropped, all tasks are cancelled.

We could do this using e.g. the async-task-group crate instead of calling async_std::task::spawn directly. Within the spawned tasks we could make it so each existing request finishes executing in an AsyncDrop destructor, so by the time the server finishes dropping all requests have finished executing.

Using (for example) the stop-token crate, we could then shut down a server like:

use stop_token::prelude::*;
use std::time::{Duration, Instant};
use async_std::task;

// Create the server and register a route which takes some time to complete.
let mut server = tide::new();
server.at("/").get(|_| async move {
    task::sleep(Duration::from_secs(10)).await;
    Ok("hello")
});

// Start listening for incoming requests, but close the server after 3 secs.
// Instead of waiting for a duration we could also wait for an external signal.
// Either way, this should gracefully wait for all requests to terminate.
server
    .listen("localhost:8080")
    .timeout_at(Instant::now() + Duration::from_secs(3))
    .await?;

println!("finished cleaning up all requests!");

What can we do now?

Unfortunately we currently don't have access to async Drop so we're limited by how we can implement this. The biggest downside of this is that instead of being able to use composable cancellation (e.g. when Server is dropped, we gracefully shut down), we need to hard-code a shutdown mechanism into Tide.

Likely the easiest route to go would be to hard-code a Server::timeout_at method which takes a Deadline. The question is whether we would want to take a dependency on an external crate for this, or provide this mechanism ourselves. I feel that if we were to provide it ourselves, it might be the most resilient. But keen to hear thoughts on this.
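As a thought experiment for what a hard-coded deadline would have to do internally, here is a dependency-free sketch. `serve_until` is a hypothetical function, an `mpsc` channel stands in for the listener, threads stand in for request tasks, and a plain `Instant` stands in for a Deadline.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical accept loop with a built-in deadline: accept requests from
/// `incoming` until `deadline`, then wait for the in-flight ones to finish.
/// Returns the ids of the requests that completed.
fn serve_until(incoming: mpsc::Receiver<u32>, deadline: Instant) -> Vec<u32> {
    let mut in_flight: Vec<JoinHandle<u32>> = Vec::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            break; // deadline reached: stop accepting
        }
        match incoming.recv_timeout(remaining) {
            Ok(id) => in_flight.push(thread::spawn(move || {
                thread::sleep(Duration::from_millis(150)); // slow handler
                id
            })),
            // Timed out, or every client hung up: stop accepting either way.
            Err(_) => break,
        }
    }
    // Graceful part: requests accepted before the deadline still complete.
    in_flight.into_iter().filter_map(|h| h.join().ok()).collect()
}

/// Two requests arrive before a 100 ms deadline; each takes 150 ms to handle.
fn demo_deadline() -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    tx.send(1).unwrap();
    tx.send(2).unwrap();
    let done = serve_until(rx, Instant::now() + Duration::from_millis(100));
    drop(tx); // the sender stayed open, so the loop ended on the deadline
    done
}
```

Both requests outlive the deadline and still complete, which is the behavior a graceful shutdown is after; a production version would probably also want an upper bound on how long the drain may take.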

I hope this helps clarify what the current thinking is!

KoltesDigital · 2022-01-21T15:22:04Z

Thanks!

I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shutdown the server?

Maybe app.listen could yield a kind of RunningServerHandle from which .shutdown() could be called. I've built a lot of NodeJS servers with express, and those app objects have listen and close methods; I think it is more idiomatic to offer shutdown only once listen has been called. Maybe this handle could also have a block method, in order to get back the current behavior, i.e. most users would simply change from app.listen(...).await?; to app.listen(...).block().await?;. Sadly it's a breaking change.
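A minimal sketch of that handle shape, using a background thread as the "server" so it runs without Tide. Everything here is hypothetical, including the free `listen` function and the choice to have both methods consume the handle.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle returned by `listen`; not part of Tide's API.
struct RunningServerHandle {
    stop: Arc<AtomicBool>,
    worker: JoinHandle<u32>,
}

/// Start a simulated server loop in the background and return its handle.
fn listen() -> RunningServerHandle {
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let worker = thread::spawn(move || {
        let mut ticks = 0;
        loop {
            thread::sleep(Duration::from_millis(5)); // pretend to serve a request
            ticks += 1;
            if flag.load(Ordering::SeqCst) {
                break ticks;
            }
        }
    });
    RunningServerHandle { stop, worker }
}

impl RunningServerHandle {
    /// Request a stop and wait for the server loop to exit.
    /// Returns how many ticks the loop ran.
    fn shutdown(self) -> u32 {
        self.stop.store(true, Ordering::SeqCst);
        self.block()
    }

    /// Wait until the server loop exits by itself, mirroring what awaiting
    /// `listen` does today.
    fn block(self) -> u32 {
        self.worker.join().expect("server loop panicked")
    }
}
```

Taking `self` by value means a handle cannot be shut down twice, at the cost of not being shareable; a clonable handle would need a different ownership design.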

GWBasic · 2022-01-26T02:59:46Z

Hey, thanks for the mention. I'm leaving this thread: I only attempted (#766) as a learning exercise.

Best of luck!

yoshuawuyts · 2022-01-28T11:59:30Z

> I'm sure I read on a page linked from your blog that drop is not guaranteed to be called. What is your take on relying on it to shutdown the server?

Drop is not guaranteed to be called because mem::forget is not unsafe, which means for soundness purposes we cannot rely on it. This is far less about "what could we theoretically do" and more: "what is the default behavior". In cases like these we can absolutely expect Drop to just run, and we don't need to worry about it not working ¹.

¹ If Drop wasn't something we could rely on, abstractions such as Arc, Mutex, etc. would be so unreliable that they'd be pointless. But luckily we can just rely on Drop for almost everything except when we're talking about soundness.
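The distinction can be seen in a few lines of standard-library Rust. `Guard` is a made-up type for the illustration: its destructor runs at the end of a scope by default, and is skipped only when the value is explicitly leaked with `mem::forget`.

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

/// Guard that records in a shared counter when its destructor runs.
struct Guard(Rc<Cell<u32>>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the destructor count after a normal scope exit and after a
/// subsequent `mem::forget`.
fn demo_drop_vs_forget() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    {
        let _guard = Guard(Rc::clone(&drops));
    } // scope ends: `Drop` runs, as it does by default
    let after_scope = drops.get();

    let leaked = Guard(Rc::clone(&drops));
    mem::forget(leaked); // safe code, yet this destructor never runs
    let after_forget = drops.get();

    (after_scope, after_forget)
}
```

The count stays at 1 after the `forget` call, which is why `unsafe` code cannot rely on destructors for soundness even though ordinary cleanup logic can.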

Fishrock123 mentioned this issue Jul 31, 2020

Remove Unix Socket on close #669

Open

jbr mentioned this issue Aug 12, 2020

How to handle request cancelation? http-rs/async-h1#123

Open

GWBasic mentioned this issue Nov 18, 2020

Graceful shutdown2 #746

Closed

GWBasic linked a pull request Dec 16, 2020 that will close this issue

Graceful shutdown futures lite #766

Draft

pbzweihander mentioned this issue Jun 25, 2021

Proposal: Cancellation #836

Draft

cailloumajor mentioned this issue Dec 17, 2021

Implement graceful shutdown cailloumajor/line-supervision-api-service#30

Open

2 tasks

mleonhard mentioned this issue Mar 1, 2022

Stop a Tide server (not graceful shutdown) #876

Open
