Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http3 quiche: QUIC connection is draining #12037

Closed
mb opened this issue Oct 5, 2023 · 11 comments
Closed

http3 quiche: QUIC connection is draining #12037

mb opened this issue Oct 5, 2023 · 11 comments

Comments

@mb
Copy link

mb commented Oct 5, 2023

I did this

Running curl3 (curl with quiche) from AUR: https://aur.archlinux.org/packages/curl-http3
Connecting to the mozilla-central http3 test server using the first port outputted by the http3server.

$ SSLKEYLOGFILE=~/tmp/test.keys curl3 -i --http3-only https://127.0.0.1:41996/hello --insecure -vv
* processing: https://127.0.0.1:41996/hello
*   Trying 127.0.0.1:41996...
* Connected to 127.0.0.1 (127.0.0.1) port 41996
* using HTTP/3
* Using HTTP/3 Stream ID: 0
> GET /hello HTTP/3
> Host: 127.0.0.1:41996
> User-Agent: curl/8.2.1
> Accept: */*
> 
* QUIC connection is draining
* Connection #0 to host 127.0.0.1 left intact
curl: (95) QUIC connection is draining

The error is coming from:

  • failf(data, "QUIC connection is draining");
  • https://docs.rs/quiche/0.18.0/quiche/struct.Connection.html#method.is_draining

    Returns true if the connection is draining.

    If this returns true, the connection object cannot yet be dropped, but no new application data can be sent or received. An application should continue calling the recv(), timeout(), and on_timeout() methods as normal, until the is_closed() method returns true.

    In contrast, once is_draining() returns true, calling send() is not required because no new outgoing packets will be generated.

draining is not an error to drop the connection. As far as I can tell the connection is terminated on that error.

Test server binaries and wireshark capture including the SSLKEYLOGFILE: test-setup.zip

I expected the following

I would expect the connection not to be terminated, but to be continued. I hope I understand it correctly and this is really a bug here, sorry if not.

curl/libcurl version

$ curl3 --version
curl 8.2.1 (x86_64-pc-linux-gnu) libcurl/8.2.1 BoringSSL zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.11.0 nghttp2/1.56.0 quiche/0.17.2 librtmp/2.3
Release-Date: 2023-07-26
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL threadsafe UnixSockets zstd

operating system

$ uname -a
Linux archlinux 6.5.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Sep 2023 22:55:13 +0000 x86_64 GNU/Linux
@bagder bagder added the HTTP/3 h3 or quic related label Oct 5, 2023
@bagder
Copy link
Member

bagder commented Oct 16, 2023

Can you reproduce this against a publicly available URL?

@bagder
Copy link
Member

bagder commented Oct 19, 2023

I have not been able to reproduce, because I get reproducible crashes in quiche all the time: cloudflare/quiche#1633

@mb
Copy link
Author

mb commented Oct 19, 2023

I don't know any publicly available server with that error. I tried to setup the http3server on my test server. Unfortunately, I can't reproduce there https://curl-neqo.neon.rocks:4433. I don't get to the stage where I connect and haven't been able to send out the GET / request, because the connection gets reset for some reason. curl3 -vv --http3-only https://curl-neqo.neon.rocks:4433

 curl3 -vv --http3-only https://curl-neqo.neon.rocks:4433
* processing: https://curl-neqo.neon.rocks:4433
*   Trying 89.58.41.211:4433...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
*   Trying 89.58.41.211:4433...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
*   Trying 89.58.41.211:4433...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
*   Trying 89.58.41.211:4433...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: none
*   Trying 89.58.41.211:4433...
*  CAfile: /etc/ssl/certs/ca-certificates.crt
[...]
curl3 -vv --http3-only https://curl-neqo.neon.rocks:4433 --insecure
* processing: https://curl-neqo.neon.rocks:4433
*   Trying 89.58.41.211:4433...
*   Trying 89.58.41.211:4433...
*   Trying 89.58.41.211:4433...
*   Trying 89.58.41.211:4433...
[...]

@icing
Copy link
Contributor

icing commented Nov 28, 2023

@mb could you try the current master? We have made changes in the retry behaviour when connecting to servers that immediately go into DRAINING state.

@bagder
Copy link
Member

bagder commented Nov 30, 2023

@mb ?

I can't repro that problem with --http3-only https://curl-neqo.neon.rocks:4433 just now:

$ curl --http3-only https://curl-neqo.neon.rocks:4433
curl: (8) Failed to connect to curl-neqo.neon.rocks port 4433 after 104 ms: Weird server reply

@mb
Copy link
Author

mb commented Dec 19, 2023

Hm, locally I still get the connection is draining error:

$ curl3 --version
curl 8.5.0 (x86_64-pc-linux-gnu) libcurl/8.5.0 BoringSSL zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.11.0 nghttp2/1.58.0 quiche/0.20.0 librtmp/2.3
Release-Date: 2023-12-06
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe UnixSockets zstd
$ curl3 https://127.0.0.1:60930 --http3-only --insecure -vv -i
*   Trying 127.0.0.1:60930...
* Connected to 127.0.0.1 (127.0.0.1) port 60930
* using HTTP/3
* [HTTP/3] [0] OPENED stream for https://127.0.0.1:60930/
* [HTTP/3] [0] [:method: GET]
* [HTTP/3] [0] [:scheme: https]
* [HTTP/3] [0] [:authority: 127.0.0.1:60930]
* [HTTP/3] [0] [:path: /]
* [HTTP/3] [0] [user-agent: curl/8.5.0]
* [HTTP/3] [0] [accept: */*]
> GET / HTTP/3
> Host: 127.0.0.1:60930
> User-Agent: curl/8.5.0
> Accept: */*
> 
* QUIC connection is draining
* Connection #0 to host 127.0.0.1 left intact
curl: (95) QUIC connection is draining

Sorry for long reply time. I can confirm that I get the same error you see on https://curl-neqo.neon.rocks:4433. That is weird. I have a hard time testing, because installing curl with http3 support on the server is harder than on my dev machine. It's probably the different hostname that produces the weird server reply. So the hosted test server might be useless, unless I can modify it to ignore the sent hostname.

@icing
Copy link
Contributor

icing commented Dec 20, 2023

You can send a URL to localhost with curl --resolve curl-nego.neon.rocks:4433:127.0.0.1. Then the hostname the server sees may work better for you.

@bagder
Copy link
Member

bagder commented May 27, 2024

@mb does this still happen?

@mb
Copy link
Author

mb commented May 31, 2024

I retried and yes it is still happening. I've enabled debug build now and it is panicing:

user@archlinux ~/g/curl (master) [SIGABRT]> RUST_BACKTRACE=1 rr record --disable-cpuid-features-ext 0xdc230000,0x2c42,0xc curl -I --http3-only https://127.0.0.1:40668 --insecure -vv
rr: Saving execution to trace directory `/home/user/.local/share/rr/curl-5'.
* !!! WARNING !!!
* This is a debug build of libcurl, do not use in production.
* STATE: INIT => SETUP handle 0x623ed35f9ca8; line 1987
* STATE: SETUP => CONNECT handle 0x623ed35f9ca8; line 2003
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => CONNECTING handle 0x623ed35f9ca8; line 2039
*   Trying 127.0.0.1:40668...
* Server certificate:
*  subject: CN= HTTP2 Test Cert
*  start date: Jan  1 00:00:00 2017 GMT
*  expire date: Jan  1 00:00:00 2027 GMT
*  issuer: CN= HTTP2 Test CA
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Connected to 127.0.0.1 (127.0.0.1) port 40668
* using HTTP/3
* STATE: CONNECTING => PROTOCONNECT handle 0x623ed35f9ca8; line 2147
* STATE: PROTOCONNECT => DO handle 0x623ed35f9ca8; line 2176
* [HTTP/3] [0] OPENED stream for https://127.0.0.1:40668/
* [HTTP/3] [0] [:method: HEAD]
* [HTTP/3] [0] [:scheme: https]
* [HTTP/3] [0] [:authority: 127.0.0.1:40668]
* [HTTP/3] [0] [:path: /]
* [HTTP/3] [0] [user-agent: curl/8.8.1-DEV]
* [HTTP/3] [0] [accept: */*]
> HEAD / HTTP/3
> Host: 127.0.0.1:40668
> User-Agent: curl/8.8.1-DEV
> Accept: */*
> 
* STATE: DO => DID handle 0x623ed35f9ca8; line 2272
* STATE: DID => PERFORMING handle 0x623ed35f9ca8; line 2390
* QUIC connection is draining
* readwrite_data() -> 95
* Curl_readwrite() -> 95
* multi_done[PERFORMING]: status: 95 prem: 1 done: 0
* Connection #0 to host 127.0.0.1 left intact
* Expire cleared
curl: (95) QUIC connection is draining
thread '<unnamed>' panicked at library/core/src/panicking.rs:156:5:
unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_nounwind_fmt::runtime
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:110:18
   2: core::panicking::panic_nounwind_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:123:9
   3: core::panicking::panic_nounwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:156:5
   4: core::slice::raw::from_raw_parts::precondition_check
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/intrinsics.rs:2799:21
   5: core::slice::raw::from_raw_parts
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/slice/raw.rs:98:9
   6: quiche_conn_close
             at /home/user/git/quiche/quiche/src/ffi.rs:948:27
   7: cf_quiche_close
             at ./lib/vquic/curl_quiche.c:1474:13
   8: cf_he_close
             at ./lib/connect.c:943:5
   9: cf_setup_close
             at ./lib/connect.c:1297:5
  10: cf_hc_close
             at ./lib/cf-https-connect.c:415:5
  11: Curl_conn_close
             at ./lib/cfilters.c:167:5
  12: conn_shutdown
             at ./lib/url.c:567:3
  13: Curl_disconnect
             at ./lib/url.c:675:3
  14: Curl_conncache_close_all_connections
             at ./lib/conncache.c:532:5
  15: curl_share_cleanup
             at ./lib/share.c:226:3
  16: operate
             at ./src/tool_operate.c:2795:11
  17: main
             at ./src/tool_main.c:273:14
  18: <unknown>
  19: __libc_start_main
  20: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread caused non-unwinding panic. aborting.
fish: Job 1, 'RUST_BACKTRACE=1 rr record --di…' terminated by signal SIGABRT (Abort)

I've do have an rr recording (Edit: doesn't seem enough to replay) that might help debugging and a related pernoco session. Unfortunatly I don't know if you can access the latter for debugging :(. And sorry for weird file ending. xz was one of the only ones reaching <25MB and .zip is one of the few supported file endings here on github.

Relevant line from panic:

unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`

@mb
Copy link
Author

mb commented May 31, 2024

@bagder
Copy link
Member

bagder commented Jun 3, 2024

I recommend you using the ngtcp2 backend instead until someone has the time and energy to debug and fix this.

@bagder bagder closed this as completed in 4edbd52 Jun 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants