Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve initial performance for ws #199

Open
DiegoRBaquero opened this issue Feb 3, 2017 · 16 comments
Open

Improve initial performance for ws #199

DiegoRBaquero opened this issue Feb 3, 2017 · 16 comments
Labels

Comments

@DiegoRBaquero
Copy link
Member

DiegoRBaquero commented Feb 3, 2017

  • Increase initial request numwant to 20,
  • Increase update numwat to 15

This would improve the initial connection time to other peers.

@DiegoRBaquero
Copy link
Member Author

@feross thoughts?

@feross
Copy link
Member

feross commented Feb 14, 2017

I was actually thinking of doing the opposite. WebTorrent Desktop has 100% CPU at startup with many torrents active because we need to create so many RTCPeerConnection objects (many of which are never used).

If you have 10 torrents, with 3 trackers each, that's 30 RTCPeerConnection objects created every few minutes. There's lots of heavyweight stuff happening under the hood in WebRTC unfortunately, so we really need to reduce this number, or do something more clever.

I think we need a way to ask the tracker for the peer_ids of the peers that are connected, so we can generate the exact right number of offers.

@DiegoRBaquero
Copy link
Member Author

300 offers would be generated immediately. That's a lot to start up. But it's something that could be managed in WebTorrent Desktop. I think that generating the peer objects doesn't consume lots of CPU, each. But when taking that to 300 in WD it's a different scenario. In that case we could delay initial discovery with escalation; having complete torrents as last.

When adding a new torrent, specially without full torrent and webseeds, the more offers sent => more answers received, initially. 3 trackers right now allow up to 30 initial offers. In a good swarm, things work perfectly. In a bad swarm, or one which peers are firewalled, not so much. Metadata takes time and delays initial render.

Sintel infoHash: https://direct.btorrent.xyz/view#6a9759bffd5c0af65319979fb7832189f4f3c35d
It loads fast and with the 30 initial offers (3 trackers, 10 offers each) it connects to up to 18 peers within 10 seconds. That's almost 2/3 of successful connections. Metadata within 4 seconds.

This infoHash (2nd most peers [12 to be exact] at time of commenting): https://btorrent.xyz/view#42e8de7cf73b3d84a359f0372d454ea04b62bca6. It loads VERY slow. And connects to 2-3 peers within 10 seconds. Up to 5 within 30 seconds. 7 within 2 minutes. Metadata within 7 seconds (Almost twice)

@chr15m
Copy link

chr15m commented Aug 20, 2018

@feross @DiegoRBaquero could the following be a way to improve start up performance?

I noticed that the trickle option is set to false in the client here:

https://github.com/webtorrent/bittorrent-tracker/blob/master/lib/client/websocket-tracker.js#L419

In the simple-peer documentation it says the following about trickle:

set to false to disable trickle ICE and get a single 'signal' event (slower)

In my own testing with simple-peer I found that I get a useful RTC candidate immediately if I set this option to true whereas it takes up to 45 seconds (!) to get a useable RTC candidate on some networks where I tested if this option is set to false (as it is in this package and therefore webtorrent).

When it was set to true I noticed that I had to filter for useable results like this:

p.on('signal', function (data) {
  if (data.type == "offer" || data.type == "answer") {
    // do something useful with the offer/answer
  }
})

Would you accept a patch here that allowed the user to set trickle to true and wrapped the on('signal') code in this package to perform the above filtering? I think it would make connect time much faster for people on some networks.

@DiegoRBaquero
Copy link
Member Author

@chr15m Nice. I'm wondering, what type of signal do we receive when set to true that are "useless"?

@chr15m
Copy link

chr15m commented Aug 21, 2018

@DiegoRBaquero when trickle: true you quickly see a signal like this:

for offering client:

{"type":"offer","sdp":"v=0\r\no=- 1057...00 webrtc-datachannel 1024\r\n"}

for answering client:

{"type":"answer","sdp":"v=0\r\no=mozilla...THIS_IS_SDPARTA-61.0.1 39 ... tpmap:5000 webrtc-datachannel 256\r\na=setup:active\r\na=max-message-size:1073741823\r\n"}

Then signals like this follow afterwards:

{"candidate":{"candidate":"candidate:225 ....","sdpMLineIndex":0,"sdpMid":"data"}}
{"candidate":{"candidate":"candidate:842163...ork-cost 50","sdpMLineIndex":0,"sdpMid":"data"}}

I don't think these candidates can't be used by WebTorrent when doing the websocket signalling introduction.

I wonder if there's some good reason why trickle is set to false in the bittorrent-tracker implementation? I wonder if it would be possible to send up the trickle: true first candidate and then if better candidates come along use those instead at a later time? That way if somebody has a good candidate immediately it gets used and the connection is fast, but if their connection is complicated then they later get a better candidate. I think for most people that would speed up connection time.

I saw in the git blame that @feross wrote that code which sets trickle to false by default so he will probably know best why it's done that way.

@chr15m
Copy link

chr15m commented Aug 21, 2018

Looking back through the git history I see that the first implementation already had trickle set to false:

peer = new Peer({ trickle: false, config: self._opts.rtcConfig })

An idea I had was to start some simple-peer instances with both of each trickle setting. So half of each for example. This means the fast but perhaps less reliable trickle: true can connect quickly if available and then the slower more reliable trickle: false peers are available if the fast trickle version does not manage to establish any connections.

@chr15m
Copy link

chr15m commented Aug 22, 2018

After running some tests with simple-peer I think the trickle setting is false for a good reason. The initial offer that comes in immediately does not seem to have useable candidates whereas the one that comes in after waiting does so. It's plausible that each candidate could be iteratively added with the trickle setting bug I guess that would complicate the code quite a bit.

@DiegoRBaquero
Copy link
Member Author

Maybe we could do 80% trickle in false, 20% trickle in true then. There MIGHT be some usable trickle ice candidates and improve connectivity when there's lots of peers?

@chr15m
Copy link

chr15m commented Aug 22, 2018

@DiegoRBaquero probably best to gather some data on different devices & networks and see if this ever actually happens before making such a change. I think on good networks trickle=false will return quickly anyway.

@chr15m
Copy link

chr15m commented Sep 1, 2018

@DiegoRBaquero I did some more testing and I think we can get significantly better connection time performance for some clients without too many changes.

I noticed that on my office network initial WebTorrent announce time is very slow, usually incurring a 45 second delay between seed/add & first peers seen. I built a little test page with simple-peer and logged the following:

trickle = true

18:52:52.097 main.js:2 load
18:52:52.114 main.js:16 SIGNAL {"type":"offer","sdp":"v=0\r\no=- 2086423642829219872 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=group:BUNDLE data\r\na=msid-semantic: WMS\r\nm=application 37715 DTLS/SCTP 5000\r\nc=IN IP4 192.168.1.39\r\na=candidate:4106396162 1 udp 2113937151 192.168.1.39 37715 typ host generation 0 network-cost 50\r\na=candidate:2781808999 1 udp 2113942271 fd80::c521:741a:86da:fd93 36123 typ host generation 0 network-cost 50\r\na=ice-ufrag:rNjA\r\na=ice-pwd:1AOeSW/lKyDHArwJpIdVrSx0\r\na=ice-options:trickle\r\na=fingerprint:sha-256 75:76:82:B7:13:76:84:30:20:5B:2A:D5:C0:4E:2E:F2:97:D4:C3:0B:8B:1F:26:FE:EB:AD:41:75:DB:C5:5F:3E\r\na=setup:actpass\r\na=mid:data\r\na=sctpmap:5000 webrtc-datachannel 1024\r\n"}
18:52:52.115 main.js:25 iceStateChange Arguments(2) ["new", "gathering", callee: ƒ, Symbol(Symbol.iterator): ƒ]
18:52:52.118 main.js:16 SIGNAL {"candidate":{"candidate":"candidate:4106396162 1 udp 2113937151 192.168.1.39 37715 typ host generation 0 ufrag rNjA network-cost 50","sdpMLineIndex":0,"sdpMid":"data"}}
18:52:52.118 main.js:20 Skipping signal: {candidate: {…}}
18:52:52.119 main.js:16 SIGNAL {"candidate":{"candidate":"candidate:2781808999 1 udp 2113942271 fd80::c521:741a:86da:fd93 36123 typ host generation 0 ufrag rNjA network-cost 50","sdpMLineIndex":0,"sdpMid":"data"}}
18:52:52.119 main.js:20 Skipping signal: {candidate: {…}}
18:52:52.382 main.js:16 SIGNAL {"candidate":{"candidate":"candidate:842163049 1 udp 1677729535 202.89.169.99 37715 typ srflx raddr 192.168.1.39 rport 37715 generation 0 ufrag rNjA network-cost 50","sdpMLineIndex":0,"sdpMid":"data"}}
18:52:52.382 main.js:20 Skipping signal: {candidate: {…}}

NOTE: 45 second delay here

18:53:37.176 main.js:25 iceStateChange Arguments(2) ["new", "complete", callee: ƒ, Symbol(Symbol.iterator): ƒ]
18:53:37.177 main.js:33 _iceComplete Arguments [callee: ƒ, Symbol(Symbol.iterator): ƒ]

trickle = false (WebTorrent default)

18:54:46.142 main.js:2 load
18:54:46.158 main.js:25 iceStateChange Arguments(2) ["new", "gathering", callee: ƒ, Symbol(Symbol.iterator): ƒ]

NOTE: 39 second delay here

18:55:25.915 main.js:25 iceStateChange Arguments(2) ["new", "complete", callee: ƒ, Symbol(Symbol.iterator): ƒ]
18:55:25.916 main.js:33 _iceComplete Arguments [callee: ƒ, Symbol(Symbol.iterator): ƒ]
18:55:25.917 main.js:16 SIGNAL {"type":"offer","sdp":"v=0\r\no=- 1921780624448669464 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\na=group:BUNDLE data\r\na=msid-semantic: WMS\r\nm=application 58823 DTLS/SCTP 5000\r\nc=IN IP4 202.89.169.99\r\na=candidate:4106396162 1 udp 2113937151 192.168.1.39 58823 typ host generation 0 network-cost 50\r\na=candidate:2781808999 1 udp 2113942271 fd80::c521:741a:86da:fd93 60228 typ host generation 0 network-cost 50\r\na=candidate:842163049 1 udp 1677729535 202.89.169.99 58823 typ srflx raddr 192.168.1.39 rport 58823 generation 0 network-cost 50\r\na=ice-ufrag:Z/LX\r\na=ice-pwd:/tC+qocX/aYZo0Vup3ehmx8P\r\na=ice-options:trickle\r\na=fingerprint:sha-256 7A:18:97:06:CA:FD:2D:C1:76:F5:83:EE:E5:D1:FB:E1:5F:A5:D2:CB:C3:4E:8B:2B:07:4B:DC:D1:D3:74:13:D5\r\na=setup:actpass\r\na=mid:data\r\na=sctpmap:5000 webrtc-datachannel 1024\r\n"}

Suggestions

Note that in the first example with trickle = true all of the candidates that are in the final offer are received within 300ms of the request for the offer. So basically the remaining 45 seconds are wasted waiting for the iceStateChange = complete and _iceComplete.

So possible solution:

  • Turn on trickle.
  • For each candidate generate a valid offer and send it up.
  • Have some debounce so that if e.g. all the candidates return at the start (almost always the case) then the offer that gets sent up after 500ms contains all of the candidates.

chr15m added a commit to chr15m/simple-peer that referenced this issue Sep 2, 2018
On some networks the iceStateChange=complete will not fire until the
browser times out adding tens of seconds of latency even though all
valid candidates have been recieved. This patch makes _iceComplete fire
early and current offer return after some timeout if we've received
valid candidates.

See webtorrent/bittorrent-tracker#199 for further discussion.
@chr15m
Copy link

chr15m commented Sep 2, 2018

@DiegoRBaquero I've filed a pull-request against simple-peer which fixes the above issue at that layer.

Testing on my "bad" office network where I was getting 45 second delays before WebTorrent peer connections it's now reduced to 5 seconds (the default simple-peer offer timeout setting in that patch).

If @feross thinks it should not go into simple-peer I have a patch ready against this repository instead which will implement the fix in bittorrent-tracker/lib/client/websocket-tracker.js instead of simple-peer.

Thanks for your time and sorry for all the noise!

chr15m added a commit to chr15m/simple-peer that referenced this issue Sep 11, 2018
On some networks the iceStateChange=complete will not fire until the
browser times out adding tens of seconds of latency even though all
valid candidates have been recieved. This patch makes _iceComplete fire
early and current offer return after some timeout if we've received
valid candidates.

See webtorrent/bittorrent-tracker#199 for further discussion.
@feross
Copy link
Member

feross commented Aug 11, 2019

Hi everyone,

The reason that I initially wrote this to use { trickle: false } is to minimize the amount of traffic to the tracker servers. UDP trackers servers can handle millions of concurrent peers because they only have to handle one-off messages, once every 15 minutes. Obviously, an approach that uses long-lived WebSocket connections and two messages for every peer connection will never be able to scale to the same degree. However, I didn't think that it made sense to add even more messages on top of the minimum offer/answer messages that must be sent.

I'm open to reconsidering this decision, but we'd need to make sure that the server load on trackers is reasonable since likely 10x as many messages would need to get sent, though they would be smaller messages.

But I think that @chr15m's solution in simple-peer solves the main issue, which was 45 second connection times on some networks. Now in the worst case, it takes 5 seconds. Setting {trickle: true} would make connections on those networks faster than 5 seconds, but I'm not sure that's really needed anymore since this was really an edge case to begin with.

@chr15m
Copy link

chr15m commented Aug 11, 2019

@feross thanks you, yes, very much agree.

I was actually thinking of doing the opposite.

I also agree with this. Reducing the number of listening-but-unused WebRTC connections is good. In my testing I've found that Firefox is fine with a huge number of WebRTC offers, but Chrome gets very sad because of this bug: https://bugs.chromium.org/p/chromium/issues/detail?id=825576

@feross
Copy link
Member

feross commented Aug 11, 2019

Reducing the number of listening-but-unused WebRTC connections is good

Yep, I see the Chrome errors about this issue all the time now.

@alxhotel
Copy link
Member

I think we need a way to ask the tracker for the peer_ids of the peers that are connected, so we can generate the exact right number of offers.

Since there is a limit in the number of webrtc connections, I think this is a good approach. By adding a new type of message (in the action attribute), I think it can be made retrocompatible.

FredZeng pushed a commit to FredZeng/simple-peer that referenced this issue Oct 14, 2023
On some networks the iceStateChange=complete will not fire until the
browser times out adding tens of seconds of latency even though all
valid candidates have been recieved. This patch makes _iceComplete fire
early and current offer return after some timeout if we've received
valid candidates.

See webtorrent/bittorrent-tracker#199 for further discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants