Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logs show: WARN frame executor queue is backlogged #11920

Open
4 of 11 tasks
stuberman opened this issue Apr 23, 2024 · 8 comments
Open
4 of 11 tasks

Logs show: WARN frame executor queue is backlogged #11920

stuberman opened this issue Apr 23, 2024 · 8 comments
Assignees
Labels
kind/bug Kind: Bug

Comments

@stuberman
Copy link

Checklist

  • This is not a security-related bug/issue. If it is, please follow please follow the security policy.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the Latest release, the most recent RC(release canadiate) for the upcoming release or the dev branch(master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Lotus Version

Daemon: 1.26.3+mainnet+git.560826d5c+api1.5.0
Local: lotus-miner version 1.26.3+mainnet+git.560826d5c

rustc 1.77.2 (25ef9e3d8 2024-04-09)
rustup 1.27.0 (bbb9276d2 2024-03-08)
go version go1.21.9 linux/amd64

free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi        43Gi       864Mi        10Mi        81Gi        81Gi
Swap:         199Gi        73Mi       199Gi

Repro Steps

  1. Start Lotus-miner
  2. See logs

Describe the Bug

After upgrading to lotus v1.26.3 from 1.25 seeing these WARN errors in miner logs continuously

Logging Information

2024-04-23T19:16:32.188Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 133, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 133, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z	WARN	rpc	go-jsonrpc@v0.3.1/websocket.go:676	frame executor queue is backlogged	{"queued": 129, "cap": 256}
@stuberman stuberman added the kind/bug Kind: Bug label Apr 23, 2024
@rjan90 rjan90 self-assigned this Jun 4, 2024
@rvagg
Copy link
Member

rvagg commented Jun 4, 2024

@stuberman @TippyFlitsUK does this mainly happen on startup, or is this a constant thing? Is there a particular time or focal event that this happens around, or just all the time? ISTM that the backlog is caused by the client not processing the messages as quick as the server sends them, but maybe that's just a quick burst of activity focused around startup or some other event?

@stuberman
Copy link
Author

@rvagg this is happening all of the time, it is constant.

2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 136, "cap": 256}
2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 137, "cap": 256}
2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 138, "cap": 256}
2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 139, "cap": 256}

@rvagg
Copy link
Member

rvagg commented Jun 5, 2024

I've put up an experiment that might help with debugging this @ filecoin-project/go-jsonrpc#113, I don't really want to push that through to a release yet but I'd really like to see the output.

If someone could run their node/miner (whatever is producing these logs) with this version installed and then share logs it might be useful.

To install it, in a lotus repo checkout, on the tag corresponding to the version you're running, do this:

go get github.com/filecoin-project/go-jsonrpc@9e83a043c6959e53e8521d7d48c7c80e26da5695
go mod tidy

and rebuild

@rjan90 rjan90 removed their assignment Jun 5, 2024
@stuberman
Copy link
Author

I will give this a shot after my PoSTs later this morning.

@stuberman
Copy link
Author

Not seeing the rpc warning now.

git show

commit 2baca01 (HEAD, tag: v1.27.0, origin/releases)
Merge: 560826d 60a39dd
Author: Phi-rjan orjan.roren@gmail.com
Date: Tue May 28 14:17:05 2024 +0200

Merge pull request #12058 from filecoin-project/resolve-releasev1270-conflicts

chore: merge release/v1.27.0 branch into the releases branch

lotus-miner version

Daemon: 1.27.0+mainnet+git.2baca01fd.dirty+api1.5.0
Local: lotus-miner version 1.27.0+mainnet+git.2baca01fd.dirty

2024-06-05T13:04:09.076Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976209, "baseEpoch": 3976208, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263092, "lookbackEpochs": 900, "networkPowerAtLookback": "26217018616095277056", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:04:39.079Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976210, "baseEpoch": 3976209, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263102, "lookbackEpochs": 900, "networkPowerAtLookback": "26217161243369242624", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:05:09.337Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976211, "baseEpoch": 3976210, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263112, "lookbackEpochs": 900, "networkPowerAtLookback": "26217161243369242624", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:05:39.448Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976212, "baseEpoch": 3976211, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263122, "lookbackEpochs": 900, "networkPowerAtLookback": "26217155745811103744", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:06:09.310Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976213, "baseEpoch": 3976212, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263132, "lookbackEpochs": 900, "networkPowerAtLookback": "26217178079641042944", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:06:39.314Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 9, "forRound": 3976214, "baseEpoch": 3976213, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263142, "lookbackEpochs": 900, "networkPowerAtLookback": "26217178079641042944", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null}
2024-06-05T13:07:09.316Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976215, "baseEpoch": 3976214, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263152, "lookbackEpochs": 900,

@stuberman
Copy link
Author

Here are all the events since I modified lotus 11 hours ago. Far less frequent.

cat /pool/logs/miner.log |grep backlogged
f01278.txt

@rvagg
Copy link
Member

rvagg commented Jun 6, 2024

hah, well that's very interesting, thanks @stuberman!

The primary reason for the lack of warnings is that I bumped the warning threshold from 1/2 to 2/3rds, which suggests that it's normally clearing them in good time, which is good, and means this warning is probably unnecessarily noisy. But perhaps also moving some of the processing earlier in the pipeline helps too.

The most interesting thing is the method<> in there, for all of them! I need to work out what that means, I'm obviously not clear on the details of the chatter but I'd assumed that each frame needed a method but I guess there's a bunch of meta frames in there I need to grok. There might be some unnecessary chatter going on. I'll investigate.

@rvagg rvagg self-assigned this Jun 6, 2024
@magik6k
Copy link
Contributor

magik6k commented Jun 6, 2024

Empty method likely means call result

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Kind: Bug
Projects
None yet
Development

No branches or pull requests

4 participants