PDU to Tagged Stream confuses the scheduler by failing to produce output #6628
Various other tests involving message passing were failing in the past, and #5754 was intended to fix that. It may be worth looking over the fix to see whether it properly handles the flow graphs in the FEC async tests.
Some observations:
It turns out there are two separate bugs here:
I'll reopen #7022 to track bug 2.
I was able to track down the bug after adding some debug logging. Here's what goes wrong:
Prior to #323, the PDU to Tagged Stream block would output partial PDUs. I think we should restore this behaviour so that run-to-completion flow graphs will finish properly.
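To make the difference between the two output policies concrete, here is a toy Python model. It is not GNU Radio code: it assumes a hypothetical `drain()` loop in which every work call is offered the same fixed amount of output space. Under the whole-PDU policy, a PDU larger than that space is never emitted. Under the partial policy, the PDU drains over several calls. The function names and the fixed-space assumption are illustrative only.

```python
def emit_whole(remaining, space):
    """Whole-PDU policy: emit the rest of the PDU only if it all fits."""
    return remaining if space >= remaining else 0


def emit_partial(remaining, space):
    """Partial-PDU policy: emit as much of the PDU as currently fits."""
    return min(remaining, space)


def drain(pdu_len, space_per_call, partial, max_calls=100):
    """Hypothetical work loop: returns (items still unsent, calls that produced output)."""
    emit = emit_partial if partial else emit_whole
    remaining = pdu_len
    calls = 0
    while remaining > 0 and calls < max_calls:
        produced = emit(remaining, space_per_call)
        if produced == 0:
            break  # no output this call: in this model the block is stuck
        remaining -= produced
        calls += 1
    return remaining, calls


print(drain(100, 64, partial=False))  # (100, 0): the PDU never fits, nothing is produced
print(drain(100, 64, partial=True))   # (0, 2): 64 items, then the remaining 36
```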
Unfortunately, it's the … It seems likely that other Tagged Stream Blocks could confuse the scheduler in a similar way when used in run-to-completion flow graphs.
FWIW, this exact issue caused me to abandon tagged stream blocks immediately at the start of my PDU work. As you have identified, it's a property of TSBs, which, as expounded in #1446, are inherently evil.
I'm starting to see that! They make unreasonable demands of the scheduler, so it's not a great surprise that it responds badly. Maybe the right path forward for the FECAPI tests is to rewrite them to avoid the "Tagged Stream to PDU" and "PDU to Tagged Stream" blocks. If that turns out to be impractical, then I suppose it would be possible to make the "PDU to Tagged Stream" block behave by setting its output multiple to the FEC frame size.
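The output-multiple workaround can be illustrated with another toy model. In GNU Radio, a block's `set_output_multiple()` constrains the `noutput_items` value passed to its work function to a multiple of the given number. The sketch below assumes the scheduler offers free buffer space rounded down to that multiple and skips the call when the result is zero. `offered_items()` and `call_outcome()` are hypothetical helpers, not scheduler internals, and the frame size is made up.

```python
def offered_items(free_space, output_multiple):
    """Round the free output space down to a multiple of output_multiple."""
    return (free_space // output_multiple) * output_multiple


def call_outcome(free_space, frame_size, output_multiple):
    """Classify one scheduling attempt for a block that emits only whole frames."""
    n = offered_items(free_space, output_multiple)
    if n == 0:
        return "not called"  # in this model the scheduler waits for more space
    return "produces" if n >= frame_size else "stalls"


FRAME = 100  # hypothetical FEC frame size
print(call_outcome(64, FRAME, output_multiple=1))       # stalls
print(call_outcome(64, FRAME, output_multiple=FRAME))   # not called
print(call_outcome(250, FRAME, output_multiple=FRAME))  # produces
```

With `output_multiple` equal to the frame size, the offered count is either zero or at least one full frame, so the "stalls" branch is unreachable in this model.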
On second thought, rewriting tests to avoid these blocks wouldn't fix the root cause. The "Tagged Stream to PDU" and "PDU to Tagged Stream" blocks don't need to use the scheduler-hostile …
A CI run (https://github.com/gnuradio/gnuradio/actions/runs/4651656779/jobs/8231524712) recently failed with the following error:
I was able to reproduce this on my laptop by running the test in a loop (`while gr-fec/python/fec/qa_fecapi_dummy_test.sh; do :; done`) in parallel with a heavy CPU load (`stress -c 16`).

Debian build logs show that `qa_fecapi_repetition` fails on i386:

https://buildd.debian.org/status/fetch.php?pkg=gnuradio&arch=i386&ver=3.10.9.0%7Erc1-2&stamp=1703091669&raw=0
https://buildd.debian.org/status/fetch.php?pkg=gnuradio&arch=i386&ver=3.10.9.0%7Erc1-3&stamp=1703129935&raw=0

Again, I was able to reproduce the failure by repeatedly executing the test in parallel with a heavy CPU load.

The failing tests use `_qa_helper_async` to build a flow graph containing FEC Async Encoder and FEC Async Decoder blocks. The flow graph alternates between stream and message connections. I suspect that the scheduler is deciding that the flow graph is "done" even though there are still messages in flight.
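The suspected failure mode (declaring the flow graph done while messages are still in flight) can be sketched as a toy termination check. This model is purely illustrative: it assumes a hypothetical `Block` record holding a stream-done flag and a pending-message queue, and it does not reflect how GNU Radio's scheduler actually decides that a flow graph has finished. It shows how a check that looks only at stream state would stop while a PDU is still queued between the FEC Async Encoder and Decoder.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block state: is stream work finished, and are messages queued?"""
    name: str
    stream_done: bool = False
    msg_queue: deque = field(default_factory=deque)


def done_streams_only(blocks):
    """Naive check: ignores messages that are still in flight."""
    return all(b.stream_done for b in blocks)


def done_with_messages(blocks):
    """Safer check: also requires every message queue to be empty."""
    return all(b.stream_done and not b.msg_queue for b in blocks)


encoder = Block("FEC Async Encoder", stream_done=True)
decoder = Block("FEC Async Decoder", stream_done=True, msg_queue=deque([b"encoded PDU"]))
graph = [encoder, decoder]

print(done_streams_only(graph))   # True: would stop too early
print(done_with_messages(graph))  # False: a PDU is still queued
```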
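The reproduction recipe (looping the test while `stress -c 16` loads the CPU) can also be scripted. The sketch below is a hypothetical helper, `run_until_failure()`, that is not part of GNU Radio's test tooling; it spawns busy-loop Python processes as a stand-in for `stress` and reruns a command until it exits with a non-zero status.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=1000, load_workers=0):
    """Rerun cmd until it exits non-zero; return the failing run number, or None."""
    # Busy-loop child processes approximate `stress -c N` (an assumption, not a benchmark).
    hogs = [subprocess.Popen([sys.executable, "-c", "while True: pass"])
            for _ in range(load_workers)]
    try:
        for run in range(1, max_runs + 1):
            if subprocess.run(cmd).returncode != 0:
                return run
        return None
    finally:
        # Always stop the load generators, even if a run raised an exception.
        for hog in hogs:
            hog.kill()
            hog.wait()


# Example, using the test path from the report (run from a GNU Radio build tree):
# run_until_failure(["gr-fec/python/fec/qa_fecapi_dummy_test.sh"], load_workers=16)
```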