
[Bug]: Queue events does not always capture all completed events #2517

bobthekingofegypt opened this issue Apr 10, 2024 · 2 comments
Labels: bug (Something isn't working)
Version

v5.7.1

Platform

NodeJS

What happened?

We recently migrated a legacy project from the old bull to bullmq.

With the old bull we did some slightly unusual wrapping and monitoring of this legacy project from our newer orchestration tool: the tool used the completed callbacks from the queues to know when this legacy stage had finished processing everything. That always worked fine. After switching to bullmq we have noticed that the orchestrator no longer detects that the queue work is complete. The jobs have all been processed fine and saved off to our database without issue, but the orchestrator keeps waiting for a few callbacks so that the completed callback count equals the submitted event count, and those callbacks never arrive. Not sure if we have done something wrong, or if this is expected behaviour.

In summary:

  • lots of workers doing the work
  • a single process monitoring the queue for completed events; it submits X jobs and waits until the completed callback count reaches X before stopping the workers (roughly the sketch after this list)
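
The monitoring side is essentially this pattern (a minimal sketch, not our actual code; the queue name, connection details and the shutdown step are placeholders):

```ts
import { QueueEvents } from 'bullmq';

// Minimal sketch of the counting pattern; 'legacy-stage' and the Redis
// connection are placeholders, not our real configuration.
const queueEvents = new QueueEvents('legacy-stage', {
  connection: { host: 'localhost', port: 6379 },
});

const submitted = 1_000_000; // X = number of jobs the producer added
let completed = 0;

queueEvents.on('completed', ({ jobId }) => {
  completed += 1;
  if (completed === submitted) {
    console.log('all jobs completed, stopping workers');
    // orchestrator would shut the workers down here
  }
});
```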

How to reproduce.

https://github.com/bobthekingofegypt/check_bull_complete_count

I uploaded this repo as a minimal test. It contains a monitor, a consumer and a producer. The monitor listens for the completed events, node:cluster starts up a load of workers, and the producer submits 1 million jobs. But the monitor doesn't always register 1 million completed callbacks. I'm testing this on a 20-core machine.
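
The producer/worker side of the repro is roughly shaped like this (a simplified sketch, not the exact contents of the linked repo; queue name, core count and chunk sizes are illustrative):

```ts
import cluster from 'node:cluster';
import { Queue, Worker } from 'bullmq';

// Illustrative sketch only; the real repro lives in the linked repository.
const connection = { host: 'localhost', port: 6379 };

if (cluster.isPrimary) {
  // Fork a worker process per core, then enqueue the jobs.
  for (let i = 0; i < 20; i++) cluster.fork();

  const queue = new Queue('legacy-stage', { connection });
  // addBulk in chunks so we don't send one enormous command
  for (let start = 0; start < 1_000_000; start += 10_000) {
    const chunk = Array.from({ length: 10_000 }, (_, i) => ({
      name: 'job',
      data: { index: start + i },
    }));
    await queue.addBulk(chunk);
  }
} else {
  // Each forked process runs a Worker whose processor completes immediately.
  new Worker('legacy-stage', async () => 'done', { connection });
}
```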

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
bobthekingofegypt added the bug label on Apr 10, 2024
manast (Contributor) commented Apr 10, 2024

I think it is possible this has to do with the max events length. Since you are processing very quickly, the Redis stream holding the events may get trimmed before QueueEvents manages to read them. You can try increasing this setting to a larger value to see if it improves: https://api.docs.bullmq.io/interfaces/v5.QueueOptions.html#streams. The default is 10k; you could try 100k instead.
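
For reference, setting that option when constructing the queue looks roughly like this (queue name and connection are placeholders):

```ts
import { Queue } from 'bullmq';

// Sketch of raising the events stream length as suggested above;
// 'legacy-stage' and the connection details are placeholders.
const queue = new Queue('legacy-stage', {
  connection: { host: 'localhost', port: 6379 },
  streams: {
    events: {
      maxLen: 100_000, // default is 10_000
    },
  },
});
```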

bobthekingofegypt (Author) commented
Tried it with 100k, sadly no difference.

I originally had the reproducible test case running with random sleeps to better match our production machines' throughput, but when I saw the same issue without them I removed them for simplicity. Our production machines don't consume events super quickly; the completion monitor is attached to the end queue of a stream of processors. That queue has the task of saving to Postgres, so its throughput isn't very high.
