
How to ensure ordering of jobs in case of delayed retries. #68

Open
ksyd9821 opened this issue Mar 4, 2024 · 7 comments

Comments

@ksyd9821

ksyd9821 commented Mar 4, 2024

Hello,

for our current use case, we are adding jobs to a queue and would like them to be successfully processed in the same order they were added (FIFO).
So when a job fails, the desired outcome would be that the job is retried automatically after some delay before moving on to the next job on the queue.

For example, if we have these 3 jobs in our queue [1, 2, 3] with job 1 being the first added job, here is what a possible execution would look like:

  • process job 1
  • job 1 fails
  • X amount of delay
  • process job 1 again
  • job 1 completed
  • process job 2
  • process job 3
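The desired execution can be modelled with a minimal, hypothetical in-memory sketch. It is plain TypeScript with no BullMQ involved, and `runStrictFifo` and its types are illustrative names only. The retry loop sits inside the per-job loop, so job 2 cannot start until job 1 has succeeded.

```typescript
// Hypothetical in-memory model (not BullMQ code) of strict FIFO processing:
// a failed job is retried after a delay before any later job is started.

type FifoJob = { id: number };
// Returns true on success, false on failure.
type JobHandler = (job: FifoJob, attempt: number) => boolean;

function runStrictFifo(
  jobs: FifoJob[],
  handler: JobHandler,
  retryDelayMs: number,
  maxAttempts = 5,
): string[] {
  const trace: string[] = [];
  for (const job of jobs) {
    for (let attempt = 1; ; attempt++) {
      trace.push(`process job ${job.id}`);
      if (handler(job, attempt)) {
        trace.push(`job ${job.id} completed`);
        break;
      }
      trace.push(`job ${job.id} fails`);
      if (attempt >= maxAttempts) {
        // Halt instead of skipping ahead: skipping would break the ordering.
        throw new Error(`job ${job.id} failed ${maxAttempts} times, queue halted`);
      }
      // A real system would sleep here; the model only records the delay.
      trace.push(`wait ${retryDelayMs} ms`);
    }
  }
  return trace;
}

// Job 1 fails on its first attempt; every other attempt succeeds.
const orderedTrace = runStrictFifo(
  [{ id: 1 }, { id: 2 }, { id: 3 }],
  (job, attempt) => !(job.id === 1 && attempt === 1),
  1000,
);
console.log(orderedTrace.join("\n"));
```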

How can we achieve this with bullmq?

Thank you in advance for the support!

@manast
Contributor

manast commented Mar 4, 2024

Thank you for your question. I am afraid that currently, when a job fails, the queue is not halted, so the other jobs waiting to be processed will be processed as soon as a worker is free.
How critical is this case for you? Can you elaborate a bit on the whole scenario where this functionality is needed?
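For contrast, the default behavior described here can be illustrated with a simplified, hypothetical model. In BullMQ, retries come from the `attempts` and `backoff` job options (for example `{ attempts: 3, backoff: { type: "fixed", delay: 1000 } }`), and a job waiting for its backoff sits in the "delayed" state. The model below is not BullMQ's implementation; in particular, appending promoted retries to the end of the wait list is an assumption. It only shows why the completion order is no longer FIFO: while job 1 waits out its backoff, the single worker moves on to jobs 2 and 3.

```typescript
// Simplified, hypothetical model (not BullMQ's implementation): a failed job
// with attempts left is parked in a "delayed" set, and the single worker
// immediately takes the next waiting job. Virtual time keeps it instant.

type SimJob = { id: number; attempts: number };

function runWithBackoff(
  ids: number[],
  fails: (id: number, attempt: number) => boolean,
  backoffMs: number,
  maxAttempts = 3,
  jobDurationMs = 10,
): { completed: number[]; failed: number[] } {
  const waiting: SimJob[] = ids.map((id) => ({ id, attempts: 0 }));
  const delayed: { job: SimJob; dueAt: number }[] = [];
  const completed: number[] = [];
  const failed: number[] = [];
  let clock = 0;

  while (waiting.length > 0 || delayed.length > 0) {
    // Promote retries whose backoff has elapsed (appended to the wait list
    // here; this placement is an assumption of the model).
    delayed.sort((a, b) => a.dueAt - b.dueAt);
    while (delayed.length > 0 && delayed[0].dueAt <= clock) {
      waiting.push(delayed.shift()!.job);
    }
    const job = waiting.shift();
    if (job === undefined) {
      clock = delayed[0].dueAt; // idle until the earliest retry is due
      continue;
    }
    job.attempts++;
    clock += jobDurationMs;
    if (!fails(job.id, job.attempts)) {
      completed.push(job.id);
    } else if (job.attempts < maxAttempts) {
      delayed.push({ job, dueAt: clock + backoffMs }); // retry later
    } else {
      failed.push(job.id); // attempts exhausted
    }
  }
  return { completed, failed };
}

// Job 1 fails once; with a 1000 ms backoff, jobs 2 and 3 finish first.
const defaultRun = runWithBackoff([1, 2, 3], (id, attempt) => id === 1 && attempt === 1, 1000);
console.log(defaultRun.completed); // [ 2, 3, 1 ]
```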

@hardcodet

hardcodet commented Mar 4, 2024

@manast

> How critical is this case for you? Can you elaborate a bit on the whole scenario where this functionality is needed?

It is critical, unfortunately. Our use case is a number of event queues for webhooks (each queue representing a customer's subscription), where we need to deliver events in the correct order. In practice, webhook deliveries sometimes fail (e.g. the customer's endpoint is temporarily unavailable) and need to be retried, but we can't have those events move to the back of the queue, because order matters.

As a dummy example: imagine two events occurring in this order:

  1. The system is offline
  2. The system is online

If we sent these events in reverse order, the outcome on the customer's end would be completely wrong: they would assume the system is offline and might stop communicating with it.
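To make the failure mode concrete, here is a hypothetical subscriber that simply keeps the most recent status it was sent. The event shape and `finalStatus` are illustrative only and not part of any real webhook API. Delivering the same two events in reverse leaves the subscriber in the wrong state.

```typescript
// Hypothetical event shape for the offline/online example.
type SystemEvent = { type: "offline" | "online" };
type LinkStatus = "unknown" | "offline" | "online";

// The subscriber only keeps the latest reported status, so the final state
// depends entirely on the order in which events are delivered.
function finalStatus(events: SystemEvent[]): LinkStatus {
  let current: LinkStatus = "unknown";
  for (const e of events) {
    current = e.type;
  }
  return current;
}

const emitted: SystemEvent[] = [{ type: "offline" }, { type: "online" }];
console.log(finalStatus(emitted)); // → online (correct)
console.log(finalStatus([...emitted].reverse())); // → offline (wrong)
```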

@manast
Contributor

manast commented Mar 5, 2024

OK, so this feature would be specific to groups: a group would not start processing new jobs until the previous one has either completed or failed. Furthermore, this feature would only make sense with a concurrency of 1.
We need to study how feasible this feature is in the current design.
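A rough, hypothetical model of the per-group semantics being discussed is sketched below. It is not BullMQ or BullMQ Pro code, and `runGrouped` is an illustrative name. A failed job blocks only its own group until its retry succeeds, while other groups keep flowing. The model processes one job at a time; with more than one job of the same group in flight, a later job could finish before an earlier one fails, which is why ordering only holds with a concurrency of 1.

```typescript
// Hypothetical model (not BullMQ code) of per-group ordering with
// concurrency 1: a failed job blocks only its own group until its retry
// succeeds, while jobs from other groups keep being processed.

type GroupJob = { id: string; group: string };

function runGrouped(
  jobs: GroupJob[],
  fails: (job: GroupJob, attempt: number) => boolean,
  retryDelayMs: number,
  maxAttempts = 5,
): string[] {
  const pending = [...jobs]; // global FIFO order
  const attempts = new Map<string, number>();
  const blockedUntil = new Map<string, number>(); // group -> earliest retry time
  const completed: string[] = [];
  let clock = 0;

  while (pending.length > 0) {
    // Find the first job that is the head of its group and whose group is
    // not waiting for a retry. Later jobs of a blocked group are never picked.
    const seenGroups = new Set<string>();
    let pick = -1;
    for (let i = 0; i < pending.length; i++) {
      const group = pending[i].group;
      if (seenGroups.has(group)) continue; // not the head of its group
      seenGroups.add(group);
      if ((blockedUntil.get(group) ?? 0) <= clock) {
        pick = i;
        break;
      }
    }
    if (pick === -1) {
      // Every group with pending work is blocked: jump to the earliest retry.
      let next = Infinity;
      blockedUntil.forEach((t) => {
        if (t > clock && t < next) next = t;
      });
      clock = next;
      continue;
    }
    const job = pending[pick];
    const attempt = (attempts.get(job.id) ?? 0) + 1;
    attempts.set(job.id, attempt);
    clock += 1; // each attempt takes one time unit
    if (!fails(job, attempt)) {
      pending.splice(pick, 1);
      completed.push(job.id);
    } else if (attempt < maxAttempts) {
      blockedUntil.set(job.group, clock + retryDelayMs); // stays at its group's head
    } else {
      // Give up rather than let later jobs of this group overtake it.
      throw new Error(`job ${job.id} failed ${maxAttempts} times`);
    }
  }
  return completed;
}

// Job a1 fails once: group A is held back, group B is not.
const groupOrder = runGrouped(
  [
    { id: "a1", group: "A" },
    { id: "b1", group: "B" },
    { id: "a2", group: "A" },
    { id: "b2", group: "B" },
  ],
  (job, attempt) => job.id === "a1" && attempt === 1,
  5,
);
console.log(groupOrder); // [ 'b1', 'b2', 'a1', 'a2' ]
```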

@hardcodet

You're right. We're already using a concurrency of 1 extensively to enforce sequential processing, because many of our cases warrant it. Preserving order on retries is just one more flavor of that.

If it's a bad fit for BullMQ, I guess we could work around the issue with the following strategy:

  • handle the error ourselves, and
    • pause the queue
    • mark the failed job as completed
    • create a new job with the same payload and enqueue it LIFO
  • re-enable the queue after the retry delay
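The workaround above can be sketched as follows. To keep the example runnable without Redis, it uses a hypothetical in-memory `MiniQueue` and a fake timer; these names are assumptions, not BullMQ API. In BullMQ the corresponding calls would be `queue.pause()`, `queue.resume()` and `queue.add(name, data, { lifo: true })`, with the processor returning normally instead of throwing so that the original job is marked as completed. As in the strategy above, a single worker with a concurrency of 1 is assumed.

```typescript
// Hypothetical in-memory stand-in for a queue so the sketch runs without
// Redis. In BullMQ the analogous calls would be queue.pause(), queue.resume()
// and queue.add(name, data, { lifo: true }).
class MiniQueue<T> {
  private items: T[] = [];
  paused = false;

  add(data: T, opts: { lifo?: boolean } = {}): void {
    if (opts.lifo) this.items.unshift(data); // front of the line
    else this.items.push(data);
  }
  pause(): void {
    this.paused = true;
  }
  resume(): void {
    this.paused = false;
  }
  // Next job, or undefined if the queue is paused or empty.
  next(): T | undefined {
    return this.paused ? undefined : this.items.shift();
  }
}

// The workaround: on failure, pause, re-add the payload at the front, return
// normally (so the failed job counts as completed), and resume after a delay.
function handleWithOrderedRetry<T>(
  queue: MiniQueue<T>,
  data: T,
  send: (data: T) => boolean,
  retryDelayMs: number,
  schedule: (fn: () => void, ms: number) => void,
): void {
  if (send(data)) return; // delivered: nothing else to do
  queue.pause(); // stop later jobs from being picked up
  queue.add(data, { lifo: true }); // same payload, first in line
  schedule(() => queue.resume(), retryDelayMs); // re-enable after the delay
  // Returning instead of throwing marks the original job as completed.
}

// Usage with a fake timer so the example is deterministic.
const queue = new MiniQueue<string>();
queue.add("offline");
queue.add("online");

const delivered: string[] = [];
let endpointDown = true; // the first delivery attempt fails
const send = (event: string): boolean => {
  if (endpointDown) {
    endpointDown = false;
    return false;
  }
  delivered.push(event);
  return true;
};
const pendingTimers: (() => void)[] = [];
const fakeSchedule = (fn: () => void, _ms: number): void => {
  pendingTimers.push(fn);
};

function drain(): void {
  for (let job = queue.next(); job !== undefined; job = queue.next()) {
    handleWithOrderedRetry(queue, job, send, 1000, fakeSchedule);
  }
}

drain(); // "offline" fails; the queue is now paused
pendingTimers.shift()!(); // the retry delay elapses: resume
drain();
console.log(delivered); // [ 'offline', 'online' ]
```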

This is absolutely feasible for us. We just figured that ordered processing (including retries with backoff delays) would be a common scenario, so we wanted to discuss this with you first 👍

@hardcodet

It wouldn't be specific to groups, though: we were thinking of creating a queue for each customer (rather than using groups keyed by customer ID), which would considerably reduce the complexity of retries compared with queues that still have to process events for other groups.
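The queue-per-customer idea can be sketched as a lazily populated registry. The registry, the customer IDs and the `webhooks-<customerId>` naming scheme are all hypothetical; in BullMQ each entry would presumably be its own `Queue` with a concurrency-1 worker. The point is that pausing one customer's queue for a retry leaves every other customer's queue untouched.

```typescript
// Hypothetical sketch: one FIFO queue per customer, created lazily. The
// `webhooks-<customerId>` naming scheme is an assumption for illustration.

type CustomerQueue<T> = { name: string; paused: boolean; items: T[] };

class PerCustomerQueues<T> {
  private queues = new Map<string, CustomerQueue<T>>();

  get(customerId: string): CustomerQueue<T> {
    let q = this.queues.get(customerId);
    if (q === undefined) {
      q = { name: `webhooks-${customerId}`, paused: false, items: [] };
      this.queues.set(customerId, q);
    }
    return q;
  }

  enqueue(customerId: string, event: T): void {
    this.get(customerId).items.push(event);
  }

  // Next event for this customer, or undefined if its queue is paused or empty.
  next(customerId: string): T | undefined {
    const q = this.get(customerId);
    return q.paused ? undefined : q.items.shift();
  }
}

const registry = new PerCustomerQueues<string>();
registry.enqueue("acme", "offline");
registry.enqueue("acme", "online");
registry.enqueue("globex", "invoice.created");

// A failed delivery to acme pauses only acme's queue while its retry is pending...
registry.get("acme").paused = true;
// ...while globex's events keep flowing.
const acmeNext = registry.next("acme"); // undefined
const globexNext = registry.next("globex"); // "invoice.created"
```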

@manast
Contributor

manast commented Mar 11, 2024

We are working on a solution for this in BullMQ, and we will then extend it to groups as well. This is the PR: taskforcesh/bullmq#2465

@hardcodet

You guys rock! Looking forward to the implementation :)
