How to ensure ordering of jobs in case of delayed retries. #68

ksyd9821 · 2024-03-04T18:22:37Z

Hello,

For our current use case, we add jobs to a queue and would like them to be processed successfully in the same order they were added (FIFO).
So when a job fails, the desired outcome is that the job is retried automatically after some delay before the worker moves on to the next job in the queue.

For example, if we have these 3 jobs in our queue [1, 2, 3] with job 1 being the first added job, here is what a possible execution would look like:

process job 1
job 1 fails
X amount of delay
process job 1 again
job 1 completed
process job 2
process job 3
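To pin down the semantics of the trace above, here is a minimal in-memory sketch. It does not use BullMQ at all; `processInOrder`, `maxAttempts`, and `retryDelayMs` are hypothetical names, time is simulated with a counter, and halting once retries are exhausted is an assumption rather than something stated in this thread.

```typescript
// In-memory model only; no BullMQ calls are made here.
type Handler<T> = (job: T, attempt: number) => boolean; // true = success

interface RunOptions {
  maxAttempts: number;  // hypothetical retry limit
  retryDelayMs: number; // hypothetical fixed delay (the "X" in the trace)
}

function processInOrder<T>(jobs: T[], handler: Handler<T>, opts: RunOptions): string[] {
  const log: string[] = [];
  let clockMs = 0; // simulated time, so the sketch runs instantly

  for (const job of jobs) {
    let done = false;
    for (let attempt = 1; attempt <= opts.maxAttempts && !done; attempt++) {
      log.push(`t=${clockMs} process job ${String(job)} (attempt ${attempt})`);
      done = handler(job, attempt);
      if (done) {
        log.push(`t=${clockMs} job ${String(job)} completed`);
      } else {
        log.push(`t=${clockMs} job ${String(job)} failed`);
        // The head of the queue waits; nothing behind it is started.
        if (attempt < opts.maxAttempts) clockMs += opts.retryDelayMs;
      }
    }
    if (!done) {
      // Assumption: stop entirely so later jobs never overtake the failed one.
      log.push(`t=${clockMs} job ${String(job)} exhausted retries; halting`);
      break;
    }
  }
  return log;
}

// Job 1 fails on its first attempt, then succeeds.
const trace = processInOrder(
  [1, 2, 3],
  (job, attempt) => !(job === 1 && attempt === 1),
  { maxAttempts: 3, retryDelayMs: 5000 },
);
console.log(trace.join("\n"));
```

The point of the sketch is that the retry delay is spent while job 1 is still at the head of the line, so jobs 2 and 3 only start after job 1 has completed.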

How can we achieve this with bullmq?

Thank you in advance for the support!

manast · 2024-03-04T22:17:39Z

Thank you for your question. I am afraid that currently the queue is not halted when a job fails, so the other waiting jobs will be processed as soon as a worker is free.
How critical is this case for you? Can you elaborate a bit on the whole scenario where this functionality is needed?

hardcodet · 2024-03-04T22:48:17Z

@manast

> How critical is this case for you? can you develop a bit more the whole scenario where this functionality is needed?

It is critical, unfortunately. Our use case is a number of event queues for webhooks (each queue representing a customer's subscription), where we would like to submit events in proper order. In practice, webhooks sometimes fail (e.g. the customer's endpoint is temporarily unavailable) and need to be retried, but we can't have those events move to the back of the queue, because order matters.

As a dummy example: imagine two events occurring in this order:

The system is offline
The system is online

If we sent these events in inverted order, the outcome on the customer's end would be completely wrong: they would assume the system is offline and might cease communicating with it.

manast · 2024-03-05T21:38:33Z

OK, so this feature would be specific to groups: a group would not continue processing new jobs until the previous one has either completed or finally failed. Furthermore, this feature would only make sense with a concurrency of 1.
We need to study how feasible this feature is in the current design.

hardcodet · 2024-03-05T22:09:51Z

You're right. We're already using a concurrency of 1 extensively to enforce sequential processing, because we have many cases that warrant it. Preserving order on retries is just one more flavor of that.

If it's a bad fit for BullMQ, we could work around the issue with the following strategy I guess:

Handle the error ourselves, and:
- pause the queue
- mark the failed job as completed
- create a new job with the same payload and enqueue it LIFO
- re-enable the queue after the retry delay
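The steps above could be modeled roughly as follows. This is an in-memory stand-in rather than BullMQ code: `ToyQueue`, `drainWithOrderedRetry`, and the simulated delay counter are all hypothetical, and a real implementation would also have to deal with Redis, workers, and job state.

```typescript
// In-memory stand-in for a queue; names are hypothetical, not BullMQ API.
class ToyQueue<T> {
  private waiting: T[] = [];
  private paused = false;

  add(job: T, opts: { lifo?: boolean } = {}): void {
    if (opts.lifo) {
      this.waiting.unshift(job); // front of the line: processed next
    } else {
      this.waiting.push(job); // back of the line: normal FIFO
    }
  }
  pause(): void { this.paused = true; }
  resume(): void { this.paused = false; }
  next(): T | undefined {
    return this.paused ? undefined : this.waiting.shift();
  }
}

function drainWithOrderedRetry<T>(
  queue: ToyQueue<T>,
  deliver: (job: T) => boolean, // true = delivered
  retryDelayMs: number,
  maxDeliveries = 100, // guard so a permanently failing job cannot loop forever
): { delivered: T[]; elapsedMs: number } {
  const delivered: T[] = [];
  let elapsedMs = 0; // simulated waiting time

  for (let i = 0; i < maxDeliveries; i++) {
    const job = queue.next();
    if (job === undefined) break;
    if (deliver(job)) {
      delivered.push(job);
      continue;
    }
    // Handle the error ourselves:
    queue.pause();                  // 1. pause the queue
    // 2. the failed job is already out of `waiting`, i.e. treated as completed
    queue.add(job, { lifo: true }); // 3. same payload, enqueued at the front
    elapsedMs += retryDelayMs;      //    (simulated retry delay)
    queue.resume();                 // 4. re-enable the queue
  }
  return { delivered, elapsedMs };
}

const events = new ToyQueue<string>();
events.add("system offline");
events.add("system online");

let failuresLeft = 1; // the first delivery attempt fails
const result = drainWithOrderedRetry(events, () => {
  if (failuresLeft > 0) {
    failuresLeft--;
    return false;
  }
  return true;
}, 5000);
console.log(result.delivered, result.elapsedMs);
```

Because the retried payload goes to the front while the queue is paused, "system online" can never be delivered before "system offline" in this model.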

This is absolutely feasible for us. We just figured that ordered processing (including retries with backoff delays) would be a common scenario, so we wanted to discuss this with you first 👍

hardcodet · 2024-03-05T23:00:25Z

It wouldn't be specific to groups, though: we were thinking of creating one queue per customer (rather than groups keyed by customer ID). That would make retries considerably simpler than in a shared queue that still has to process events for other groups.
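As a rough illustration of why per-customer queues keep retries simple, here is an in-memory sketch in which a failing event only stalls its own customer's queue. Everything in it is hypothetical (`deliverPerCustomer`, the `WebhookEvent` shape, the customer IDs), scheduling rounds stand in for real time and backoff, and it makes no BullMQ calls.

```typescript
type WebhookEvent = { customerId: string; name: string };
type Send = (event: WebhookEvent, attempt: number) => boolean; // true = delivered

function deliverPerCustomer(events: WebhookEvent[], send: Send, maxAttempts: number): string[] {
  // One FIFO queue per customer, keyed by customer ID.
  const queues: Record<string, WebhookEvent[]> = {};
  for (const event of events) {
    (queues[event.customerId] = queues[event.customerId] || []).push(event);
  }

  const attempts: Record<string, number> = {}; // attempts made on each customer's current head
  const delivered: string[] = [];
  let progressed = true;

  // Each pass of the while loop is one scheduling round across all customers.
  while (progressed) {
    progressed = false;
    for (const customerId of Object.keys(queues)) {
      const queue = queues[customerId];
      const attempt = (attempts[customerId] || 0) + 1;
      // Skip customers that are drained, or stuck on a head that ran out of attempts.
      if (queue.length === 0 || attempt > maxAttempts) continue;
      progressed = true;
      if (send(queue[0], attempt)) {
        delivered.push(`${customerId}:${queue[0].name}`);
        queue.shift();
        attempts[customerId] = 0;
      } else {
        attempts[customerId] = attempt; // retry the same head in the next round
      }
    }
  }
  return delivered;
}

// acme's endpoint rejects the first two attempts of its "offline" event.
const deliveredOrder = deliverPerCustomer(
  [
    { customerId: "acme", name: "offline" },
    { customerId: "acme", name: "online" },
    { customerId: "globex", name: "offline" },
    { customerId: "globex", name: "online" },
  ],
  (event, attempt) => !(event.customerId === "acme" && event.name === "offline" && attempt < 3),
  3,
);
console.log(deliveredOrder);
```

In this model the second customer is served while the first is still retrying, yet each customer still receives "offline" before "online".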

manast · 2024-03-11T09:22:05Z

We are working on a solution for this in BullMQ, and we will then extend it to groups as well. This is the PR: taskforcesh/bullmq#2465

hardcodet · 2024-03-14T08:53:12Z

You guys rock! Looking forward to the implementation :)
