[DurableTask-AzureStorage] Eternal Orchestration Stuck and Consistently Abandoning the Message #1019

Open
ykhazbak opened this issue Dec 19, 2023 · 0 comments

ykhazbak commented Dec 19, 2023

The eternal orchestration "SiteNetworkServiceStateBillingOrchestrator" started executing and then got stuck while processing a message after a lease re-assignment.

The partition "ansmsitenetworkservicehub-control-06" was reassigned from worker node "_armBEaz_10" to worker node "_armBEaz_11". Since that lease re-assignment, worker node "_armBEaz_11" has never been able to process one message (a TimerFired event) and has been consistently abandoning it for days.

The orchestration is stuck at line 114 of the code below. Note that four task activities had already executed at this point:
image

Logs:
https://jarvis-int-west.microsoftgeneva.com/E06F8A5F
https://jarvis-int-west.microsoftgeneva.com/8D6D9236

Instance Id: 613e83a4-eb15-42c6-aa12-329f0e215894:SiteNetworkServiceStateBillingOrchestrator:V1
Event Type: TimerFired

image
image

Can someone help identify whether this is a race condition, and how we can solve it? This is a billing orchestration that runs periodically, so it is very important that it runs smoothly and emits billing events consistently.
