Enough calls to lightmap_unwrap will eventually cause the game to hang. #92119

Open
DataPlusProgram opened this issue May 19, 2024 · 4 comments

Comments

@DataPlusProgram

Tested versions

4.0, 4.2, 4.3dev

System information

Tested on a laptop with integrated graphics and desktop with dedicated.

Issue description

If lightmap_unwrap is called enough times, the application will hang.

The number of calls before the hang varies even on the same machine: my PC has gotten close to 10,000 calls before hanging, while my laptop usually gets to around 2,000.

Steps to reproduce

This is the code that will lead to a hang:
[screenshot of the reproduction code]
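
A minimal sketch of the kind of loop described (the exact script is in the screenshot above; the mesh setup and call count here are assumptions, not the reporter's code):

```gdscript
# Minimal sketch, assuming a fresh ArrayMesh is unwrapped each iteration:
# enough calls to lightmap_unwrap() eventually hang the engine.
extends Node3D

func _ready() -> void:
	for i in 10000:
		var array_mesh := ArrayMesh.new()
		array_mesh.add_surface_from_arrays(
			Mesh.PRIMITIVE_TRIANGLES, BoxMesh.new().get_mesh_arrays())
		# Each call runs the xatlas unwrapper internally.
		array_mesh.lightmap_unwrap(Transform3D.IDENTITY, 0.1)
		print(i)
```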

Minimal reproduction project (MRP)

unwrapTest.zip

@AThousandShips
Member

Does this happen if you spread the calls across multiple frames, for example by doing await get_tree().process_frame every few calls?
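
A sketch of that pattern (the batch size of 10 is arbitrary):

```gdscript
# Sketch: yield back to the main loop every few unwrap calls.
func unwrap_many(count: int) -> void:
	for i in count:
		var array_mesh := ArrayMesh.new()
		array_mesh.add_surface_from_arrays(
			Mesh.PRIMITIVE_TRIANGLES, BoxMesh.new().get_mesh_arrays())
		array_mesh.lightmap_unwrap(Transform3D.IDENTITY, 0.1)
		if i % 10 == 0:
			await get_tree().process_frame
```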

@DataPlusProgram
Author

Trying this still results in the same hang:
[screenshot of the modified code]

@lyuma
Contributor

lyuma commented May 19, 2024

It's a deadlock: the main thread is calling join() on a worker thread, while the worker thread is waiting on the condition variable.
Main thread:
[stack trace screenshot: godot_xatlas_deadlock_pt1]
All 4 worker threads:
[stack trace screenshot: godot_xatlas_deadlock_pt2]

It seems that the main thread is invoking .notify_one() without holding the lock, which can lead to a deadlock if the notification occurs while the worker thread is not yet blocked in wait().

The textbook usage of a condition variable would require the joining thread to hold the lock before invoking notify_one().
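
A generic sketch of that textbook pattern (illustrative only, not the xatlas code): the shared flag is updated, and notify_one() is called, while holding the same mutex the waiter uses, so the notification cannot slip into the window between the waiter checking the condition and blocking in wait().

```cpp
// Generic illustration of the pattern discussed above, not the xatlas code.
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool done = false;

// Worker: wait() atomically releases the mutex and blocks; the predicate
// protects against spurious wakeups and against notifications that fired
// before this thread reached wait().
void worker_wait() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return done; });
}

// Joining thread: update the flag and notify while holding the mutex, so the
// notification cannot land between the worker checking `done` and the worker
// blocking in wait().
void signal_done() {
    std::lock_guard<std::mutex> lock(m);
    done = true;
    cv.notify_one();
}
```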

@lyuma
Contributor

lyuma commented May 19, 2024

The crazy thing is that, if true, you've just discovered a deadlock in an extremely widely used library:
https://github.com/jpcy/xatlas/blob/master/source/xatlas/xatlas.cpp#L3157

One guess is that other applications don't tear down the whole TaskScheduler for each bake attempt, so they are much less likely to trigger this deadlock in practice.
