-
-
Notifications
You must be signed in to change notification settings - Fork 18.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve sqrt performance in lighting shaders by exploiting floating point binary representation #92129
base: master
Are you sure you want to change the base?
Conversation
Outlined in Low Level Optimizations for GCN by Michal Drobot in 2014
The improved functions are based upon the work found in "Course Notes: Moving Frostbite to PBR", page 12. Found here: https://media.contentapi.ea.com/content/dam/eacom/frostbite/files/course-notes-moving-frostbite-to-pbr-v32.pdf
What is I don't know that this data is conclusive, also are the results for test 9 identical? Or is the optimized result covering the original? Please also add the raw data so the results can be analysed more closely, a bar graph is pretty limited |
There are 9 different benchmarks under the Light and Meshes tests, I have made 5 different runs over those 9 for the original version and 5 other runs over the same 9 benchmarks for mine. The results were then averaged out. The results for benchmark number 9 were the same up to 2 decimal points.
I'll run both versions later today as done here and with a profiler, and will share the raw data here. I understand the bar graph isn't that great. |
So reading the document this comes from, they use "RME", what does that refer to? It's not as far as I can find an established term to describe errors (is it maybe RMSE, root square mean error?) Is the domain safe here and is it guaranteed to not produce degenerate or invalid results within this use case? |
I believe it stands for Mean Relative Error. More can be seen here.
If there's a difference, it is imperceptible. From my testing. I'll also try to provide comparisons. The biggest concern is error accumulation, so one must use it selectively. P.S. I've realised I've made an error in a previous reply, there are 13 benchmarks, not 9. |
As long as the error is small enough and reliable enough it should be safe, but the square root is extra sensitive I suspect as the IEEE standard requires it to be exact, which I assume GPUs are expected to follow, so as long as the algorithms used for lighting are lenient enough this shouldn't be a problem, assuming it's correct for the domain used here, but getting some data on that the method is considered reliable would be good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using sqrt_IEEE_int_approximation
carefully seems like a nice improvement, but I am unsure about the changes to the GGX functions. We carefully selected those variants of the GGX approximation because there were so cheap. Your new versions appear to produce way more instructions (I assume they are more accurate as a result). (see for some history #51716)
The Filament docs have a good explanation for why we use the optimized V_GGX instead of the more correct term. https://google.github.io/filament/Filament.md.html#materialsystem/specularbrdf/geometricshadowing(specularg)
float Lambda_GGXV = NdotL * sqrt_IEEE_int_approximation((-NdotV * alpha + NdotV) * NdotV + alpha); | ||
float Lambda_GGXL = NdotV * sqrt_IEEE_int_approximation((-NdotL * alpha + NdotL) * NdotL + alpha); | ||
|
||
return 0.5 / (Lambda_GGXV + Lambda_GGXL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it uses way more instructions than the old version, are you sure that this is faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, sorry to not have shared benchmarks in the meantime, hope I can today.
As for the V_GGX implementation, you're correct, it wouldn't make much sense if it were. To be honest, I started out by trying to use the version you see and only then I tried to approximate the sqrt. Even though this is backwards in their commit order.
This V_GGX implementation might be hurting the sqrt approximation gains, if anything. The original V_GGX implementation seems to have been pretty well thought out, and more performant.
As promised I'll provide further testing today.
I've disassembled the shader on my end to check on this, resulting in the following:
V_SmithGGXCorrelated_spirv_comp.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the focus of the PR should be on the main change, the rewrite of the algorithm, or split it into two separate ones if the sqrt changes can be independent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look, indeed we end up with a lot more instructions.
I suspect that in the short term we will split this file in two and have a separate implementation for mobile and desktop. These variants that you have added are probably more appropriate for desktop as we want maximum quality on desktop (a couple instructions won't cost much). But on mobile we absolutely need to reduce the instruction count as much as possible
These changes aim to improve sqrt computation performance, the first commit is based upon a presentation of Michal Drobot that can be found here.
The new function, named
sqrt_IEEE_int_approximation
, is quite similar to the infamous FISR algorithm. Albeit, for sqrt itself.The second commit was intended to replace the use of the built-in
length
function forV_GGX
andV_GGX_anisotropic
, and use the aforementioned approximation. These, in turn, are based upon "Course Notes: Moving Frostbite to PBR", found here.I've made several runs (n=5) comparing performance before and after these changes. Using godot-benchmarks. These results are based upon the
Rendering/Lights and Meshes
benchmarks:I believe this new sqrt approximation could be taken advantage of in other shader includes, but have had a hard time understanding where best to place it. As of now, it resides under
/servers/rendering/renderer_rd/shaders/scene_forward_lights_inc.glsl
. Given this, I would appreciate any comments regarding a more general approach.