Improve sqrt performance in lighting shaders by exploiting floating point binary representation #92129

rainydaysavings · 2024-05-19T19:02:27Z

These changes aim to improve sqrt computation performance. The first commit is based on Michal Drobot's 2014 presentation "Low Level Optimizations for GCN".
The new function, named sqrt_IEEE_int_approximation, is quite similar to the infamous FISR (fast inverse square root) algorithm, albeit for sqrt itself rather than its reciprocal.
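For readers unfamiliar with the trick, here is a minimal CPU-side sketch in Python of how such an approximation typically works: reinterpret the float's bits as an integer, halve them (which roughly halves the exponent), and add a bias constant. The constant `SQRT_MAGIC` and the helper names are assumptions for illustration and are not taken from this PR; in GLSL the reinterpretation would be done with `floatBitsToInt` and `intBitsToFloat`.

```python
import math
import struct

# Assumed bias constant for the sqrt variant of the bit trick.
# It is NOT confirmed by this thread; treat it as illustrative.
SQRT_MAGIC = 0x1FBD1DF5


def float_to_bits(x: float) -> int:
    """Reinterpret x (rounded to float32) as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def sqrt_ieee_int_approximation(x: float) -> float:
    """Approximate sqrt(x) for positive, normal float32 inputs.

    Shifting the bit pattern right by one roughly halves the exponent;
    adding the bias constant restores the exponent offset.
    """
    return bits_to_float(SQRT_MAGIC + (float_to_bits(x) >> 1))


# Compare the approximation with the library sqrt on a few inputs.
for x in (0.25, 1.0, 2.0, 9.0):
    print(x, sqrt_ieee_int_approximation(x), math.sqrt(x))
```

The result costs one integer shift and one integer add, which is where the speed-up over a hardware sqrt is expected to come from; the price is a relative error of a few percent with this constant.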

The second commit was intended to replace the built-in length function in V_GGX and V_GGX_anisotropic with the aforementioned approximation. The new versions of these functions are, in turn, based on "Course Notes: Moving Frostbite to PBR".

I've made several runs (n=5) comparing performance before and after these changes. I used godot-benchmarks; the results are based on the Rendering/Lights and Meshes benchmarks.

I believe this new sqrt approximation could be taken advantage of in other shader includes, but have had a hard time understanding where best to place it. As of now, it resides under /servers/rendering/renderer_rd/shaders/scene_forward_lights_inc.glsl. Given this, I would appreciate any comments regarding a more general approach.

Outlined in Low Level Optimizations for GCN by Michal Drobot in 2014

The improved functions are based upon the work found in "Course Notes: Moving Frostbite to PBR", page 12. Found here: https://media.contentapi.ea.com/content/dam/eacom/frostbite/files/course-notes-moving-frostbite-to-pbr-v32.pdf

AThousandShips · 2024-05-20T07:45:11Z

> I've made several runs (n=5) comparing performance before and after these changes.

What is n here? And what is the "test number"?

I don't know that this data is conclusive. Also, are the results for test 9 identical, or is the optimized result covering the original? Please also add the raw data so the results can be analysed more closely; a bar graph is pretty limited.

rainydaysavings · 2024-05-20T09:21:39Z

> What is n here? And what is the "test number"?

There are 9 different benchmarks under the Lights and Meshes tests. I made 5 runs over those 9 benchmarks for the original version and another 5 runs over the same 9 benchmarks for my version. The results were then averaged.

The results for benchmark number 9 were the same to two decimal places.
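The averaging described above can be sketched as follows. This is a hypothetical helper, not the script used for the PR, and the numbers in the usage lines are placeholders rather than measurements. Reporting the per-benchmark standard deviation next to the mean helps show whether a difference is larger than the run-to-run noise.

```python
from statistics import mean, stdev


def summarize_runs(runs):
    """Summarize repeated benchmark runs.

    `runs` is a list of runs; each run is a list holding one timing per
    benchmark, in the same benchmark order. Returns a list of
    (mean, standard deviation) pairs, one per benchmark.
    """
    return [(mean(samples), stdev(samples)) for samples in zip(*runs)]


def percent_change(before, after):
    """Percent change of `after` relative to `before`, per benchmark.

    With timings as input, a negative value means `after` is faster.
    """
    return [100.0 * (a - b) / b for b, a in zip(before, after)]


# Placeholder values only (two runs, two benchmarks), NOT real results.
original = summarize_runs([[1.0, 2.0], [3.0, 4.0]])
modified = summarize_runs([[0.5, 2.0], [1.5, 4.0]])
print(original)
print(percent_change([m for m, _ in original], [m for m, _ in modified]))
```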

> Please also add the raw data so the results can be analysed more closely; a bar graph is pretty limited.

I'll run both versions later today as done here and with a profiler, and will share the raw data here. I understand the bar graph isn't that great.

AThousandShips · 2024-05-20T09:37:46Z

So reading the document this comes from, they use "RME". What does that refer to? As far as I can find, it's not an established term for describing errors (is it maybe RMSE, root mean square error?).

Is the domain safe here and is it guaranteed to not produce degenerate or invalid results within this use case?

rainydaysavings · 2024-05-20T10:23:11Z

> So reading the document this comes from, they use "RME". What does that refer to? As far as I can find, it's not an established term for describing errors (is it maybe RMSE, root mean square error?).

I believe it stands for Mean Relative Error. More can be seen here.

> Is the domain safe here and is it guaranteed to not produce degenerate or invalid results within this use case?

From my testing, if there's a difference, it is imperceptible. I'll also try to provide comparisons.
Regarding degenerate or invalid results, it'll always stay close to the real sqrt.

The biggest concern is error accumulation, so one must use it selectively.
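One way to put numbers on the domain and error questions is to sample the approximation against a reference sqrt. The sketch below does this on the CPU in Python; the constant, the helper names, and the sampled range are assumptions for illustration and are not taken from the PR. It reports the maximum and mean absolute relative error, and it also evaluates the input 0, where a bit trick of this form returns a tiny positive number rather than exactly 0.

```python
import math
import struct

SQRT_MAGIC = 0x1FBD1DF5  # assumed constant, not confirmed by this thread


def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def approx_sqrt(x):
    """Bit-trick sqrt approximation for non-negative float32 inputs."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", SQRT_MAGIC + (bits >> 1)))[0]


def relative_errors(lo, hi, samples=10000):
    """Signed relative errors at log-spaced sample points in [lo, hi], lo > 0."""
    errors = []
    for i in range(samples):
        x = to_float32(lo * (hi / lo) ** (i / (samples - 1)))
        exact = math.sqrt(x)
        errors.append((approx_sqrt(x) - exact) / exact)
    return errors


errors = relative_errors(1e-4, 1.0)
max_abs = max(abs(e) for e in errors)
mean_abs = sum(abs(e) for e in errors) / len(errors)
print(f"max |relative error|:  {max_abs:.4%}")
print(f"mean |relative error|: {mean_abs:.4%}")
print("approx_sqrt(0.0) =", approx_sqrt(0.0))
```

Because the error pattern of this kind of approximation repeats every factor of four in the input, sampling a few octaves is enough to see the worst case. Negative inputs and NaN are outside what the trick is meant to handle.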

P.S. I've realised I've made an error in a previous reply, there are 13 benchmarks, not 9.

AThousandShips · 2024-05-20T10:34:17Z

As long as the error is small enough and reliable enough, it should be safe. I suspect the square root is extra sensitive, though, as the IEEE standard requires it to be exact (correctly rounded), which I assume GPUs are expected to follow. So as long as the algorithms used for lighting are lenient enough, this shouldn't be a problem, assuming the approximation is correct for the domain used here. Still, it would be good to get some data showing that the method is considered reliable.

clayjohn

Using sqrt_IEEE_int_approximation carefully seems like a nice improvement, but I am unsure about the changes to the GGX functions. We carefully selected those variants of the GGX approximation because they were so cheap. Your new versions appear to produce way more instructions (I assume they are more accurate as a result). See #51716 for some history.

The Filament docs have a good explanation for why we use the optimized V_GGX instead of the more correct term. https://google.github.io/filament/Filament.md.html#materialsystem/specularbrdf/geometricshadowing(specularg)
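To make the trade-off concrete, here is a small Python sketch of the two visibility terms as described in the linked Filament section: a cheap form with no square roots and the height-correlated form with two. The function names are hypothetical and the code is a CPU-side illustration, not Godot's shader source. The two forms agree at alpha = 0 and alpha = 1 and differ in between.

```python
import math


def v_ggx_fast(n_dot_l, n_dot_v, alpha):
    """Cheap approximation: 0.5 / mix(2*NdotL*NdotV, NdotL + NdotV, alpha)."""
    smooth = 2.0 * n_dot_l * n_dot_v
    rough = n_dot_l + n_dot_v
    return 0.5 / (smooth * (1.0 - alpha) + rough * alpha)


def v_ggx_correlated(n_dot_l, n_dot_v, alpha):
    """Height-correlated Smith GGX visibility; needs two square roots."""
    a2 = alpha * alpha
    lambda_v = n_dot_l * math.sqrt((-n_dot_v * a2 + n_dot_v) * n_dot_v + a2)
    lambda_l = n_dot_v * math.sqrt((-n_dot_l * a2 + n_dot_l) * n_dot_l + a2)
    return 0.5 / (lambda_v + lambda_l)


# Print both terms for a fixed pair of angles across several roughness values.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, v_ggx_fast(0.3, 0.8, alpha), v_ggx_correlated(0.3, 0.8, alpha))
```

The cheap form replaces both square roots with a linear blend, which is why swapping it for the correlated form adds instructions even when the square root itself is approximated.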

clayjohn · 2024-05-20T20:37:41Z

servers/rendering/renderer_rd/shaders/scene_forward_lights_inc.glsl

+	float Lambda_GGXV = NdotL * sqrt_IEEE_int_approximation((-NdotV * alpha + NdotV) * NdotV + alpha);
+	float Lambda_GGXL = NdotV * sqrt_IEEE_int_approximation((-NdotL * alpha + NdotL) * NdotL + alpha);
+
+	return 0.5 / (Lambda_GGXV + Lambda_GGXL);
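A note on the domain question raised earlier in the thread: the square-root argument in the lines above is the Horner form of a convex combination of alpha and 1. Assuming NdotV and alpha both lie in [0, 1] (an assumption; the thread does not state how the inputs are clamped), the argument stays within [alpha, 1]:

```latex
\begin{aligned}
(-N\alpha + N)\,N + \alpha &= N^2(1-\alpha) + \alpha \\
&= (1 - N^2)\,\alpha + N^2 \cdot 1,
  \qquad N = \text{NdotV} \in [0,1],\ \alpha \in [0,1] \\
\Rightarrow\quad \alpha \;&\le\; N^2(1-\alpha) + \alpha \;\le\; 1 .
\end{aligned}
```

The same holds with NdotL in place of NdotV. Under these assumptions the approximation never receives a negative input, and it receives exactly 0 only when both alpha and the dot product are 0.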


This looks like it uses way more instructions than the old version, are you sure that this is faster?

rainydaysavings · 2024-05-21

Hey, sorry for not having shared benchmarks in the meantime; I hope I can today.

As for the V_GGX implementation, you're correct: it isn't faster, and it wouldn't make much sense if it were. To be honest, I started out by trying the version you see and only then tried to approximate the sqrt, even though the commits are in the opposite order.

This V_GGX implementation might be hurting the sqrt approximation gains, if anything. The original V_GGX implementation seems to have been pretty well thought out, and more performant.

As promised I'll provide further testing today.

I've disassembled the shader on my end to check on this, resulting in the following:
V_SmithGGXCorrelated_spirv_comp.txt

AThousandShips · 2024-05-21

I think the focus of the PR should be on the main change, the rewrite of the algorithm. Alternatively, split it into two separate PRs if the sqrt changes can be independent.

clayjohn · 2024-05-21

Thanks for taking a look. Indeed, we end up with a lot more instructions.

I suspect that in the short term we will split this file in two and have separate implementations for mobile and desktop. The variants that you have added are probably more appropriate for desktop, as we want maximum quality there (a couple of instructions won't cost much). But on mobile we absolutely need to reduce the instruction count as much as possible.

rainydaysavings added 2 commits May 19, 2024 20:10

Exploiting floating point for faster sqrt

2b8aed5

Outlined in Low Level Optimizations for GCN by Michal Drobot in 2014

Making use of sqrt approx. for V_GGX terms

b187b88

The improved functions are based upon the work found in "Course Notes: Moving Frostbite to PBR", page 12. Found here: https://media.contentapi.ea.com/content/dam/eacom/frostbite/files/course-notes-moving-frostbite-to-pbr-v32.pdf

rainydaysavings requested a review from a team as a code owner May 19, 2024 19:02

Applied clang-format to modifications

f44d1d7

Calinou added enhancement topic:rendering performance topic:3d labels May 19, 2024

Calinou added this to the 4.x milestone May 19, 2024

Fixed V_GGX comment spelling mistake found by codespell

bdb8b23

clayjohn reviewed May 20, 2024


clayjohn May 20, 2024

rainydaysavings May 21, 2024

AThousandShips May 21, 2024

clayjohn May 21, 2024
