Shader Profiling and DirectXShaderCompiler

Started by
2 comments, last by Zinadore 3 years, 1 month ago

Hello everyone,

I am trying to figure out why my pixel shaders are slow. I know the general reason; what I want is to find the specific instructions that could be changed and optimized. Some information about the setup:

  • API is D3D12
  • Shader Model is SM6
  • Everything is compiled through DXC, not FXC

NVIDIA Nsight Graphics is giving stall information on the following line (reproduced as text in case the image is not readable): %158 = phi i32 [ 0, %153 ], [ %342, %338 ]. This is apparently some LLVM-specific syntax, and it is causing “Wait” stalls in the pixel shader, dropping the SM occupancy by a lot. That line is never hit by a breakpoint in PIX, so it's not part of “my code”. Does anyone have any clue what is going on here or what that line means? I am honestly completely lost. I know this is very specific and weird, but I'm at the end of my rope, so I figured I would ask here.

Do you have any other suggestions on how to approach profiling this thing? Keep in mind it is my own implementation, not part of some kind of engine.

I apologize if this is not the correct forum to ask. If there is a better place for this, I would appreciate it if you pointed me to it.

LLVM IR (intermediate representation) is in static single assignment (SSA) form. What that means is that each variable in the code gets assigned exactly one time. You can kind of think of it as if each line of code has a dedicated variable that stores that line's result. On one hand this makes it much easier to optimize code, but it has problems handling branches. For example, say you have an if+else pair in your code, and both branches write the same variable X, which is also referenced later. When this is compiled into SSA form, X as a variable ceases to exist, and in its place multiple different variables store its value during its lifetime. With the if+else pair, the problem is that a version of X exists in both branches, so we have to converge those two into a single variable after the blocks to be able to use the correct result in later calculations. This is what phi does: it selects which value gets to live on after the branches, based on which path was taken through them.
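The if+else case above can be sketched in a few lines of Python. This is only a model of the idea, not how LLVM or DXC implement it: the helper phi(), the block labels "then" and "else", and the values 10 and 20 are all made up for illustration.

```python
# Source-level view: x is written in both branches and read afterwards.
def source_form(cond):
    if cond:
        x = 10
    else:
        x = 20
    return x + 1


def phi(pred, incoming):
    """Toy model of a phi: return the value paired with the block we arrived from."""
    return incoming[pred]


# SSA-style view: the two writes to x become two separate names, x1 and x2,
# and a phi at the join point merges them into x3.
def ssa_form(cond):
    x1 = x2 = None  # Python placeholders only; real SSA has no such step
    if cond:
        x1 = 10          # "then" block
        pred = "then"
    else:
        x2 = 20          # "else" block
        pred = "else"
    # Join block: x3 = phi [ x1, then ], [ x2, else ]
    x3 = phi(pred, {"then": x1, "else": x2})
    return x3 + 1
```

Both functions return the same result for either value of cond; the only difference is that the SSA version never reassigns a name and needs the phi to decide which one survives the join.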

Now, I have no idea how Nsight works, but the fact that it screams about a phi instruction would make me think that the problem is with the preceding branches, for example that they are too divergent within a wave/warp, or something like that.

Thanks for the explanation! That makes a lot more sense. So if I am understanding this correctly, the above phi instruction is semantically similar to: “%158 is 0 if control arrived from block %153, or %342 if it arrived from block %338”. It makes sense that if either of those blocks takes time to execute, that instruction would stall and wait.
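A phi with a constant on one incoming edge and a computed value on the other is the shape a loop counter usually takes in SSA: the constant is the initial value, and the other operand is the updated value coming back around from the loop body. Whether %158 really is a loop counter in this shader is a guess, not something the Nsight line confirms. A minimal Python sketch under that assumption, with the block labels b153 and b338 and the loop bound n invented for illustration:

```python
def phi(pred, incoming):
    """Toy model of a phi: return the value paired with the block we arrived from."""
    return incoming[pred]


def run_loop(n):
    """Models %158 = phi i32 [ 0, %153 ], [ %342, %338 ] as a loop counter."""
    pred = "b153"   # first entry into the loop header comes from the block before the loop
    v342 = None     # not defined until the loop body has run once
    trace = []
    while True:
        # Loop header: the phi picks 0 on first entry, v342 on every later iteration.
        v158 = phi(pred, {"b153": 0, "b338": v342})
        if v158 >= n:
            break
        trace.append(v158)
        # Loop body (block %338 in this sketch): compute the next counter value.
        v342 = v158 + 1
        pred = "b338"
    return trace
```

Calling run_loop(3) visits the counter values 0, 1, 2. Note that the phi has no counterpart in the source text: it exists only at the point where the two control-flow edges meet, which is one plausible reason a source-level debugger has no line to stop on for it.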

But that still doesn't explain why I cannot step to that instruction during debugging. I would hope since the instruction is there, it would be reachable at some point.

Are there any suggestions on profiling tools beyond Nsight? It's the only one I've found that lets me profile the shaders themselves rather than just the pipeline.
