pouët.net

Shader compilation is too slow

category: code [glöplog]
 
I have a couple of scenes whose GLSL code looks something like this:

Code:
float fScene(in vec3 p) {
    if ( ... ) {
        return scene1();
    } else if ( ... ) {
        return scene2();
    } ...
    else {
        return scene10();
    }
}

The scene functions evaluate some kind of signed distance field.
Somewhere else in the code, I have something like:

Code:
for (int i = 0; i < 6; ++i) {
    // evaluate fScene at a few points
    // output triangle mesh
}


Compilation time seems to grow super fast with each scene I add. Do you think this is normal? What would be your advice? Should I split those if/else branches into separate shaders?
added on the 2016-05-27 10:10:28 by varko
Assuming you don't care about code size, what you want is probably GL_ARB_shader_subroutine.
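
For reference, a minimal sketch of what the subroutine approach could look like in a fragment shader. It assumes OpenGL 4.0 (where subroutines are core) and that the scene is selected once per draw call rather than per pixel. The names SceneFn and uScene and the two placeholder distance functions are made up for illustration.

```glsl
#version 400 core

// Subroutine type shared by all scene functions (name is illustrative).
subroutine float SceneFn(vec3 p);

// The active scene; chosen from the host side, not inside the shader.
subroutine uniform SceneFn uScene;

// Placeholder scene: unit sphere.
subroutine(SceneFn) float scene1(vec3 p) {
    return length(p) - 1.0;
}

// Placeholder scene: box with half-extent 0.5.
subroutine(SceneFn) float scene2(vec3 p) {
    vec3 q = abs(p) - vec3(0.5);
    return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0);
}

// Replaces the if/else chain with a single indirect call.
float fScene(vec3 p) {
    return uScene(p);
}

out vec4 fragColor;

void main() {
    // Trivial use so the shader is complete: shade by distance at a fixed point.
    float d = fScene(vec3(0.0, 0.0, 2.0));
    fragColor = vec4(vec3(d), 1.0);
}
```

On the host side the scene would be picked with glGetSubroutineIndex and glUniformSubroutinesuiv before drawing. Whether this actually shortens compilation depends on the driver, so it is worth measuring.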
added on the 2016-05-27 10:18:23 by Sesse
At least on Nvidia, GLSL shader compilation time seems to grow linearly with the size of the AST. It's been a while since I looked at that, but I can remember some takeaways:
1. If your compilation time is dominated by a big function f(), every literal call of f() counts, but calling it once inside a loop - even though the loop might be unrolled later during compilation - counts only once. The costly compiler operation(s) seem to happen at a stage in which all function calls are already inlined in the AST, but loops are still kept as actual jumps.
2. Pruning has not happened yet. Maybe pruning even IS the costly compiler operation. So if a call to f() will never be reached (and will be pruned successfully later in compilation due to static analysis), it is still present in the AST and counts towards compilation time. So even if you have something like if(false){...}, the thing inside the if will count against compilation time, even though the code will be removed later. Also, of course, the compiler can't know that you never use a uniform which enables some debug view; if that debug view evaluates your function in a few places, it will cost you regardless.

The most common case: raymarchers will compile much faster if you compute your gradient in a loop instead of using six individual calls.
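
A minimal sketch of such a gradient loop, assuming GLSL 1.30 or later (for integer % and dynamic vector indexing). Here fScene is a placeholder sphere standing in for the real distance function, and the step size eps is an arbitrary choice.

```glsl
// Placeholder distance field; stands in for the real (large) scene function.
float fScene(vec3 p) {
    return length(p) - 1.0;
}

// Central-difference gradient with a single textual call to fScene.
// Iterations 0..5 sample +x, -x, +y, -y, +z, -z in turn.
vec3 getNormal(vec3 p) {
    const float eps = 0.001;                      // assumed step size
    vec3 n = vec3(0.0);
    for (int i = 0; i < 6; ++i) {
        int axis = i / 2;                         // 0 = x, 1 = y, 2 = z
        float sgn = (i % 2 == 0) ? 1.0 : -1.0;    // forward or backward sample
        vec3 offset = vec3(0.0);
        offset[axis] = sgn * eps;
        n[axis] += sgn * fScene(p + offset);      // accumulates f(p + e) - f(p - e)
    }
    return normalize(n);
}
```

The result matches the usual six-call central difference, but the source contains only one call site of fScene.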

When you say "grows super fast", do you mean linearly with the number of scenes you add? From my experience, the only thing you can do is literally string-searching in your editor for calls to the big function(s) and trying to reduce that as much as possible. If that is not possible, splitting the shader into smaller parts will probably mean that the smaller shaders have at least the same combined compilation time as the big one. Can still be better for development if you only work on one of those at a time. We have a macro that we sometimes use during development which disables all materials, shading, etc and leaves only two calls of the distance function: One in the main tracing loop and one in the gradient loop. Makes compilation time go from 2 seconds to 0.2 seconds or so. (Before the general optimization I mentioned, like the gradient loop, it would go up to 10 seconds sometimes, which would imply intro loading times > 90 seconds). The loading bar in mercury 64ks is still usually mostly shader compilation though.
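
A rough sketch of what such a development switch might look like. This is not Mercury's actual macro: the name FAST_COMPILE, the uniform uResolution, the placeholder fScene and the ambient occlusion pass are all assumptions made for illustration.

```glsl
#version 330 core

// Hypothetical development switch. Comment it out for the full build.
#define FAST_COMPILE

uniform vec2 uResolution;   // assumed uniform: viewport size in pixels
out vec4 fragColor;

// Placeholder for the real, expensive-to-compile distance function.
float fScene(vec3 p) {
    return length(p) - 1.0;
}

// Call site 1: the main tracing loop.
float trace(vec3 ro, vec3 rd) {
    float t = 0.0;
    for (int i = 0; i < 64; ++i) {
        float d = fScene(ro + rd * t);
        if (d < 0.001) break;
        t += d;
    }
    return t;
}

// Call site 2: the gradient loop.
vec3 getNormal(vec3 p) {
    vec3 n = vec3(0.0);
    for (int i = 0; i < 6; ++i) {
        int axis = i / 2;
        float sgn = (i % 2 == 0) ? 1.0 : -1.0;
        vec3 offset = vec3(0.0);
        offset[axis] = sgn * 0.001;
        n[axis] += sgn * fScene(p + offset);
    }
    return normalize(n);
}

#ifndef FAST_COMPILE
// Extra call site that only exists in the full build.
float ambientOcclusion(vec3 p, vec3 n) {
    float occ = 0.0;
    for (int i = 1; i <= 4; ++i) {
        float h = 0.05 * float(i);
        occ += h - fScene(p + n * h);
    }
    return clamp(1.0 - occ, 0.0, 1.0);
}
#endif

void main() {
    vec2 uv = (2.0 * gl_FragCoord.xy - uResolution) / uResolution.y;
    vec3 ro = vec3(0.0, 0.0, 3.0);
    vec3 rd = normalize(vec3(uv, -1.5));
    vec3 p = ro + rd * trace(ro, rd);
    vec3 n = getNormal(p);
#ifdef FAST_COMPILE
    vec3 color = n * 0.5 + 0.5;   // normals only, no materials or lighting
#else
    float diffuse = max(dot(n, normalize(vec3(1.0))), 0.0);
    vec3 color = vec3(diffuse * ambientOcclusion(p, n));
#endif
    fragColor = vec4(color, 1.0);
}
```

With the switch defined, the preprocessor removes every use of fScene outside the tracing and gradient loops before the compiler ever builds the AST.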

At least in older drivers when using 32bit, you could also overflow something inside the compiler when the AST got too big, with really weird side-effects on the whole driver: In our case, all subsequent texture reads (after hitting that ceiling once) were zero except for the green channel which was still okay. Don't ask, I don't know and I don't want to know. Gave us some trouble when it started happening a week before a release... I think currently it gives some internal error message if you hit the ceiling, but that hasn't happened to us since we keep AST size in mind during coding.
added on the 2016-05-27 12:46:48 by cupe
Thanks for the replies. Learned something new :)
The problem seems to be coming from my getNormal() function which numerically computes the normal of the sdf at a given point.

In the body of my inner loop I compute the normal for each vertex of the triangle. I guess the function gets inlined everywhere, making the code explode. Probably some loop unrolling happens outside as well.

I changed it so that I compute it only for the center of that triangle and the compilation takes almost no time.

By "grows super fast", I meant something more than linear, or at least linear with a bigger constant factor. It basically went from "not a problem" to 1 second to over a minute after a few added scenes.

Cupe, btw I am using Mercury's hg_sdf library. Thanks!
I actually also used it for some procedural mesh generation a few months back for a school project:

BB Image

BB Image

Not rendered in real time
added on the 2016-05-27 20:58:37 by varko
Dat party location, just sayin ;)
added on the 2016-05-28 03:44:56 by LJ
