pouët.net

Very complex shaders and graphics drivers

category: code [glöplog]
 
I have noticed on several video cards that when a shader takes too much time to compute (e.g. 5 sec), the driver usually cancels the operation and resets the display (and then the intro / demo just crashes).

Some other machines just freeze or become totally unresponsive (e.g. loading heavy glslheroku scripts on Mac OS + ATI).

Are there any tricks to avoid that? This would be particularly useful for procedural gfx, where a single frame can take many seconds to compute.
added on the 2013-09-10 11:29:59 by Tigrou
For procedural graphics, you can avoid this by not rendering a single fullscreen quad. Instead, just render smaller tiles to fill the screen.
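E.g. roughly like this - an untested sketch, assuming a compatibility context with immediate mode and the heavy shader already bound; TILES_X / TILES_Y are arbitrary:

Code:
// split the fullscreen quad into a grid of tiles, one draw call per tile,
// so no single draw call keeps the GPU busy for seconds
const int TILES_X = 8, TILES_Y = 8;
for (int ty = 0; ty < TILES_Y; ty++)
for (int tx = 0; tx < TILES_X; tx++)
{
    float x0 = -1.0f + 2.0f *  tx      / TILES_X;
    float x1 = -1.0f + 2.0f * (tx + 1) / TILES_X;
    float y0 = -1.0f + 2.0f *  ty      / TILES_Y;
    float y1 = -1.0f + 2.0f * (ty + 1) / TILES_Y;

    glBegin(GL_QUADS);   // one small tile per glBegin/glEnd pair
    glVertex2f(x0, y0);
    glVertex2f(x1, y0);
    glVertex2f(x1, y1);
    glVertex2f(x0, y1);
    glEnd();
}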

Just for testing purposes, these shader timeout values can be tuned by setting a registry value. But of course you cannot assume those registry keys are set on the compo machine. :)
added on the 2013-09-10 11:33:42 by urs
Tigrou: the reset is there to protect the user - what happens if a shader takes infinite time? You never get back to the desktop :) Actually it used to be a big problem on the Mac that it *didn't* do that reset; I'd regularly cock up a for() loop and end up with infinite iterations, resulting in a hard reset. Not sure what to do to avoid it though.

The unresponsive thing on OS X: use Chrome for stability, Safari for speed. I've no idea why, but Safari will become unresponsive on some shaders, often crashing some minutes later if you force-close the tab, while Chrome will work fine. But Safari is ~50% faster at executing the same shader. Strange.
added on the 2013-09-10 11:38:58 by psonice
So the maximum allowed time is not for a full frame (between two distinct swapbuffers() calls) but per triangle? I guess this is a trade-off: rendering too many small quads would just be slower than rendering a single one?
added on the 2013-09-10 11:41:56 by Tigrou
I think when you submit a triangle to the GPU and it starts drawing, it effectively stops responding until it's done. If it doesn't respond for, say, 5 seconds the OS assumes something went badly wrong and resets the GPU. So it doesn't matter if lots of small triangles take longer overall; the GPU stays responsive.
added on the 2013-09-10 12:09:27 by psonice
OK. Anyway, when a shader is taking a very long time, why can't the GPU just yield and do other vital tasks (like drawing the desktop and such)? E.g. on any modern OS, if a process is taking 100% of the CPU, the system is still responsive and the process doesn't need to be "reset" (terminated).
added on the 2013-09-10 12:13:16 by Tigrou
It would certainly be nice if GPU cores had pre-emptive logic; that would avoid any issues like this.
added on the 2013-09-10 12:23:54 by nystep
Windows WDDM drivers do now have pre-emption built into them, but this is controlled by Windows to keep the desktop responsive, and as far as I remember it happens on a per-work-packet basis, so your fullscreen quad would most likely still end up as a single work packet. I think Microsoft are trying to reduce the granularity as much as possible in this area, but I forget how granular it really is.

In Windows you can change the TDR timeout values with registry keys, but that's not really something you should be doing, because you are removing a safety net. You can find them documented at http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918(v=vs.85).aspx
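For reference, it's this key - a sketch only, values are REG_DWORD seconds (TdrDelay defaults to 2 iirc), you need admin rights and a reboot for it to apply, and again: dev machine only:

Code:
Windows Registry Editor Version 5.00

; raise the GPU timeout to 10 seconds for local testing
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
"TdrDdiDelay"=dword:0000000a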

I'm not sure how most GPUs handle the situation internally though - PowerVR can self-detect and recover (hence that 'lockup' value in the Xcode GPU stats that hopefully always stays at 0), so even without TDR running it should still be okay, but I don't know how the big boys handle things.

My vote would be for sending a large batch of triangles to avoid the situation - the overhead can't be that bad compared to the raw rendering work required...
added on the 2013-09-10 12:36:01 by sack
Why can't the shader compiler just use static analysis on the shader to determine how long it will take to run? (</troll>).
added on the 2013-09-10 12:36:12 by puppeh
Quote:
But of course you cannot assume those registry keys are set on the compo machine. :)


There is an API for that :D but it will prolly result in this dialog box: "strobo requires admin privileges to run"
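(By "API" I just mean the plain registry API - a rough sketch, SetTdrDelay is a made-up name, it will fail without elevation and iirc the new timeout only applies after a reboot:)

Code:
#include <windows.h>

// sketch: bump TdrDelay via the registry API; writing to HKLM is what
// triggers the "requires admin privileges" part
static BOOL SetTdrDelay(DWORD seconds)
{
    HKEY key;
    LONG err = RegCreateKeyExA(HKEY_LOCAL_MACHINE,
                               "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                               0, NULL, 0, KEY_SET_VALUE, NULL, &key, NULL);
    if (err != ERROR_SUCCESS)
        return FALSE;
    err = RegSetValueExA(key, "TdrDelay", 0, REG_DWORD,
                         (const BYTE*)&seconds, sizeof(seconds));
    RegCloseKey(key);
    return err == ERROR_SUCCESS;
}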

Quote:
It would certainly be nice if GPU cores had pre-emptive logic; that would avoid any issues like this.


Hell no. Imagine you are a game developer and shaders were preemptive: no way to predict the performance of a frame? Hell no. Adding hints like preemptiveness could fix it, but still, the default has to be non-preemptive.
added on the 2013-09-10 12:39:41 by skomp
Personally, I haven't been able to run most recent procedural gfx because Windows resets the card driver after a few seconds (on a GTX 560). Waiting a little bit longer to get a picture would not have been a problem.
added on the 2013-09-10 13:30:56 by Tigrou
Is the timeout value applied per-pixel, per-triangle or per-drawcall?
added on the 2013-09-10 13:43:04 by Gargaj
Quote:
So the maximum allowed time is not for a full frame (between two distinct swapbuffers() calls) but per triangle?


It actually counts for each draw call. So if you render a vertex array full of small tiles, you won't actually win anything. For immediate mode, it works though.
added on the 2013-09-10 13:43:15 by urs
@urs: rendering small tiles in immediate mode doesn't help (the driver still "resets"). The only thing which works is rendering very small tiles + swapbuffers() between each call (without clearing the buffer, of course). But then the user can see the actual progression, which is not good. Other suggestions?
added on the 2013-09-23 19:25:10 by Tigrou
render it to a rendertarget ?
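E.g. something like this (rough sketch, no error checking, assumes a GL 3.x context or the EXT framebuffer entry points; width/height = screen size):

Code:
// offscreen rendertarget the size of the screen - render the tiles
// into this instead of the backbuffer, so the half-finished picture
// never reaches the display
GLuint fbo, tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, tex, 0);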
added on the 2013-09-23 19:37:31 by pantaloon
@pantaloon: Neat idea. Is there any code to start with? I have found iq's framework to be very useful. I had a look at pouët procedural graphics prods but it's hard to tell which ones have source.
added on the 2013-09-23 19:44:11 by Tigrou
Tigrou: older NVIDIA drivers had that problem (it wasn't enough to split draw calls, you also had to sync - I don't remember if glFlush was enough), but I think that has been fixed for more than a year now.
It obviously SHOULD be enough to ensure that your draw calls don't take too long.
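Roughly like this, building on the rendertarget snippet above (a sketch - fbo/tex/width/height come from there, drawFullscreenQuad / drawTexturedFullscreenQuad are made-up helpers, and glFinish per tile is the paranoid version; glFlush may be enough):

Code:
// render the heavy shader tile by tile into the FBO, syncing after each
// tile so no single chunk of GPU work hits the driver timeout,
// then show the finished image in one go
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, width, height);
glEnable(GL_SCISSOR_TEST);

const int TILES = 16;
for (int i = 0; i < TILES * TILES; i++)
{
    int tx = i % TILES, ty = i / TILES;
    glScissor(tx * width / TILES, ty * height / TILES,
              width / TILES, height / TILES);
    drawFullscreenQuad();  // one draw call, scissored down to a single tile
    glFinish();            // force this tile to complete before queueing the next
}
glDisable(GL_SCISSOR_TEST);

glBindFramebuffer(GL_FRAMEBUFFER, 0);
glBindTexture(GL_TEXTURE_2D, tex);
drawTexturedFullscreenQuad();  // trivial shader, just displays the texture
SwapBuffers(hdc);              // or whatever your swapbuffers() equivalent is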
added on the 2013-09-23 20:13:55 by Psycho
Quote:
Hell no. Imagine you are a game developer and shaders were preemptive: no way to predict the performance of a frame? Hell no. Adding hints like preemptiveness could fix it, but still, the default has to be non-preemptive.

As far as I can tell, they are adding mid-shader preemption in Windows 8, of course only if the HW supports it. But with more and more GPU compute being used in apps, this was inevitable.
added on the 2013-09-24 11:42:02 by KK
And yes - there has always been mid-frame context switching between draw calls (sometimes requiring massive pipeline and cache flushes), so this doesn't change the overall picture of performance prediction.
added on the 2013-09-24 11:43:58 by KK
