Optimizing Closure :: pouët.net

Optimizing Closure

category: code [glöplog]

added on the 2012-09-18 20:43:50 by Tomoya

added on the 2012-09-18 20:45:04 by ___

added on the 2012-09-18 20:51:59 by Tomoya

OK now here's maths for noobs. Yes hArDy, I'm looking at you!
Let's talk about FPS. That's the amount of frames that are rendered in one second (1s).
Let's talk about ms. That's 1/1000 of one second (1s).
Let's talk about realtime. That's an *expected* framerate of 60 FPS.

Now here comes the math. If I want 60 frames in one second, how long can one frame take at most? *scratches on paper* That's 1/60s = 0,01667s = 16,67ms. Exactly. Your whole rendering pipeline must render everything in less than 16,67ms to stay fluid.

Now let's talk about performance measurements. I trust smash's experience, which shows in his awesome intros, to make an accurate measurement. If your PostProcess FX chain takes fucking 4,5ms, then you're forfeiting a large part of your computing power.

How much? Let's talk about maths. If 16,67ms is 100% then 4,5ms is 26,99%. That's a fucking quarter of the entire frame rendering time that's available to you!
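For anyone who wants to redo the numbers for their own effect chain, here's a minimal sketch of the same arithmetic. The function names are made up for illustration, and the 4,5ms figure is just the measurement quoted above. Note that with the unrounded budget (1000/60 ms) the share comes out at 27%; the 26,99% above comes from dividing by the rounded 16,67ms.

```python
def frame_budget_ms(target_fps):
    """Time available for one frame, in milliseconds."""
    return 1000.0 / target_fps


def budget_share_percent(cost_ms, target_fps):
    """Share of the frame budget that a pass costing cost_ms eats up."""
    return 100.0 * cost_ms / frame_budget_ms(target_fps)


if __name__ == "__main__":
    budget = frame_budget_ms(60)
    share = budget_share_percent(4.5, 60)
    print(f"frame budget at 60 FPS: {budget:.2f} ms")
    print(f"a 4.5 ms post-process chain uses {share:.1f}% of it")
```

Swap in your own target framerate and measured pass time to see what's left for everything else.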

Now let's talk about "for free"... ignorance surely is bliss. Now hArDy, go read ryg's blog and then come back with REAL arguments, if you're going to argue with people who actually do have a clue.

added on the 2012-09-18 21:10:13 by xTr1m

Ah and kb...

Quote:

fucking 16 milliseconds.

I forgive you for not having turned off VSYNC :)

added on the 2012-09-18 21:13:16 by xTr1m

k, when people are getting ponied already, it's time for a weird proposal to the initial question.

The backbuffer is scaled down for post processing anyway, so you could encode the image into BC4 (or BC1 for colored glow) in that step. (See the link below for a how-to.)
With that compressed texture, the texture cache can hold 8x as many texels as before, resulting in fewer cache misses and thereby less stalling while waiting for texture reads. This approach may be most useful when you take a lot of somewhat random samples (SSAO?), and it assumes the cache uses an LRU strategy and the fragments are somewhat local to each other, ofc. It's lossy compression though, so handling artefacts will be a problem. Maybe the later BCn formats work better.
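To make the 8x figure and the artefact problem concrete, here's a rough CPU-side sketch of a single BC4 block. A BC4 block stores 4x4 single-channel texels in 8 bytes (two 8-bit endpoints plus sixteen 3-bit palette indices), so 8x follows if the uncompressed buffer is 32-bit RGBA8, which is an assumption on my part. The encoder just uses the block's max and min as endpoints; that's the simplest possible choice, not what a real-time GPU encoder would necessarily do, and the function names are hypothetical.

```python
def bc4_palette(r0, r1):
    """The 8 reference values a BC4_UNORM block with endpoints r0, r1 decodes to."""
    if r0 > r1:
        # 8-value mode: both endpoints plus six interpolated values
        return [r0, r1] + [((7 - i) * r0 + i * r1) / 7 for i in range(1, 7)]
    # 6-value mode: four interpolated values plus explicit 0 and 255
    return [r0, r1] + [((5 - i) * r0 + i * r1) / 5 for i in range(1, 5)] + [0, 255]


def encode_bc4_block(texels):
    """Encode 16 8-bit texels (row-major 4x4) into one 8-byte BC4 block."""
    assert len(texels) == 16
    r0, r1 = max(texels), min(texels)  # naive endpoint choice
    palette = bc4_palette(r0, r1)
    bits = 0
    for i, v in enumerate(texels):
        # pick the palette entry closest to the texel value
        idx = min(range(8), key=lambda k: abs(palette[k] - v))
        bits |= idx << (3 * i)  # texel 0 sits in the lowest 3 bits
    return bytes([r0, r1]) + bits.to_bytes(6, "little")


def decode_bc4_block(block):
    """Decode one 8-byte BC4 block back into 16 8-bit texels."""
    palette = bc4_palette(block[0], block[1])
    bits = int.from_bytes(block[2:8], "little")
    return [round(palette[(bits >> (3 * i)) & 7]) for i in range(16)]


if __name__ == "__main__":
    gradient = list(range(0, 256, 17))  # 16 values from 0 to 255
    block = encode_bc4_block(gradient)
    decoded = decode_bc4_block(block)
    worst = max(abs(a - b) for a, b in zip(gradient, decoded))
    print(f"{16 * 4} bytes of RGBA8 vs {len(block)} bytes of BC4")
    print(f"worst per-texel error on a full-range gradient: {worst}")
```

A block that spans the full 0..255 range only gets eight distinct levels, which is where the banding comes from; smooth, low-contrast blocks (like a blurred glow buffer) fare much better.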

In case someone wants to explore this further, here's a link to a presentation on how to render to BCn with DirectX:

http://twvideo01.ubm-us.net/o1/vault/gdc10/slides/Tranchida_TextureCompressionInRealtime.pdf

added on the 2012-09-18 21:21:25 by Inertia
