Is there a name for this trick for clearing the backbuffer in software rendering?

category: code [glöplog]

I thought I'd share this, in case it's not actually a known trick.

So basically I was extremely bothered by having to clear my "screen" every frame. Even on a separate core it's still a massive drag on performance, because it occupies the bus so much and also takes time away from doing something else.

Thinking about this, my brain spat out something someone said many, many, many years ago about his technique of clearing the backbuffer only every 256 frames. He didn't give any details and I didn't know if he actually managed to succeed, so I've looked into it myself.

In my example I'm using a 1920x1080x4 (8.294.400 bytes!) backbuffer, which is sent to the GPU every frame. I'm sure there's ways of improving this, but I don't want to talk about those at all, because I'd rather discover them myself. ^_^

Anyhow, what I came up with is actually incredibly straightforward. There are 4 bytes available per pixel: R, G, B and A. As long as I don't care about the alpha byte, A can be used as a counter variable to indicate which frame needs to be drawn by the GPU.

When I draw pixels, every Alpha byte contains the number of the current Frame mod 256. I send the current Frame's number to the GPU and the shader checks, for every texel, if the Alpha byte equals the current number of the Frame. If it does, it draws the pixel. If not, it draws black.
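
A minimal C sketch of the stamping and the per-texel check described above. The helper names and the 0xAARRGGBB pixel layout are assumptions made for illustration, and `resolve_pixel` is only a CPU stand-in for what the shader would do:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers. Assumed pixel layout: 0xAARRGGBB. */

/* Write an RGB colour and stamp the alpha byte with the frame number mod 256. */
static inline void put_pixel(uint32_t *buf, size_t idx, uint32_t rgb, uint32_t frame)
{
    buf[idx] = ((frame & 0xFFu) << 24) | (rgb & 0x00FFFFFFu);
}

/* CPU stand-in for the shader test: keep the colour only if the stamp
 * in the alpha byte matches the current frame, otherwise output black. */
static inline uint32_t resolve_pixel(uint32_t texel, uint32_t frame)
{
    return ((texel >> 24) == (frame & 0xFFu)) ? (texel & 0x00FFFFFFu) : 0u;
}
```

Note that a stamp written at frame N matches again at frame N + 256, which is why stale pixels still have to be cleared at least once every 256 frames.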

Now I can either zero the backbuffer every 256 frames ...
... or roll through the backbuffer, clearing 32.400 bytes every frame!
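
The rolling variant could look roughly like this in C. This is a sketch under assumptions: `clear_slice` is a hypothetical name, the buffer is treated as one contiguous array of 32-bit pixels, and the function is assumed to be called once per frame before any drawing:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SLICES 256u

/* Zero one of 256 contiguous slices of the buffer, picked by the frame number.
 * After 256 consecutive frames every pixel has been zeroed exactly once. */
static void clear_slice(uint32_t *buf, size_t pixel_count, uint32_t frame)
{
    size_t slice = frame % SLICES;
    size_t begin = pixel_count * slice / SLICES;
    size_t end   = pixel_count * (slice + 1) / SLICES;
    memset(buf + begin, 0, (end - begin) * sizeof *buf);
}
```

For 1920x1080 pixels each slice is 8100 pixels, i.e. the 32.400 bytes mentioned above. Because each pixel is zeroed once within any 256-frame window, a stale stamp is wiped before the counter wraps around to the same value.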

That's 0,390625 % (exactly 1/256) of the full back buffer!

That's it! A MASSIVE improvement!
And it really works! \:D/

Sorry if this was widely known already, but I'm too excited. ^_^
Seems so obvious now, though. Kind of hilarious I didn't think of it much earlier.

Does anyone know about this?
Is software rendering dead?
Is there a name for this trick?

Is anyone out there making use of APUs ...
... and all their cores ...
... for zero-copy software rendering nowadays?

Let me know!

Thank you! :D
Mental Note: Next time add more paragraphs for improved readability.
If you put your counter in a separate A32 format buffer you'd only need to do that clear every 3 days or so, at 60fps.
added on the 2020-12-28 15:05:39 by alia
also, depending on the image format, if you're using RGBA8 but your bus is >=32-bit instead of 8-bit, wouldn't zeroing out the entire pixel be a better idea anyway?
added on the 2020-12-28 15:36:47 by porocyon
also, depending on the image format, if you're using RGBA8 but your bus is >=32-bit instead of 8-bit, wouldn't zeroing out the entire pixel be a better idea anyway?

I don't understand what you're asking.

I clear 1/256th of the backbuffer every frame and roll through it until it's fully cleared.
Then the process repeats. I don't clear single pixels, or just the Alpha byte.
Is there a name for this trick?

Double buffering, Page flipping

Double Buffering for Flicker Free Output

Is software rendering dead?

No. GLSL iz 4 sukkaz
So you save some time on the CPU by doing more work on the GPU. Usually a good trade-off.

Is software rendering dead?

Seems to be doing well, especially when there's a hardware renderer available to help it a bit on the way. ;)

Is there a name for this trick?

Not sure, but I have definitely encountered it before. Not just for graphics, but generally for arrays of booleans (or small-range integers).
added on the 2020-12-28 22:53:20 by Blueberry
I'm a bit confused, is this software or hardware? Because this trick only makes sense in software (hardware has fast clears), but you mention the GPU. And yes, as Blueberry mentions, it's fairly common. I've used it a number of times in graph traversals; instead of having a visited/not visited bool that you need to clear for each run, you can keep a generation counter.
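
The generation-counter idea for visited flags might be sketched like this in C. All names here (`visit_stamp`, `generation`, `begin_traversal` and so on) are hypothetical, and the fixed node limit is just an assumption to keep the example short:

```c
#include <stdint.h>
#include <string.h>

#define MAX_NODES 1024

static uint32_t visit_stamp[MAX_NODES]; /* zero-initialised: nothing visited yet */
static uint32_t generation = 0;

/* Start a new traversal. All old marks become stale without touching the array;
 * a real clear is only needed when the 32-bit counter wraps around. */
static void begin_traversal(void)
{
    if (++generation == 0) {
        memset(visit_stamp, 0, sizeof visit_stamp);
        generation = 1;
    }
}

static int  is_visited(int node)   { return visit_stamp[node] == generation; }
static void mark_visited(int node) { visit_stamp[node] = generation; }
```

Each run calls `begin_traversal()` once and then uses `is_visited` / `mark_visited` in place of a bool array that would need a full clear per run.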

Also, what do you do about the Z-buffer? Do you even have one? (It's fairly easy to mess up early Z-reject with this kind of stuff.)

Another trick that was a thing before fast Z clears became a thing: Using half of the Z-buffer for every other frame (flipping Z sign on each frame).
added on the 2020-12-28 23:19:53 by Sesse
It took me some time to see the advantage (not sure how well this will work), but it reminds me of an older trick for DOS mode 13h (256 colors with a palette), used for things like dot effects to avoid having to individually clear the dots of the previous frame.

Render the dots with color 1. Palette makes it white.
Next frame, set palette color 1 to black (background) and render the dots with color index 2.
Repeat, till 256.
Meanwhile, you start every frame by deleting 1/256th of the framebuffer.

I am not sure how well it will work: you blindly zero scanlines, but what if you leave behind some dots that are currently hidden by the palette? When the palette switch loops from 255 to 0, things could suddenly appear that weren't deleted. Not sure, I need to try it.
added on the 2020-12-29 13:10:31 by Optimonk
I would guess the trick mentioned by solstice_projekt will only give benefits in scenes where the background is actually visible. I think if you have a 3d scene where the geometry always completely fills the camera view (indoors, huge walls etc.) then you don't have to clear the backbuffer anyway (because you render every pixel anyway). The same goes when using a skybox.
(Also I am assuming you are using polygon sorting instead of zbuffering here.)

On a related note, there is an old paper by Steve Worley iirc that mentions a trick for marking background pixels with A=0xff and storing 24-bit z-values in RGB in the framebuffer during a z-prepass. This way you don't need an additional zbuffer, you only need one RGBA framebuffer: in the second pass you write the actual background RGB values to all framebuffer pixels that have A=0xff, and for all other pixels you evaluate your rendering function at the given z. Or something like that. Can't remember exactly :)
added on the 2020-12-29 15:03:15 by spike
IIRC Quake also used a somewhat similar technique for marking visible nodes in the BSP tree. Every node has a field for a frame number, and markVisible() sets the current frame number in visible nodes. Thus clearing all node flags every frame is not necessary!
Not sure, but I have definitely encountered it before. Not just for graphics, but generally for arrays of booleans (or small-range integers).

@Blueberry thanks for letting me know! I've never come across it myself.

@Sesse Software, using OpenGL to upload the buffer. Thinking of switching to dx though.
The palette trick is a nice idea too!

@spike Yes, as soon as the full screen is being filled with pixels, clearing it is pointless, therefore this trick is pointless as well. :D Right now I'm just writing pixels and I'm actually looking forward to solving the z-buffer problem on the CPU efficiently, though I'm not yet convinced I'm going to implement a rasterizer. I have no idea, but I'm progressing! ^_^

@hot_multimedia thanks for letting me know!


I have no idea if tagging actually works.