pouët.net

Instancing in OpenGL

category: code [glöplog]
http://iki.fi/sol/instancing.html

Instancing in OpenGL, various methods. Go read.
added on the 2010-11-22 15:55:29 by sol_hsa
Went and read. Useful! Thanks sol.
added on the 2010-11-22 17:00:27 by psonice
Server down for me?
added on the 2010-11-22 17:34:31 by revival
same here
me too now. Guess it's that popular :)
added on the 2010-11-22 17:49:53 by psonice
connection timed out, but managed to find it here - http://sol.gfxile.net/instancing.html
added on the 2010-11-22 19:03:23 by snoutmate
iki seems to be slow for some reason; sol.gfxile.net is the current host, iki is the "permanent" url.
added on the 2010-11-22 19:53:34 by sol_hsa
nice article. would be nice if you could complete the timings for the ATI card too. i'm still wondering why rendering without shaders is faster than with shaders. can anybody think of a reason? i always thought that today's gpus simulate the fixed pipeline with regular shaders. am i wrong?
added on the 2010-11-22 20:41:55 by iq
The fixed function pipeline has lower precision requirements than the programmable pipeline, and this might explain some of the difference.
added on the 2010-11-22 20:50:10 by kusma
Also, with shaders, the work the driver has to do to set up the vertex attributes could get a bit more complex (because you can do stuff like binding attributes to non-contiguous locations).

The driver also gets a non-fixed uniform-map, meaning that it gets trickier to avoid having to re-upload uniforms.

There's a lot of these things, and the best way of getting to know them is to write an OpenGL driver ;)
added on the 2010-11-22 20:56:39 by kusma
Quote:
The fixed function pipeline has lower precision requirements than the programmable pipeline, and this might explain some of the difference.


Maybe precision hints would close the gap? (Assuming they're supported, I'm sure I read somewhere that they are in regular opengl but it's given me errors in the past so who knows?)
added on the 2010-11-22 22:31:05 by psonice
The precision hints are AFAIR defined to be no-ops in GLSL, and are just present to improve compatibility with ESSL. So they should not improve performance for you, no.
added on the 2010-11-22 22:33:20 by kusma
What about OpenGL 4.x?
added on the 2010-11-22 23:34:39 by las
kusma: right, makes sense. I've been using it in ES, but then saw it was supported in GLSL and figured it'd make shader editing a lot easier.. but no, it just failed to compile the shader.
added on the 2010-11-23 00:40:33 by psonice
nice post! just what i needed, in fact :-)
noob question: couldn't you do instancing in a geometry shader too?
added on the 2010-11-23 08:48:05 by skrebbel
skrebbel, you mean generating geometry on the fly? - I consider that slow as fuck.
added on the 2010-11-23 09:01:56 by las
Sure, the vertices would just go through the vertex shader, and then be transformed by a matrix texture in the geo shader with a high amplification factor like 64. It would spend more shader resources per vertex, as they need to be transformed per triangle, but for small meshes that probably wouldn't matter much. And it would kill performance on geforce 8/9, while other dx10+ hardware should be just fine, as the amplification is constant.

And yes, it would be nice if the table could be filled in.
added on the 2010-11-23 09:10:24 by Psycho
FWIW, momentum used matrices in uniforms (ok, more like 1 quaternion+1 translation vector), which was easily fast enough for the 80k cubes we did per frame (max) in 2007.

also, don't underestimate the power of brute force cpu vertex processing on modern cpus :) just saying.
added on the 2010-11-23 10:29:28 by ryg
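
For readers unfamiliar with the "1 quaternion + 1 translation vector" representation mentioned above, here is a minimal CPU-side sketch of what such a per-instance transform computes. It is not the momentum code; the function names and the (x, y, z, w) quaternion layout are assumptions made for illustration, and a shader version would do the same arithmetic per vertex.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Build a unit quaternion (x, y, z, w) from a unit-length axis and an angle in radians."""
    s = math.sin(angle / 2.0)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(angle / 2.0))

def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def transform(vertex, quat, translation):
    """Rotate a vertex by a unit quaternion, then add the translation.

    Uses the identity v' = v + w*t + (q x t), where t = 2*(q x v)
    and q is the vector part of the quaternion.
    """
    q_xyz, w = quat[:3], quat[3]
    t = tuple(2.0 * c for c in cross(q_xyz, vertex))
    u = cross(q_xyz, t)
    return tuple(vertex[i] + w * t[i] + u[i] + translation[i] for i in range(3))

# Example: rotate (1, 0, 0) by 90 degrees around Z, then move 10 units along X.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
moved = transform((1.0, 0.0, 0.0), q, (10.0, 0.0, 0.0))
```

Eight floats (two 4-component constants) per instance are enough here, compared with sixteen for a full 4x4 matrix, at the cost of not supporting scale or shear.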
ryg: How big were the instance batches you used?
added on the 2010-11-23 10:45:34 by sol_hsa
dynamically sized to use available vertex shader constant storage. for 256 constants (typical value for ps3.0 HW) it was 112 instances per batch IIRC (that's at 2 vs constants per instance).
added on the 2010-11-23 11:43:03 by ryg
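
The batch sizing described above comes down to a small piece of integer arithmetic, sketched here. The figure of 32 reserved constants is an assumption, chosen only because it makes the quoted numbers work out (256 - 2 * 112 = 32); the post does not say what the remaining constants held. The function names are hypothetical.

```python
def instances_per_batch(total_constants, reserved_constants, constants_per_instance):
    """How many instances fit into the constant storage left after reserved slots."""
    usable = total_constants - reserved_constants
    return max(usable // constants_per_instance, 0)

def batch_count(instance_count, per_batch):
    """Number of draw calls needed to cover all instances (ceiling division)."""
    return -(-instance_count // per_batch)

# 256 constants, 32 assumed reserved (e.g. for matrices and lighting), 2 per instance.
per_batch = instances_per_batch(256, 32, 2)
# Draw calls needed for the 80k cubes per frame mentioned earlier in the thread.
batches = batch_count(80_000, per_batch)
```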
ryg: Makes sense.
added on the 2010-11-23 11:49:28 by sol_hsa
did you try a shader attribute float array (with a pointer to it) and putting a display list around all the actual rendering calls? That is, render with glBegin()/glEnd() (the slow way) and then manipulate only the memory pointed to by the attribute, so there are minimal calls into the driver.
added on the 2010-11-23 16:05:15 by jaw
Nice tutorial. I didn't know about ARB_instanced_arrays.

I think most programs that use instancing read the per-instance matrix from a buffer in the vertex shader.
But if you can generate or modify vertex attributes using only gl_InstanceID, you don't need to create large per-instance data.

e.g.
float id = float(gl_InstanceID); // explicit int-to-float conversion
vec4 pos = gl_Vertex + vec4(id*scale, sin(id), 0.0, 0.0); // offset derived from the instance index only
gl_Position = mvp*pos;

This method increases the per-vertex calculations but reduces the fetches in the vertex shader.
And I think it is easier than creating and managing per-instance data.

I used this method in the following program.
(Sorry, the Readme.txt is in Japanese only.)
http://2chparty.net/wiki/index.php?plugin=attach&pcmd=open&file=mirrorHouse.zip&refer=%A5%DF%A5%E9%A1%BC%A5%CF%A5%A6%A5%B9

You can see some screenshots of this program on this page.
(Sorry, this site is in Japanese only as well.)
http://2chparty.net/wiki/index.php?%A5%DF%A5%E9%A1%BC%A5%CF%A5%A6%A5%B9

This program simulates a mirror room (the front, back, left and right walls are mirrors).
Nothing moves automatically the way it would in an 8k intro.
You can move the camera with the keyboard and mouse (w,s,a,d,e,c keys).
You can move the bodies with the keyboard (6,7, t,y,u,i, h,j, g,k, v,m, b,n, space keys).
Press the F2 key to unrestrict the camera position.
It requires OpenGL 3.2.
added on the 2010-11-23 16:22:44 by tomohiro
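
To make the gl_InstanceID idea from the post above easier to experiment with outside a shader, here is a rough CPU-side mirror of the offset expression. It is only a sketch: `instance_offset` and `place_vertex` are hypothetical names, the multiplication by `mvp` is left out, and `scale` is whatever spacing you pick.

```python
import math

def instance_offset(instance_id, scale):
    """Mirror of vec4(gl_InstanceID*scale, sin(gl_InstanceID), 0, 0)."""
    i = float(instance_id)
    return (i * scale, math.sin(i), 0.0, 0.0)

def place_vertex(vertex, instance_id, scale):
    """Add the per-instance offset to a 4-component vertex position."""
    offset = instance_offset(instance_id, scale)
    return tuple(v + o for v, o in zip(vertex, offset))

# Positions of one vertex across the first four instances, spaced 0.5 apart in X.
positions = [place_vertex((1.0, 2.0, 3.0, 1.0), i, 0.5) for i in range(4)]
```

Nothing per-instance is stored here: every position is recomputed from the instance index, which is the trade of extra arithmetic for fewer fetches that the post describes.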
jaw: vertex arrays are copied when a display list is compiled, so that won't work.
added on the 2010-11-23 18:36:05 by kusma
that said, how probable is it that a demo has 100k+ cubes but each cube's matrix *must* come from software? i mean, won't each matrix somehow be computed from other data (some spline, some formula, whatever)? and why not do that computation in the shader and just upload whatever data is needed for producing the matrices? (typically less data, i guess)

not sure here, just wondering about the application areas of some of these techniques.
added on the 2010-11-24 02:49:05 by skrebbel
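
To put rough numbers on the "typically less data" point above, this sketch compares the per-frame upload size of full per-instance matrices with that of a handful of shared parameters. The 4-byte float size and the example of 8 shared parameters are assumptions for illustration, not measurements from the article or the thread.

```python
FLOAT_BYTES = 4  # assumed 32-bit floats

def per_instance_upload_bytes(instance_count, floats_per_instance=16):
    """Bytes uploaded per frame when every instance gets its own data (16 floats = 4x4 matrix)."""
    return instance_count * floats_per_instance * FLOAT_BYTES

def parameter_upload_bytes(parameter_count):
    """Bytes uploaded per frame when the shader derives transforms from shared parameters."""
    return parameter_count * FLOAT_BYTES

# 100k cubes with a full matrix each, versus 8 hypothetical floats (time, spline coefficients, ...).
matrices = per_instance_upload_bytes(100_000)
parameters = parameter_upload_bytes(8)
```

Whether the shader-side computation pays off still depends on how expensive the formula is per vertex, which this arithmetic does not capture.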