The fastest way of rendering a huuuge model ?

category: general [glöplog]
A friend of mine just bumped into this problem, and I realized I've never had such issues myself, as my models always fit into card memory.
So, there's a large model, say over 3 million vertices, and we need to pass normal, tangent, binormal, texcoords, color etc., so the vertices themselves are pretty big.
What would be the fastest way of rendering it, then, when it doesn't fit into VRAM?
One point is that either the tangent or the binormal could be reconstructed in the vertex shader from the other two values (normal and binormal, or normal and tangent, respectively). Though I really doubt that would end up saving any time once you weigh the extra shader work against the memory-transfer time saved, at least not with current GPU performance.
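For illustration, here's a minimal sketch of that reconstruction in plain C++ (in an actual vertex shader it's just a cross product and a sign flip; the per-vertex handedness sign is an assumption of mine and would still need one stored bit):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Cross product: the binormal is orthogonal to both normal and tangent.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Reconstruct the binormal from normal + tangent. 'handedness' (+1 or -1)
// must still be stored per vertex, since cross(N, T) only fixes the
// binormal up to its sign for mirrored UVs.
Vec3 reconstructBinormal(const Vec3& n, const Vec3& t, float handedness) {
    Vec3 b = cross(n, t);
    return { b.x * handedness, b.y * handedness, b.z * handedness };
}
```

In HLSL the whole thing collapses to `float3 B = cross(N, T) * handedness;`.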
I also had this idea of splitting the model into two subsets (submodels actually; index spaces should not overlap for the method to work efficiently), such that one part fills VRAM (leaving some space for rendertargets, textures etc.) and can be stored on the card all the time. The other part, the leftovers, could just be drawn from RAM using DrawIndexedPrimitiveUP, in D3D terms. That way we'd have less data to transfer from RAM each frame, which could save some time.
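A back-of-the-envelope sketch of planning that split; all the concrete numbers (64MB card, 16MB reserved, 32-byte vertices) are made-up assumptions, not measurements:

```cpp
#include <cassert>
#include <cstddef>

// Split a model into a VRAM-resident subset and a "leftovers" subset drawn
// from system RAM (DrawIndexedPrimitiveUP-style). Budget numbers are
// hypothetical; index data and alignment are ignored for simplicity.
struct Split {
    std::size_t residentVerts;  // kept in a static VRAM vertex buffer
    std::size_t streamedVerts;  // drawn from RAM every frame
};

Split planSplit(std::size_t totalVerts, std::size_t vertexSize,
                std::size_t vramBytes, std::size_t reservedBytes) {
    std::size_t budget = (vramBytes > reservedBytes) ? vramBytes - reservedBytes : 0;
    std::size_t fit = budget / vertexSize;
    if (fit > totalVerts) fit = totalVerts;
    return { fit, totalVerts - fit };
}
```

With 48MB of budget and 32-byte vertices, only about half of a 3M-vertex model stays resident; the rest is streamed, which is exactly the part that costs bus bandwidth.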
And there are some obvious alternatives, like rendering the whole model from RAM, or rewriting the on-card vertex and index buffers with new data multiple times a frame, rendering the model through a kind of imitation of subsets.
I'm also curious whether the gfx card needs some free VRAM to do ...PrimitiveUP renders, and if so, how big the buffer it uses is.
Aight, any ideas ? :)
added on the 2006-02-18 12:29:52 by apricot apricot
added on the 2006-02-18 13:59:03 by skrebbel skrebbel
and switch to pokemon mini
added on the 2006-02-18 14:01:12 by Shifter Shifter
skrebbel: no, he can't really do that. It's precision that is valued in this project; the exact model must be rendered, that's the point.
added on the 2006-02-18 14:07:08 by apricot apricot
you go and buy a bigger card obviously
added on the 2006-02-18 14:14:04 by winden winden
On the other hand, considering that at standard resolutions there are many triangles laying claim to each pixel, yes, a slightly downsampled model could do, though one would have to rebuild such a model (or at least recalculate its normals) every frame to ensure precise shading.
But still, even at one triangle per pixel that's over 1.3 million visible triangles, so the memory problem is not solved.
BSP trees and the like don't really do the trick in this particular case: the model is not some race-game city model where only 1/100th of it is visible at any moment. At least 50-60% of the model is visible every frame, and that's with transparency disabled.
added on the 2006-02-18 14:21:04 by apricot apricot
First, ditch the tangent or binormal and calculate it in the vertex shader.

Then I'd try using smaller vertex formats. So, instead of, say, having normals/tangents as floats, you encode them in D3DCOLORs (1 byte per component, 1 unused) or in 10:10:10:2 format. For position/normal/tangent/UV/color, this gets the vertex size from 48 bytes down to 32 bytes, which is no miracle, but now a 3-million-vertex buffer takes 96MB instead of 144MB.
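A sketch of that byte packing in plain C++ (illustrative only; in D3D the GPU does the byte-to-[0,1] expansion for you, and the shader only needs the *2-1 remap mentioned later in the thread):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Pack a unit vector into 4 bytes (8:8:8:8, one byte unused), mapping each
// component from [-1, 1] to [0, 255]. This is the D3DCOLOR-style encoding
// used when 10:10:10:2 isn't available.
uint32_t packNormal(float x, float y, float z) {
    auto toByte = [](float v) {
        return static_cast<uint32_t>((v * 0.5f + 0.5f) * 255.0f + 0.5f);
    };
    return toByte(x) | (toByte(y) << 8) | (toByte(z) << 16);
}

// Decode one component: byte/255 gives [0, 1], then *2-1 maps back to
// [-1, 1]. This mirrors what the shader would do with the fetched value.
float unpackComponent(uint32_t packed, int shift) {
    float v = static_cast<float>((packed >> shift) & 0xFF) / 255.0f;
    return v * 2.0f - 1.0f;
}
```

The round-trip error is under 1/255 per component, which is usually fine for normals; positions and UVs you'd keep at higher precision (hence 12 + 4 + 4 + 8 + 4 = 32 bytes).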

Then you split the model into chunks, put them into managed buffers, render them separately, and hey, you have a nice stress test of the D3D resource manager!
added on the 2006-02-18 14:25:24 by NeARAZ NeARAZ
yeah, that's a nice tip indeed, thanks NeARAZ.
The only thing I didn't get is how you can handle the 10:10:10:2 format in a shader?

Anyway, this is aimed at compatibility too, so what I meant was: how to do it best on a 64MB card? :D
added on the 2006-02-18 14:46:51 by apricot apricot
I guess asking "what the hell is it?" would be pointless, because the answer would either be "you'll see" or "it's a secret". Right? :)
added on the 2006-02-18 14:47:17 by Preacher Preacher
well, it's not related to a demo. I don't know exactly, but I suppose it's a laser 3D-scanner output reconstruction project or something. Not a commercial one either; pure science. I'll ask when I get the chance.
added on the 2006-02-18 14:52:54 by apricot apricot
Rendering a 3 million vertex mesh and using normal maps is sick indeed.
added on the 2006-02-18 14:53:09 by imbusy imbusy
academics sadly have to wait for newer graphics and computing technology.
can't you, like, split the data into a grid, sort of mipmap-downsample it a few times into system mem, and show the high-detail cubes of the grid only when the camera is close? I'm sure there are difficult words made up by beardy professors for stuff like this, but it doesn't sound so hard. Naturally this only increases total size, but not in vidmem.
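A tiny sketch of the per-cube detail choice, halving detail each time the camera distance doubles, mipmap-style; `baseDistance` (full detail within it) and `maxLevel` are made-up knobs, not anything from a real scheme:

```cpp
#include <cassert>
#include <cmath>

// Pick a detail ("mip") level for one grid cell from camera distance.
// Level 0 is the full-resolution cube; each further level is a coarser
// precomputed downsample held in system memory.
int detailLevel(float distance, float baseDistance, int maxLevel) {
    if (distance <= baseDistance) return 0;  // close enough: full detail
    int level = static_cast<int>(std::log2(distance / baseDistance)) + 1;
    return level > maxLevel ? maxLevel : level;
}
```

Only the currently needed level of each cube is uploaded, so VRAM holds a distance-weighted slice of the model rather than all of it.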
added on the 2006-02-18 15:12:40 by skrebbel skrebbel
or LOD?
skrebbel: an octree? :)
added on the 2006-02-18 18:17:14 by smash smash
skrebbels is confused, he probably meant a c\-/tree
added on the 2006-02-18 18:20:13 by keops keops
Software rendering.
added on the 2006-02-18 18:30:36 by texel texel
Or... maybe precalc the lighting and color per triangle? If you have just vertex, triangle and color data, maybe it fits in VRAM.
added on the 2006-02-18 18:33:29 by texel texel
just search NV for VAR... a couple of VAR buffers, transferring to one while rendering from the other, gets you peak-throughput rendering from sysmem.
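The bookkeeping behind that double-buffered streaming, sketched in plain C++ (the actual buffer fills, fences and GL calls of NV_vertex_array_range are omitted; this only shows the ping-pong idea):

```cpp
#include <cassert>

// Two streaming buffers: while the GPU renders from one, the CPU fills the
// other with the next batch of vertices, then the roles swap each frame.
// In a real VAR setup you'd also wait on a fence before reusing a buffer.
struct PingPong {
    int filling = 0;    // buffer the CPU writes this frame
    int rendering = 1;  // buffer the GPU reads this frame
    void swap() { int t = filling; filling = rendering; rendering = t; }
};
```

The point is that the upload of batch N+1 overlaps the rendering of batch N, so the bus and the GPU stay busy at the same time.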
in other news, you are doing something fundamentally wrong by not using LOD.
added on the 2006-02-18 19:18:02 by shiva shiva
Maybe this one day, i.e. look at the boeing CAD model stuff ;]
added on the 2006-02-18 20:01:03 by bdk bdk
...and in my opinion this publication is quite interesting.
added on the 2006-02-18 20:08:33 by bdk bdk
what about displacement mapping? maybe that could help reduce the vertex count a bit...
added on the 2006-02-18 20:21:17 by ttl ttl
Does anyone know how to filter a 64-bit cubemap on an ATI card??
added on the 2006-02-18 21:01:56 by magic magic
fadeout: the 10:10:10:2 format is D3DDECLTYPE_UDEC3 or D3DDECLTYPE_DEC3N (of course, not all hardware supports that... if not, you just fall back to 8:8:8:8, aka D3DCOLOR, and do a foo*2-1 in the shader)
added on the 2006-02-18 21:22:39 by NeARAZ NeARAZ
if there were a fast way to render all that on consumer hardware, wouldn't Carmack have done it already?
added on the 2006-02-18 22:01:33 by the_Ye-Ti the_Ye-Ti