pouët.net

DX9 / DX10 / DX11 - Smallest 1k / 4k intro framework?

category: code [glöplog]
I think most DirectX coders here are using DX9 for their 1k and 4k frameworks. Does anyone here have experience creating a 1k / 4k framework based on DX10 / DX11? The real question is: which version of DirectX gives the smallest compiled and packed framework for intro coding? Thx
added on the 2009-10-26 17:05:17 by MrVainSCL MrVainSCL
dx9/ogl<dx10<dx11
More setup for dx10 and a little bit on top of that for dx11.
My fullscreen shader 1k "benchmark" is currently ~70 bytes larger in the dx11 computeshader version than the ogl or dx9 pixelshader versions. Not sure yet if the pixelshader version will be more or less than that.
If doing simple polygon stuff you will miss the fixed function calls..
added on the 2009-10-26 17:34:12 by Psycho Psycho
ask the guys who made this i guess: http://www.pouet.net/prod.php?which=51448
there is potentially a big space gain under opengl with this new extension:
EXT_separate_shader_objects, but i am not sure if you need an opengl 3.2 context to use it or if an older opengl 2.1 context is enough to enable it.

But you still have to cope with GLSL, which might be a bit larger than HLSL anyway (I'm thinking of pure floating-point constants and so on...).
added on the 2009-10-26 20:05:38 by nystep nystep
BB Image
added on the 2009-10-26 20:14:05 by trc_wm trc_wm
Nope, it would take more space using that extension. But if anything I think glsl is slightly smaller than hlsl (but too close to call).
added on the 2009-10-26 20:15:54 by Psycho Psycho
agreed with psycho... the extension doesn't help
added on the 2009-10-26 22:19:10 by auld auld
you guys are wrong, sorry
added on the 2009-10-27 00:06:04 by nystep nystep
Ok, had a closer look at the extension.. It's (as known) meant to allow setting vertex and pixel programs independently, as part of nvidia's dx-compatibility movement. It is currently nvidia only (and so butt-ugly that I could imagine it being changed before going ARB...), however it's also so simple that I think I could emulate it for most intro use in gShaderReplacer if needed..
The unexpected (and ugly, nonorthogonal) thing is this method:
uint CreateShaderProgramEXT(enum type, const char *string);
which replaces all the createshader, shadersource, compileshader, createprogram, linkprogram stuff.
This way a 1k-ish pixelshader-only setup without uniforms could do with just CreateShaderProgramEXT and UseShaderProgramEXT, sending it way below dx9. (and get a nice BB Image for being nv only ;)
added on the 2009-10-27 00:44:39 by Psycho Psycho
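A rough way to see where the saving would come from is to count the entry-point name strings each path has to carry. The sketch below does only that: it ignores call-site code, shader source and compression, so it is not a measurement of final intro size. The arrays and the `name_bytes` helper are hypothetical names introduced for illustration, and `glAttachShader` is added to the classic list on the assumption that shaders must be attached before linking.

```c
#include <stddef.h>
#include <string.h>

/* Entry points used by the classic GLSL setup path. */
static const char *const classic_names[] = {
    "glCreateShader", "glShaderSource", "glCompileShader",
    "glCreateProgram", "glAttachShader", "glLinkProgram", "glUseProgram"
};

/* Entry points used by the pixel-shader-only path from the post above. */
static const char *const ext_names[] = {
    "glCreateShaderProgramEXT", "glUseShaderProgramEXT"
};

/* Sum of the name lengths, counting one terminating NUL per string.
   This is raw, uncompressed string data only. */
static size_t name_bytes(const char *const *names, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(names[i]) + 1;
    return total;
}
```

Calling `name_bytes(classic_names, 7)` and `name_bytes(ext_names, 2)` gives the raw string cost of each path; how much of that difference survives a context-modeling packer is a separate question.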
I believe that extension to be really useful but seriously would ARB really allow something that is very very non-generic to make its way into the standard? Nope. Sure it's nice but the lack of uniform access is quite a "funny behavior".
added on the 2009-10-27 02:26:19 by decipher decipher
… Okay never mind, I think I misread Psycho's post. It's fucking 3 AM. :/
added on the 2009-10-27 02:29:06 by decipher decipher
missed the redefinition of CreateShaderProgram. If it gets to be an ARB extension in this form, yes, it'll be useful.
added on the 2009-10-27 07:27:54 by auld auld
This is a step forward, but why not have an extension that takes ONE source string with a vertex + fragment + possibly geometry shader, compiles and links it? You could retrieve the individual shaders afterwards if needed. Would be sweet for 1ks...
added on the 2009-10-27 11:35:27 by raer raer
Did anyone try to develop a DX10 framework in pure asm code? I'm just trying to generate *.inc files from the includes (with the help of http://japheth.de/h2incX.html), but it seems to be a mess in the end...
i tried as well to preprocess the dx10 include file with cl.exe, but it's still a lot of work to make it parseable...

The only way i found so far is to use inline assembler in VC++ inside a .c source through the Vtbl structures (ID3D10DeviceVtbl...etc.). It's working but i'm wondering if there are any other options?

added on the 2009-10-27 12:06:02 by xoofx xoofx
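For readers unfamiliar with the Vtbl structures mentioned above, here is a minimal, portable sketch of the COM-style calling pattern in plain C. It does not use the real D3D10 headers: `Device`, `DeviceVtbl`, `Draw`, `Present` and `run_frames` are hypothetical stand-ins, chosen only to show the shape of an `obj->lpVtbl->Method(obj, ...)` call.

```c
#include <stddef.h>

typedef struct Device Device;

/* Hypothetical vtable: one function pointer per method, in a fixed order. */
typedef struct DeviceVtbl {
    int (*Draw)(Device *self, int vertex_count);
    int (*Present)(Device *self);
} DeviceVtbl;

struct Device {
    const DeviceVtbl *lpVtbl;  /* first member, as in COM's C bindings */
    int vertices_drawn;
    int frames_presented;
};

static int device_draw(Device *self, int n) { self->vertices_drawn += n; return 0; }
static int device_present(Device *self)     { self->frames_presented++;  return 0; }

static const DeviceVtbl device_vtbl = { device_draw, device_present };

/* Runs `frames` frames on a fresh device and returns how many were presented. */
static int run_frames(int frames)
{
    Device dev = { &device_vtbl, 0, 0 };
    for (int i = 0; i < frames; i++) {
        /* Load the object, load its vtable, call through a fixed slot,
           passing the object itself as the first argument. */
        dev.lpVtbl->Draw(&dev, 3);
        dev.lpVtbl->Present(&dev);
    }
    return dev.frames_presented;
}
```

Each call here is the C spelling of the load-object, load-vtable, indirect-call sequence that a hand-written asm version has to emit explicitly.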
raer: because, luckily, these standards and extensions aren't modelled after 1K needs; it's not a very useful scenario outside of small rendering demos/tests
added on the 2009-10-27 12:13:36 by superplek superplek
@lx: Trust me, well written C code will at least be as good as what you can write in "pure Assembly". Ask Ferris.
added on the 2009-10-27 12:19:34 by decipher decipher
Decipher: from the test i'm running, it seems to be not completely true.

The uncompressed C code is indeed often as good as or even better than the asm version, but things look different with context-modeling compression (as is the case when we use crinkler).

From a simple protocol test, with a main function calling only another function that makes 5 calls to directx...
- the C version uncompressed is 85 bytes, but compresses to 42.49 bytes -> ratio is 53%
- the ASM version uncompressed is 81 bytes, but compresses to 33.25 bytes -> ratio is 39%

The fact is that you can produce code that is more crinkler-friendly in ASM than in C, even if the ASM code looks ugly or size-inefficient at first glance... Of course, the asm generated from C code is quite consistent and can compress well in the long run, but looking more closely, it seems you can gain some bytes here in asm... Mentor's code is a proof of concept in this domain ;) (check elevated)

added on the 2009-10-27 13:05:56 by xoofx xoofx
i gain only 32 bytes with this extension in intro machine, but i can't test it -_-. ati card still here... well, better than nothing. if it moves to core someday, the function names will be 9 (!) bytes shorter. :p

Code:
// hi auld :)
typedef GLuint (GLAPIENTRY * PFNGLCREATESHADERPROGRAMEXT) (GLenum type, const GLchar* shaderSource);
typedef void (GLAPIENTRY * PFNGLUSESHADERPROGRAMEXT) (GLenum type, GLuint program);
typedef void (GLAPIENTRY * PFNGLACTIVEPROGRAMEXT) (GLuint program);
typedef void (*GenFP)(void);

static GenFP glFP[3];
const static char* glnames[]={
    "glCreateShaderProgramEXT",
    "glUseShaderProgramEXT",
    "glActiveProgramEXT"
};

static __inline void setShaders()
{
    for (int i=0; i<3; i++)
        glFP[i] = (GenFP)wglGetProcAddress(glnames[i]);

    // if i understood from the spec, it should be something like this..
    GLuint v = ((PFNGLCREATESHADERPROGRAMEXT)(glFP[0]))(GL_VERTEX_SHADER, vsh);
    GLuint f = ((PFNGLCREATESHADERPROGRAMEXT)(glFP[0]))(GL_FRAGMENT_SHADER, fsh);
    ((PFNGLUSESHADERPROGRAMEXT)glFP[1])( GL_VERTEX_SHADER, v );
    ((PFNGLUSESHADERPROGRAMEXT)glFP[1])( GL_FRAGMENT_SHADER, f );
    ((PFNGLACTIVEPROGRAMEXT)glFP[2])( v );
    ((PFNGLACTIVEPROGRAMEXT)glFP[2])( f );
}

added on the 2009-10-27 14:01:00 by nystep nystep
As I read the spec ActiveProgramEXT is a client state thing, just like ActiveTexture etc, so you don't need that when not using uniforms.
A vertex shader is not needed for this kind of stuff - this should save a bit more here than in the normal case. And finally it's usually a bad idea to do that loop for the extension loading (that will apply to the normal case too, of course).
So I would write it like:
Code:
#define glCreateShaderProgramEXT ((PFNGLCREATESHADERPROGRAMEXT)wglGetProcAddress("glCreateShaderProgramEXT"))
#define glUseShaderProgramEXT ((PFNGLUSESHADERPROGRAMEXT)(wglGetProcAddress("glUseShaderProgramEXT")))

static __inline void setShaders()
{
    GLuint f = glCreateShaderProgramEXT(GL_FRAGMENT_SHADER, fsh);
    glUseShaderProgramEXT(GL_FRAGMENT_SHADER, f );
}
added on the 2009-10-27 14:29:09 by Psycho Psycho
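The macro approach above can be tried without a GL context by swapping in a mock resolver. The following is a portable sketch under that assumption: `fake_get_proc_address`, `mock_create`, `mock_use` and the `CREATE_SHADER_PROGRAM` / `USE_SHADER_PROGRAM` macros are hypothetical stand-ins for `wglGetProcAddress` and the driver entry points, not real API. It only demonstrates the pattern of resolving the function at each call site instead of filling a pointer table in a loop.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned int (*CreateFn)(unsigned int type, const char *source);
typedef void (*UseFn)(unsigned int type, unsigned int program);
typedef void (*GenericFn)(void);

static unsigned int last_used_program = 0;

/* Mock driver functions (assumptions for illustration only). */
static unsigned int mock_create(unsigned int type, const char *source)
{
    (void)type;
    return source ? 1u : 0u;  /* pretend program handle: 1 on success */
}

static void mock_use(unsigned int type, unsigned int program)
{
    (void)type;
    last_used_program = program;
}

/* Hypothetical resolver playing the role of wglGetProcAddress. */
static GenericFn fake_get_proc_address(const char *name)
{
    if (strcmp(name, "glCreateShaderProgramEXT") == 0) return (GenericFn)mock_create;
    if (strcmp(name, "glUseShaderProgramEXT") == 0)    return (GenericFn)mock_use;
    return NULL;
}

/* Resolve by name at the call site; no pointer table, no loader loop. */
#define CREATE_SHADER_PROGRAM ((CreateFn)fake_get_proc_address("glCreateShaderProgramEXT"))
#define USE_SHADER_PROGRAM    ((UseFn)fake_get_proc_address("glUseShaderProgramEXT"))

#define FRAGMENT_SHADER_KIND 0x8B30u  /* numeric value of GL_FRAGMENT_SHADER */

static unsigned int set_shaders(const char *fsh)
{
    unsigned int f = CREATE_SHADER_PROGRAM(FRAGMENT_SHADER_KIND, fsh);
    USE_SHADER_PROGRAM(FRAGMENT_SHADER_KIND, f);
    return f;
}
```

With only two entry points, each used once, the per-call lookup avoids the loop, the name table and the pointer array; whether that actually packs smaller in a given intro still has to be measured.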
btw string pooling doesn't really have any pros.
added on the 2009-10-27 14:30:59 by decipher decipher
unless you have a ton of duplicate strings :) but no real cons either, or am i overlooking something?
added on the 2009-10-27 14:34:41 by superplek superplek
Quote:
@lx: Trust me, well written C code will at least be as good as what you can write in "pure Assembly". Ask Ferris.


well, my c code for rudebox was quite well written and packed well (and wasn't that much anyway)

for the final i converted everything to assembly and got down from 4091 bytes to 4008 bytes.
subtracting the ~15 bytes that i could gain with some music tweaks this still leaves an advantage of ~70 bytes for the asm version compared to the c version ( and that is not even mentorized asm ;) )

still, pretty much and for sure worth it for a 4k.
for a 1k it's even more important since the packer has less data to find similarities with.
added on the 2009-10-27 15:22:17 by gopher gopher
gopher, yep, not to mention also that 700 to 900 bytes are occupied by your softsynth 4klang... it wouldn't have been possible to do it in C at such a level. To be fair and more realistic, the 70 bytes gain should be compared not with the total size of the intro, but with the size of the C code... i expect to see a better ratio here.

By the way, looking at rudebox code, why did you use this sequence:
Code:
    mov eax,[COMOBJECT]
    mov eax,[eax]
    push dword [COMOBJECT]
    call dword near [eax+VFUNCTION]


instead of this one?
Code:
    mov eax,[COMOBJECT]
    push eax
    mov eax,[eax]
    call dword near [eax+VFUNCTION]


the push eax seems to compress a bit better right after the mov eax,[...], and because crinkler uses 8 bytes of context, it will be even more efficient if you have the whole thing (mov eax,[COMOBJECT] / push eax / mov eax,[eax]) in less than eight bytes... but probably, in the end, if your whole code is like this, it's not making a huge difference...
added on the 2009-10-27 15:46:55 by xoofx xoofx
is that from the final version of rudebox (or better the compressed final)?

actually in the final version there should be mostly calls like that:
Code:
    push edi
    call dword [ebp+VFUNCTION]

where the comobject is stored in edi and the interface is stored in ebp once before such call sequences start.

i only swap com objects in edi/ebp if needed (which is for the D3DXEFFECT and IDIRECT3DDEVICE objects only anyway)
added on the 2009-10-27 16:02:03 by gopher gopher
Quote:
is that from the final version of rudebox (or better the compressed final)?

Oops, no, i still had the old party version... you changed quite a lot of things in the final...
Although i have a weird result here:
- party version : uncompressed code : 4026 bytes, compressed code size : 1591 bytes
- final version : uncompressed code : 3384 bytes, compressed : 1605

I'm wondering what you did in between to get such a result? A gain of 600 uncompressed bytes for a loss of 14 bytes in compressed form... (seems you cut down the hash size from 350 MB to 200 MB... but not sure?)... That's not a big deal, you are still below 4096 anyway! ;)
added on the 2009-10-27 16:25:44 by xoofx xoofx
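The party/final comparison above is easier to read as a compression ratio. Below is a small sketch that computes it in integer arithmetic; `ratio_permille` is a hypothetical helper name, and the only inputs used are the sizes quoted in the post.

```c
#include <assert.h>

/* Compressed size as a fraction of uncompressed size, in tenths of a
   percent (permille), truncated toward zero. */
static unsigned ratio_permille(unsigned compressed, unsigned uncompressed)
{
    assert(uncompressed > 0);
    return (compressed * 1000u) / uncompressed;
}

/* Sizes quoted for rudebox: party 1591 of 4026 bytes, final 1605 of 3384. */
static unsigned party_ratio(void) { return ratio_permille(1591u, 4026u); }
static unsigned final_ratio(void) { return ratio_permille(1605u, 3384u); }
```

With those figures the party code packs to roughly 39.5% and the final code to roughly 47.4%: the final version is much smaller before packing but gives the compressor less redundancy to work with, which fits the observation that uncompressed size and compressed size can move in opposite directions.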
