pouët.net

mmx vs 4x4 mats

category: general [glöplog]
 
So I want to do a matrix4x4 * matrix4x4 with mmx instructions. It seems the first thing I have to do is flip the rows and the columns of one matrix (transpose it) to match the vector-math order of the other. Any tips on how to quickly flip a matrix like this? All the solutions I can think of take too many clocks. Do we even need to do this? How do you do it?
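
For reference, the row/column flip described above is a 4x4 transpose. A minimal sketch of it with SSE intrinsics follows, assuming a row-major float[16] matrix and a compiler that ships xmmintrin.h; the function name mat4_transpose_sse is made up for illustration. It leans on the _MM_TRANSPOSE4_PS macro, which expands to a handful of shuffle instructions.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including _MM_TRANSPOSE4_PS */

/* Transpose a row-major 4x4 float matrix.
   Hypothetical helper; src and dst may point to the same array,
   because all four rows are loaded before anything is stored. */
static void mat4_transpose_sse(const float src[16], float dst[16])
{
    /* Unaligned loads, so nothing is assumed about alignment. */
    __m128 r0 = _mm_loadu_ps(src + 0);
    __m128 r1 = _mm_loadu_ps(src + 4);
    __m128 r2 = _mm_loadu_ps(src + 8);
    __m128 r3 = _mm_loadu_ps(src + 12);

    /* In-register transpose: rows become columns. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(dst + 0,  r0);
    _mm_storeu_ps(dst + 4,  r1);
    _mm_storeu_ps(dst + 8,  r2);
    _mm_storeu_ps(dst + 12, r3);
}
```

If the data is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps could replace the unaligned variants.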
added on the 2010-03-16 04:17:27 by sigflup sigflup
I guess you mean SSE, not MMX.

If you do _one_ mat4x4*mat4x4 multiplication with SSE it will be slower than with regular code, because of the overhead of formatting the data to fit the SSE constraints. Instead, SSE is faster than regular code when you do lots of mat4x4*mat4x4 multiplications in sequence. Then you do them the regular way, but with four matrices at a time.

Basically, trying to parallelize the code of a single multiplication is the wrong approach; using data parallelism is the right one.

The problem is that to exploit SSE you most probably need to rearrange your data structures, if not the application/module architecture, and design it for SSE from scratch.
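
A rough sketch of what "four matrices at a time" could look like, assuming a structure-of-arrays layout where each matrix element is stored as four floats, one per matrix in the batch. The type Mat4x4Batch and the function mat4_mul_batch4 are invented for this illustration. The inner loop is the ordinary scalar triple loop, except that every multiply and add works on four independent products at once.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical SoA layout: e[r][c][n] is element (r, c) of matrix n,
   with n = 0..3, so each e[r][c] is one SSE register's worth of data. */
typedef struct {
    float e[4][4][4];
} Mat4x4Batch;

/* Computes out[n] = a[n] * b[n] for four matrix pairs at once. */
static void mat4_mul_batch4(const Mat4x4Batch *a, const Mat4x4Batch *b,
                            Mat4x4Batch *out)
{
    /* Output must not alias an input: results are stored while the
       inputs are still being read. */
    assert(out != a && out != b);

    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            __m128 sum = _mm_setzero_ps();
            for (int k = 0; k < 4; k++) {
                /* Same a[r][k] * b[k][c] as the scalar version,
                   done for all four matrices in one instruction. */
                __m128 av = _mm_loadu_ps(a->e[r][k]);
                __m128 bv = _mm_loadu_ps(b->e[k][c]);
                sum = _mm_add_ps(sum, _mm_mul_ps(av, bv));
            }
            _mm_storeu_ps(out->e[r][c], sum);
        }
    }
}
```

No shuffles or transposes are needed with this layout, which is the point of designing the data for SSE up front; the cost is that the matrices have to live in this interleaved form.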

How many matrix multiplications do you have to do?
added on the 2010-03-16 04:28:03 by iq iq
Not many 4x4*4x4s, but a whole lot of 4x4*4x1s. Fortunately 4x4*4x1 is a lot easier to deal with, and that would be the bottleneck anyway. It just kinda bugs me that there isn't a good way to vectorize 4x4*4x4, at least it would seem. To me this seems like something I would want to do; the sh3 has matrix multiply instructions, unfortunately for dos we are not so lucky :(
added on the 2010-03-16 04:40:14 by sigflup sigflup
For the 4x4 * 4x1, assuming that is transforming the vertices of a mesh, SSE can help if you have a lot of vertices. Feed the transform routine 4 verts at a time and do the regular matrix*vector thing. Some prefetching can probably help too (I forgot the instruction, I think it was _mm_prefetch() or something).
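
Something along these lines, as a sketch only: it assumes the vertices are stored as separate x/y/z/w arrays (the VertsSoA struct and transform_verts_sse name are made up here), a row-major matrix, and a vertex count that is a multiple of 4. The prefetch distance of 16 floats is a guess, not a tuned value.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics, _mm_prefetch */

/* Hypothetical SoA vertex stream: four component arrays of equal length. */
typedef struct {
    float *x, *y, *z, *w;
} VertsSoA;

/* out = m * v for every vertex, four vertices per iteration.
   m is row-major; in and out may refer to the same arrays. */
static void transform_verts_sse(const float m[16], const VertsSoA *in,
                                const VertsSoA *out, size_t count)
{
    assert(count % 4 == 0);  /* sketch: no tail handling */

    /* Broadcast each matrix element across a register once, up front. */
    __m128 mm[16];
    for (int i = 0; i < 16; i++)
        mm[i] = _mm_set1_ps(m[i]);

    float *dst[4] = { out->x, out->y, out->z, out->w };

    for (size_t i = 0; i < count; i += 4) {
        /* Hint the cache about data a few iterations ahead. */
        if (i + 16 < count) {
            _mm_prefetch((const char *)(in->x + i + 16), _MM_HINT_T0);
            _mm_prefetch((const char *)(in->y + i + 16), _MM_HINT_T0);
            _mm_prefetch((const char *)(in->z + i + 16), _MM_HINT_T0);
            _mm_prefetch((const char *)(in->w + i + 16), _MM_HINT_T0);
        }

        /* Four vertices' worth of each component. */
        __m128 x = _mm_loadu_ps(in->x + i);
        __m128 y = _mm_loadu_ps(in->y + i);
        __m128 z = _mm_loadu_ps(in->z + i);
        __m128 w = _mm_loadu_ps(in->w + i);

        /* Regular matrix*vector, one output component per row. */
        for (int r = 0; r < 4; r++) {
            __m128 acc = _mm_mul_ps(mm[r * 4 + 0], x);
            acc = _mm_add_ps(acc, _mm_mul_ps(mm[r * 4 + 1], y));
            acc = _mm_add_ps(acc, _mm_mul_ps(mm[r * 4 + 2], z));
            acc = _mm_add_ps(acc, _mm_mul_ps(mm[r * 4 + 3], w));
            _mm_storeu_ps(dst[r] + i, acc);
        }
    }
}
```

Whether the prefetch helps at all depends on the CPU and the size of the mesh, so it is worth timing with and without it.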
added on the 2010-03-16 04:44:28 by iq iq
It all depends. If your data is aligned it's very doable. But for 1 mul don't bother; use SSE for batch processing of stuff and so on.

good: time to sleep.
added on the 2010-03-16 04:51:04 by superplek superplek
Alright, I guess I'll drop it then for 4x4*4x4.
added on the 2010-03-16 04:56:36 by sigflup sigflup
Lame.
added on the 2010-03-16 08:07:24 by 24 24
