pouët.net

phöng shading and clipping?

category: general [glöplog]
A question for you 3D experts.

I'm working on some software rendering stuff for a (last) assignment and have trouble understanding how to calculate the interpolated normals for phong lighting when I do clipping.

My rendering pipeline looks something like this (basically the OpenGL pipeline):
---> Object/model space -------------------------------------------
1. Transform normals with inverse of model-view matrix.
2. Transform vertices with model-view matrix.
---> Homogeneous view/eye/camera space ----------------------------
3. Backface culling with dot product (Possibility #1).
4. Calculate lighting for lambert/gouraud shading.
5. Transform vertices with the projection matrix.
---> Affine device or clip coordinates ----------------------------
6. View frustum culling + clipping.
8. Do homogeneous/perspective divide.
---> Normalized device coordinates --------------------------------
9. Backface culling with z-value of normal (Possibility #2).
10. Apply screen space transform.
---> Screen space/Window coordinates ------------------------------
11. Rasterize. Apply lighting, interpolate colors, etc.
12. Texturing, fog.
13. Stencil test.
14. Depth test.
15. Alpha blending.

Question 1: Is there something completely wrong with this?!

Now when I do flat or smooth shading I calculate the shade values in step 4. For phong shading I can not do that, as I need to calculate them in step 11 for every pixel I draw.

Question 2: Can/Should/Do I interpolate shade values/normals for clipped lines/triangles in clip-space? Will this give the right results?

When I'm in step 11 and need to do lighting (point lights, spot lights), I will need the vertex position in eye space for that. But this is lost when doing the multiply with the projection matrix and clipping. I can't use the ones from step 4, because those are not clipped. Will I need to backproject the clipped coordinates with the inverse perspective matrix?

Question 3: How do I set up rendering in the last step (11)? What vertices and coordinates/values do I actually need?

I don't know if I'm making sense here. Tell me when I don't. ;)

Oh yeah, I bought some books and read them, and I tried to use google, but there are still some things I haven't quite figured out yet...
added on the 2007-08-15 14:25:20 by raer raer
on question 1:

step 1: you need to transform normals with the transpose of the inverse model-view matrix (or inverse of the transpose, the two are identical). short derivation: assume you have a point x, a transform matrix A and y=Ax (=> x=A^-1 y). now say you have a plane in the original space given in the standard form with normal and distance: <x,n>=d. (<.,.> being the standard inner product). you then need the equation for the plane in the transformed space, so let's plug it in:

Code:
    <x,n> = d
<=> x^T n = d
<=> (A^-1 y)^T n = d
<=> (y^T A^-T) n = d
<=> y^T (A^-T n) = d
<=> <y, A^-T n> = d


(A^-T = (A^-1)^T = (A^T)^-1).
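
if it helps to see that in code, here's a minimal c sketch of the normal transform (column-vector convention as in the derivation, row-major float m[4][4] - both just assumptions, not from any particular codebase). it multiplies by the cofactor matrix of the upper-left 3x3, which equals det(A) * A^-T, so the determinant divide disappears into the renormalize (assuming det > 0, i.e. no mirroring):

Code:
/* transform a normal by the inverse-transpose of the upper-left 3x3 of the
   model-view matrix (column-vector convention, y = A x).  sketch only:
   assumes a row-major float m[4][4] and det > 0 (no mirroring).
   the cofactor matrix equals det(A) * A^-T, so the determinant divide is
   absorbed by the renormalization at the end. */
#include <math.h>

void transform_normal(const float m[4][4], const float n[3], float out[3])
{
    float c[3][3];
    c[0][0] = m[1][1]*m[2][2] - m[1][2]*m[2][1];
    c[0][1] = m[1][2]*m[2][0] - m[1][0]*m[2][2];
    c[0][2] = m[1][0]*m[2][1] - m[1][1]*m[2][0];
    c[1][0] = m[0][2]*m[2][1] - m[0][1]*m[2][2];
    c[1][1] = m[0][0]*m[2][2] - m[0][2]*m[2][0];
    c[1][2] = m[0][1]*m[2][0] - m[0][0]*m[2][1];
    c[2][0] = m[0][1]*m[1][2] - m[0][2]*m[1][1];
    c[2][1] = m[0][2]*m[1][0] - m[0][0]*m[1][2];
    c[2][2] = m[0][0]*m[1][1] - m[0][1]*m[1][0];

    float x = c[0][0]*n[0] + c[0][1]*n[1] + c[0][2]*n[2];
    float y = c[1][0]*n[0] + c[1][1]*n[1] + c[1][2]*n[2];
    float z = c[2][0]*n[0] + c[2][1]*n[1] + c[2][2]*n[2];
    float len = sqrtf(x*x + y*y + z*z);

    out[0] = x/len; out[1] = y/len; out[2] = z/len;
}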

step 3: don't do it here, because...
step 9: don't do it here either. do it AFTER the screenspace transform, and not using the z-component of the face normal (presumably still floating point), but using your integer (though hopefully subpixel-accurate) screenspace coordinates. two reasons: a) due to rounding to subpixel coordinates, the results using a "perfect" test in object space may not correspond to the triangles your rasterizer sees in very close edge-on cases, which can screw you over when you need watertight rasterization (as opengl/d3d indeed require), and b) you get to cull triangles that are degenerate in screen space (e.g. two edges parallel to each other) after subpixel snapping, even if they may not be in world/clip space.
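
a minimal sketch of that post-snap test, assuming 28.4 fixed-point subpixel coordinates (x and y already multiplied by 16 and rounded) and counter-clockwise front faces - both purely illustrative conventions, use whatever your rasterizer assumes:

Code:
/* backface / degenerate cull on the snapped subpixel coordinates
   (28.4 fixed point here, i.e. already multiplied by 16 and rounded).
   sketch only: struct layout and the ccw-is-front convention are
   illustrative. */
#include <stdint.h>

typedef struct { int32_t x, y; } Vtx2i;

/* twice the signed screen-space area of the triangle */
static int64_t signed_area_2x(Vtx2i a, Vtx2i b, Vtx2i c)
{
    return (int64_t)(b.x - a.x) * (c.y - a.y)
         - (int64_t)(b.y - a.y) * (c.x - a.x);
}

/* nonzero -> throw the triangle away (back-facing or degenerate) */
int cull_triangle(Vtx2i a, Vtx2i b, Vtx2i c)
{
    return signed_area_2x(a, b, c) <= 0;
}

the 64-bit intermediate is needed because the product of two 28.4 deltas can overflow 32 bits, and the <= 0 is what also throws away the triangles that collapsed to zero area after snapping.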

the savings from culling individual triangles early are tiny since in an opengl-style pipeline, all the work happens on vertices and not faces; you can only skip a vertex if none of the triangles that use it get drawn, and tracking that is a lot of trouble and typically means more work than you end up saving by it, so don't.

steps 13/14: while depth/stencil tests conceptually happen quite late in a pipeline, you want to do them as early as possible; certainly before doing texture sampling or expensive shading (such as phong).

----

question 2: yes, it will. you want all those values to be linear in world (i.e. 3d space). due to homogeneous coordinates, the perspective matrix is a linear transform, so clip-space, eye-space and world-space all have a linear relationship with each other and everything that is linear in one stays linear in the other. it only gets nonlinear AFTER the perspective divide in step 8. in fact, that's why you clip before projection, not afterwards (i.e. during rasterization) - before, everything is simple, nice and linear. it only gets tricky afterwards.
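
to make the 'linear before the divide' point concrete, here's a sketch of clipping one edge against the near plane in clip space (taken as z + w >= 0, the gl convention; the vertex layout is made up). every attribute gets lerped with the same t, and the eye-space position is simply carried along as one more attribute - that's the value you'd later use for point/spot lights:

Code:
/* clipping one edge A-B against the near plane in clip space
   (-w <= z, i.e. z + w >= 0 in the gl convention).  because clip space is
   still a linear image of eye space, every attribute (normal, color, uv,
   even the eye-space position needed later for lighting) is interpolated
   with the same parameter t.  sketch only; the vertex layout is made up. */
typedef struct {
    float x, y, z, w;      /* clip-space position               */
    float nx, ny, nz;      /* eye-space normal (for phong)      */
    float ex, ey, ez;      /* eye-space position (for lights)   */
    float u, v;            /* texture coordinates               */
} ClipVtx;

static float lerp(float a, float b, float t) { return a + (b - a) * t; }

/* assumes a is inside (a.z + a.w >= 0) and b is outside; writes the
   intersection vertex on the near plane to *out */
void clip_edge_near(const ClipVtx *a, const ClipVtx *b, ClipVtx *out)
{
    float da = a->z + a->w;            /* signed distance of a */
    float db = b->z + b->w;            /* signed distance of b */
    float t  = da / (da - db);         /* where the edge crosses the plane */

    out->x  = lerp(a->x,  b->x,  t);   out->y  = lerp(a->y,  b->y,  t);
    out->z  = lerp(a->z,  b->z,  t);   out->w  = lerp(a->w,  b->w,  t);
    out->nx = lerp(a->nx, b->nx, t);   out->ny = lerp(a->ny, b->ny, t);
    out->nz = lerp(a->nz, b->nz, t);
    out->ex = lerp(a->ex, b->ex, t);   out->ey = lerp(a->ey, b->ey, t);
    out->ez = lerp(a->ez, b->ez, t);
    out->u  = lerp(a->u,  b->u,  t);   out->v  = lerp(a->v,  b->v,  t);
}

a full sutherland-hodgman clipper just runs every polygon edge through a test/intersect step like this, once per clip plane.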

again, this is how it conceptually looks; some hw (e.g. nvidia) actually DOES clip during rasterization. but when you do this, you need to take care to perform perspective correct interpolation of all your interpolated values (like what you do for perspective correct texture mapping). old sw 3d engines that only did linear interpolation (and affine, non-perspective-correct tmapping) thus preferred to clip in 3d space; nowadays, you probably want perspective correct interpolation anyway (you definitely do in hardware), so it's just a design decision of how big to make certain registers (when you don't clip beforehand, you need to allow fairly large coordinates during triangle setup) and not a fundamental problem.

question 3: each triangle needs the vertices it uses (obviously), with all the components it needs for texturing/shading. i'll assume 2d textures here. with an opengl-style pipeline with 1 set of 2d texture coordinates and one interpolated color (gouraud shading), you need to interpolate (linearly): s*w,t*w,r*w,g*w,b*w,a*w,z*w. s,t=texture coords, r,g,b,a=colors and w=the reciprocal of your vertex's clip-space w after projection (usually =1/z' where z' is your pre-projection z, assuming you use a perspective projection). you can interpolate them linearly because while s,t,r,g,b,a and z don't vary linearly in screenspace, s/z',t/z',... (and thus s*w,t*w,...) do. per pixel, you then calculate 1/interpolated_w, and using that, you can get the interpolated values for s,t, etc. from the interpolated values of s*w,t*w etc.
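
as a sketch, the per-pixel recovery looks something like this (struct layout and names are made up; the linearly interpolated values come from your triangle setup):

Code:
/* perspective-correct interpolation as described above: set up s*w, t*w,
   ..., w once per vertex (w = 1/z' after projection), interpolate those
   linearly across the triangle, and recover the true values per pixel.
   sketch only. */
typedef struct {
    float sw, tw;      /* s*w, t*w (texture coords premultiplied by w) */
    float rw, gw, bw;  /* r*w, g*w, b*w                                */
    float w;           /* interpolated w = 1/z'                        */
} Interp;

void shade_pixel(const Interp *i, float *s, float *t,
                 float *r, float *g, float *b)
{
    float rcp_w = 1.0f / i->w;   /* one divide per pixel */
    *s = i->sw * rcp_w;  *t = i->tw * rcp_w;
    *r = i->rw * rcp_w;  *g = i->gw * rcp_w;  *b = i->bw * rcp_w;
}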

for multiple textures/colors, you need to interpolate several sets of texture coordinates/colors; for per-pixel phong shading, you need to interpolate normals and reflected direction too, etc.

also, in practice, you won't do the 1/w for every pixel, but for every few pixels (4 to 16 usually) and do linear interpolation inbetween. this isn't correct, but much faster (divisions are slow) and looks about the same. you need to be careful with z (not z'!) though, because you want that to be pretty good precision even when the rest is kinda sloppy, because it determines your object intersections and everything.
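
a sketch of that scheme for one span, with the divide done once per 8-pixel block and 1/w lerped in between (the block size of 8, the names and put_pixel() are all placeholders; dsw/dw are the per-pixel steps from triangle setup):

Code:
/* one divide per 8-pixel block, linear interpolation of 1/w inside the
   block.  sw/w are the linearly interpolated s*w and w at the span start,
   dsw/dw their per-pixel steps.  sketch only. */
void put_pixel(int x, float s);   /* hypothetical pixel routine */

void draw_span(int x0, int x1, float sw, float dsw, float w, float dw)
{
    float rcp0 = 1.0f / w;                      /* 1/w at block start */
    while (x0 < x1) {
        int n = (x1 - x0 < 8) ? (x1 - x0) : 8;
        float w_end = w + dw * n;
        float rcp1 = 1.0f / w_end;              /* 1/w at block end */
        float rcp = rcp0, drcp = (rcp1 - rcp0) / n;
        for (int i = 0; i < n; i++) {
            put_pixel(x0 + i, sw * rcp);        /* approx. perspective-correct s */
            sw += dsw; rcp += drcp;
        }
        x0 += n; w = w_end; rcp0 = rcp1;
    }
}

as said, keep z/depth at full precision separately - this shortcut is only for texture coordinates and colors.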

hope that helps :)
added on the 2007-08-15 15:17:52 by ryg ryg
Woah ryg! You da man ;)
Seriously. That helped a lot.

--> Normal transformation: In step 1 I meant the inverse transpose actually, sry. Just a question on that. The transpose of the inverse of the model-view matrix should actually be the same as its transpose if you only use rotations and translations. What about UNIFORM scaling?

--> Backface culling: I thought backface culling should be done early, because of performance. I read what you stated about culling in screen-space coordinates before, though. I will need to do a CW/CCW check then, right?

--> Interpolating normals: Thanks for the explanation. I was already guessing that, but you made it clear.

--> Values for rasterization: I was asking that mainly because of lighting.
When I use spot lights or point lights I need to have the eye-space coordinate of the current pixel to calculate the distance or the angle to the center of the cone, right? So I'd have to interpolate that too...

And the tutorial from Kalm is really nice and way shorter than the stuff from Chris Hecker :) Thanks winden.
added on the 2007-08-15 16:56:10 by raer raer
Oh and you wrote that you can actually clip in screen space. I think you could run into accuracy problems with large coordinates though, so I'd first cull away primitives that are completely out of the viewport in step 6...

lol. and I noticed step 7 is missing :D
added on the 2007-08-15 17:34:24 by raer raer
rarefluid, as long as you only use rotations and translations, the inverse of the transpose of a matrix is the same as the matrix itself (not its transpose). if you use only uniform scaling with a factor s, then A^-1 = 1/(s^2) * A^T (A^T still scales by a factor of s, the inverse has to undo the scaling so it does 1/s, and (1/s)/s=1/(s^2)). in the uniform scaling+rotation+translation case, the best thing to use on the normals is actually (1/s)*A, because then the upper-left 3x3 submatrix is orthogonal so normals stay normalized and you don't have to renormalize them afterwards. the inverse of transpose thing only gets important when your transform matrix contains nonuniform scaling or skewing; this is the most general case for an affine transformation (i.e. no projection component), and you have to renormalize the transformed normals in that case.

backface cull: as said, nearly all of your work is done on vertices, not faces (except for triangle setup in rasterization, and backface cull is before that no matter what). so kicking out faces does, by itself, not help very much. you need to know which vertices you can get rid of. for individual triangles or quads this is easy, for fans and strips you need some lookahead (=bad), and for indexed primitives (doesn't matter which) you basically need a kind of reference counting scheme (=worse). that's a bunch of code and bookkeeping overhead. the way to get gains from this in a SW implementation is by using clustered backface culling where you basically store a cone representing the distribution of normals for a bunch of faces; it can be done hierarchically and can reject groups of faces easily, which makes it easier to amortize the additional overhead. you need to precompute those cones etc. though, so it doesn't work with a GL-ish API. besides, there are better ways to improve performance than optimizing backface culling - per-pixel work usually outweighs per-vertex work by a significant margin in a sw implementation, so the best course of action is usually to simplify it on an algorithmic level, try and keep it straightforward (nice clean bulk data-processing loops, easy to optimize :) while taking care not to make any stupid mistakes. all clever processing tends to have some weird corner cases that cause slowdowns/bugs/both and it nearly always means trading a small to medium improvement in average-case performance for a medium to big loss of worst-case performance, which is a bad thing for a library where you don't know what the user will end up doing.

values for rasterization: yep. which is the way you implement it in shaders, and also the reason you never had per-pixel lighting in standard fixed-function gl/d3d implementations.
added on the 2007-08-15 17:53:10 by ryg ryg
ok, I'll get to work now. :)
thanks for explaining.
added on the 2007-08-15 18:06:16 by raer raer
Another note about early backface culling: object-space back-face culling breaks for all non-linear transforms (hello skinning), and isn't really that much faster than the following scheme:
- transform all vertex positions
- backface/frustum cull all triangles
- for each vertex in the non-culled vertices, calculate "expensive" attributes like lighting, preferably with a simple software cache to reduce re-calculations (see the sketch after this list)
- clip triangles if intersecting the frustum
- draw
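
A minimal sketch of the software cache mentioned in step 3 of the list above, assuming indexed triangles; the direct-mapped layout, the cache size and compute_lighting() are made up for illustration:

Code:
/* Direct-mapped per-vertex cache for expensive attributes (lighting),
   so shared vertices of non-culled triangles are lit only once.
   Sketch only: Lit, compute_lighting() and the cache size are made up. */
#include <string.h>

#define CACHE_SIZE 64

typedef struct { float r, g, b; } Lit;

static int cached_index[CACHE_SIZE];   /* which vertex occupies each slot */
static Lit cached_value[CACHE_SIZE];

Lit compute_lighting(int vertex_index);    /* the expensive part, elsewhere */

void lighting_cache_reset(void)
{
    memset(cached_index, 0xff, sizeof(cached_index));   /* -1 = empty */
}

Lit get_lit_vertex(int vertex_index)
{
    int slot = vertex_index % CACHE_SIZE;
    if (cached_index[slot] != vertex_index) {
        cached_index[slot] = vertex_index;
        cached_value[slot] = compute_lighting(vertex_index);
    }
    return cached_value[slot];
}

Reset the cache per mesh (or whenever transform/lighting state changes) and call get_lit_vertex() for the three indices of every triangle that survived culling.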
added on the 2007-08-15 18:16:32 by kusma kusma
"Oh and you wrote that you can actually clip in screen space": In fact, the way this happens with most hardware is very simple. Hardware rasterizers are usually based on the so-called Pineda algorithm (the only exception I know for sure is the PS2, which uses a Bresenham/DDA scan-line based rasterizer). The Pineda algorithm basically tests for each pixel whether it's "inside" of the 3 edges of the triangle - so it involves evaluation of 3 linear functions per pixel. This is more arithmetic than with normal scan conversion, but it's trivial to parallelize, makes it easy to do a rough pre-rasterization in big blocks (think early/hierarchical Z testing), makes the scissor test and x/y-clipping a COMPLETE no-brainer (you query each pixel individually, and you simply don't ask for pixels outside the scissor rectangle!), turns z-clipping into a simple Z-scissor test (if you indeed do it; since GF4 I think, NVidia has the depth clamp extension which basically allows you to turn it off altogether; this wouldn't be possible if the card did a z-clip beforehand!) and yields very fast, essentially branch-free hardware implementations. A nice paper on a scan-conversion algorithm for hardware using this approach is

http://www.cs.unc.edu/~olano/papers/2dh-tri/2dh-tri.pdf

I believe all current PC graphics hardware uses this; the only difference is the size of the screen-space coordinates the rasterizer can work with. NV hardware usually has an "infinite guard band" which means the rasterizer can eat whatever the transform units output, whereas ATI hardware has a guardband roughly an order of magnitude bigger than the expected screen resolution (-32768...32767 on my Radeon X1600) so the majority of clipping work is handled by the guard band and actual triangle clipping is only necessary when a huge triangle both leaves the guard band area and intersects the visible screen area, a case that is exceedingly rare and presumably exercises a very slow path in the hardware when it actually occurs :)
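
for reference, a bare-bones sketch of the edge-function test described above - integer pixel coordinates, no subpixel bits, no fill-rule tie-breaking, no hierarchical/block traversal, and plot() plus the ccw-front convention are placeholders:

Code:
/* bare-bones Pineda-style rasterizer: evaluate the three edge functions at
   each pixel of the (scissored) bounding box.  sketch only. */
void plot(int x, int y);   /* hypothetical pixel routine */

/* sign tells on which side of the directed edge a->b the point p lies */
static int edge(int ax, int ay, int bx, int by, int px, int py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

void raster_triangle(int x0, int y0, int x1, int y1, int x2, int y2,
                     int scis_x0, int scis_y0, int scis_x1, int scis_y1)
{
    /* bounding box, clamped to the (inclusive) scissor rect -
       this clamp IS the x/y clip */
    int minx = x0 < x1 ? (x0 < x2 ? x0 : x2) : (x1 < x2 ? x1 : x2);
    int miny = y0 < y1 ? (y0 < y2 ? y0 : y2) : (y1 < y2 ? y1 : y2);
    int maxx = x0 > x1 ? (x0 > x2 ? x0 : x2) : (x1 > x2 ? x1 : x2);
    int maxy = y0 > y1 ? (y0 > y2 ? y0 : y2) : (y1 > y2 ? y1 : y2);
    if (minx < scis_x0) minx = scis_x0;
    if (miny < scis_y0) miny = scis_y0;
    if (maxx > scis_x1) maxx = scis_x1;
    if (maxy > scis_y1) maxy = scis_y1;

    for (int y = miny; y <= maxy; y++)
        for (int x = minx; x <= maxx; x++)
            if (edge(x0, y0, x1, y1, x, y) >= 0 &&
                edge(x1, y1, x2, y2, x, y) >= 0 &&
                edge(x2, y2, x0, y0, x, y) >= 0)
                plot(x, y);
}

the point being: the scissor test / x-y-clip is literally just the clamp on the bounding box - pixels outside it are never even asked for.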
added on the 2007-08-15 18:26:08 by ryg ryg
There's actually a nice article about the edge function/halfspace approach here at devmasters.
added on the 2007-08-15 18:50:35 by raer raer
ryg: the paper you suggested might also be very well suited for SSE code, as is the halfspace approach.
added on the 2007-08-15 19:00:34 by raer raer
SIMD I meant.
added on the 2007-08-15 19:15:00 by raer raer
I don't trust it.
Quote:

as long as you only use rotations and translations, the inverse of the transpose of a matrix is the same as the matrix itself (not its transpose).

What do you mean? The inverse of the transpose of

Code:
        ( 1 0 0 x )
        ( 0 1 0 0 )
    M = ( 0 0 1 0 )
        ( 0 0 0 1 )


is not M.
added on the 2007-08-16 09:43:42 by Hyde Hyde
... unless x = 0 of course.
added on the 2007-08-16 09:44:26 by Hyde Hyde
You can invert any orthogonal matrix (determinant == 1) by transposing it. Transposing twice obviously brings you back to the original matrix.
added on the 2007-08-16 10:32:58 by raer raer
rarefluid: An orthogonal matrix is a matrix where the column vectors are pairwise orthogonal and the length of each column vector is equal to 1. If you do not fulfill the length-condition (as you propose) you cannot expect the transpose to be equal to the inverse.
added on the 2007-08-16 10:41:23 by Hyde Hyde
And I think I understand what Ryg is saying. Your applications are for rotating normal vectors, so in the end you only care about the upper-left 3x3 matrix containing the rotation part of your matrix anyway. Hence, you simply ignore the rest, and in effect you are only considering the orthogonal 3x3-sub matrix, in which case it is true that inverse = transpose.
added on the 2007-08-16 10:46:09 by Hyde Hyde
hyde: yep, it doesn't hold for the whole matrix, but for normals you only care about the upper left 3x3 submatrix anyway.
added on the 2007-08-16 11:42:27 by ryg ryg
Phong clipping?
added on the 2007-08-16 13:53:19 by doomdoom doomdoom
*revives n00b 3D rasterization thread*

Maybe this is a silly question but to be perspective correct shouldn't ALL attributes be interpolated like texture coordinates? I've never seen anyone do this with colors actually, which made me wonder...
Or is it that it's just not visible with colors?
added on the 2008-01-28 12:39:30 by raer raer
It's much less obvious with colours.
So 1/z interpolation would be right, but it is not visible anyway, so just interpolate linearly. mkay then...
added on the 2008-01-28 13:54:52 by raer raer
But if you do edge clipping before projection, doesn't that leave you with potential precision issues? And doesn't that mean you risk writing outside the screen buffer unless you also add boundary checking in the rasteriser?
added on the 2008-01-28 14:57:09 by doomdoom doomdoom
As for faster culling, my experience is you get the most significant speedups from the more high-level stuff, like culling groups of faces (and related vertices) in world space, before any transformations, based on whether or not each group's bounding box/sphere is fully outside the view frustum. That's obviously easier with static geometry, but you can do some grouping while building dynamic meshes.
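
For what it's worth, a sketch of that group-level test, assuming the six frustum planes are stored normalized with their normals pointing into the frustum (the plane layout and names are just for illustration):

Code:
/* Group-level frustum cull: reject a whole group when its bounding sphere
   is entirely outside one of the six frustum planes.  Sketch only: planes
   are assumed normalized, normals pointing into the frustum. */
typedef struct { float nx, ny, nz, d; } Plane;   /* n.p + d = 0 */

/* returns 1 if the sphere (and thus the whole group) can be culled */
int cull_group(const Plane planes[6], float cx, float cy, float cz, float radius)
{
    for (int i = 0; i < 6; i++) {
        float dist = planes[i].nx * cx + planes[i].ny * cy
                   + planes[i].nz * cz + planes[i].d;
        if (dist < -radius)      /* completely on the outside of this plane */
            return 1;
    }
    return 0;   /* possibly visible - transform and draw it */
}

Groups that pass are transformed and drawn as usual; the test is cheap enough to run for every group every frame.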

Also a favourite of mine is to prepare the inner and outer radii of each group, so that the outer radius corresponds to the bounding sphere, and the inner radius defines the region that's guaranteed to be opaque from whatever angle you're viewing the group. If you then trace a cone from the camera that just touches the inner sphere, it's straightforward to see which other groups are obscured by it.

Depending on how the scene looks, the speedups from this sort of thing can be stupendous.
added on the 2008-01-28 15:16:34 by doomdoom doomdoom
