Interpolation in adaptive sub-sampling?

category: general [glöplog]

texel:

Quote:

You are using cyclic textures, so if at one point you get texture coordinate 250 and at the other 3 (259 % 256), you will interpolate backwards, creating that effect.

i think you are right, but i do interpolate the plane uv's inside. the coords get ugly when i increase the grid size for the subsampler; it just increases the ugliness. i dont know if i understand you fully. i am wondering if it is possible to swap the u,v's if they are backwards, but i may be thinking wrong about that.

smash:
metaballs'll be my next thing, but fluids are too much to ask for. i have other challenges that take up my time :p

added on the 2009-09-28 16:00:03 by rudi

but i see. if i understand you fully this is what happens:
and ill take the u-component of a coord as an example:

it can interpolate in two ways:

the correct one:
interpolates from 100 to 255 (wraps around to 0) and to 50.

the wrong one (and currently used):
interpolates from 100 to 50 (backwards), which shows an inverted texture at the edges.

added on the 2009-09-28 16:15:01 by rudi

Yes rydi, that is what I meant.

So I suggest using floats throughout the process and converting to int only in the last stage, after interpolation, to fetch the texel from memory.

Just a question: why are you using chars and shorts and that?

I remember I used integers for everything a few years ago; my reason was speed. I'm oldschool, and a lot of years ago (on PCs) integer math was faster than float math, so I continued using it.

Then all the demosceners asked me why I was doing that, since floats were faster. But in my tests I was not able to make them faster... until I discovered there are compiler options for fast floats.

So, remember to activate the fast floats in your compiler. The first C compiler I used, a very old version of VC6 (maybe from 99) had no options for that, it required a processor pack I never was able to install.

Then, in GCC there are -msse2 -mfpmath=sse, and in VC2008 there are Enable Enhanced Instruction Set: SSE2 (/arch:SSE2) and Floating Point Model: Fast (/fp:fast). Also, remember to use sqrtf instead of sqrt, sinf instead of sin, cosf instead of cos, and so on.

added on the 2009-09-29 16:24:27 by texel

I guess float versus integer is still prone to some discussion, even today. That is, for certain purposes :)

If you toggle SSE/SSE2 in VC, be very sure to adjust the float model to fast or fastest. Otherwise your functions potentially drown in SSE conversion ops and that costs a lot of cycles (cvt*) :)

added on the 2009-09-29 18:56:20 by superplek

texel:
i think that would only be in some specific cases, for demoeffects and the like. maybe that was fixedpoint math against floating-point, but i may be wrong. i think it depends on what you're trying to do.

im using chars and shorts because i wanted to shrink the LerpInfo structure i have (the one that contains all the u,v, texflag, obj-id and so on). it saves me some fps and i think that is because of more cache-hits. i am going to optimize it even further by converting the recursive function (the one that samples down the squares) into iteration, because i read somewhere on the net that calling a recursive function is much slower than iterating. i knew that already, since calling and returning from a function takes some time. i have a feeling it will improve the speed a little more. all in all i need to change the algorithm i use for subdividing the squares so that it runs faster. if i manage to do it without ruining the whole code, that is.

i activated fast floats and SSE2 in my compiler now and it saved me 5-6 fps. so thanks for that tip! i wondered why the sse2 normalization routine didnt run any faster than the regular one, but it seemed that i hadnt turned on that compiler option beforehand :P

oldschool is the new newschool \o/

added on the 2009-09-29 19:34:59 by rudi
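To make the cache argument concrete, here is a rough sketch of what a char/short layout buys. The field names and both struct layouts are guesses for illustration only (the real LerpInfo is not shown in the thread), and the 64-byte cache line is an assumption that holds for typical x86 CPUs of that era.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed layout: 8 bytes per sample. */
typedef struct {
    uint8_t r, g, b;   /* shaded color */
    uint8_t flags;     /* e.g. "already traced", texture flag */
    uint8_t u, v;      /* texture coordinates, already wrapped to 0..255 */
    int16_t obj_id;    /* -1 means the ray hit nothing */
} LerpInfoPacked;

/* The same fields stored as ints and floats: 28 bytes with 4-byte int/float. */
typedef struct {
    float r, g, b;
    int   flags;
    float u, v;
    int   obj_id;
} LerpInfoWide;

/* How many whole samples fit in one cache line (64 bytes assumed). */
size_t samples_per_cache_line(size_t sample_size)
{
    return 64 / sample_size;
}
```

Under those assumptions eight packed samples share a cache line versus two wide ones. The catch, in this sketch at least, is that byte-sized u and v are wrapped before interpolation, which is exactly the situation that produces the backwards interpolation discussed earlier in the thread.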

Rydi:

For the recursive to iterative step:

From my most recent work on a kd-tree traversal algorithm, with GCC on my Core 2, I found that the combination of compiler and CPU does a really good job with recursive functions.

Years ago (let's say 8 or 10), recursive functions were a big slowdown (for not very big functions, maybe a 2x or 3x slowdown).

But, currently, it is not the case. When I made my kd traversing iterative (with a custom stack for that), it took me a look of time to optimize it just to reach the speed of the recursive function, and even more to gain something. The final speed up was maybe 20 or 30% only... so it looks that times are changing.

I just wanted to tell you that even if you read something about optimization in tutorials or wherever, keep in mind that what was true a few years ago doesn't have to be true now... or the speed gain could be very small.

What I believe could make it faster is doing a fixed number of iterations, for example 2 or 3 reflections. With this you will not need a stack, and that will probably make it faster... well, just try it and see.

added on the 2009-09-29 19:52:22 by texel
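A fixed number of reflections without a stack could look roughly like the loop below. It is only a sketch under strong simplifications: `Bounce`, `MAX_BOUNCES` and `shade_iterative` are made-up names, each hit is reduced to a single intensity plus a reflectivity, and the hits are passed in as an array, whereas a real tracer would intersect the reflected ray inside the loop.

```c
#define MAX_BOUNCES 3  /* fixed reflection depth, as suggested above */

/* Hypothetical result of one hit along the reflection chain. */
typedef struct {
    float local;         /* locally shaded intensity, 0..1 */
    float reflectivity;  /* fraction of the final color taken from the reflection */
} Bounce;

/* Combine up to MAX_BOUNCES hits front to back, with no recursion and no stack. */
float shade_iterative(const Bounce *hits, int n_hits)
{
    float result = 0.0f;
    float weight = 1.0f;  /* how much the current bounce still contributes */

    for (int i = 0; i < n_hits && i < MAX_BOUNCES; ++i) {
        result += weight * (1.0f - hits[i].reflectivity) * hits[i].local;
        weight *= hits[i].reflectivity;
        if (weight < 1.0f / 256.0f)
            break;  /* remaining bounces would be invisible in 8-bit color */
    }
    return result;
}
```

Because each bounce only needs the running weight and the accumulated color, nothing has to be kept from earlier bounces. That only works for a single reflected ray per hit; reflection plus refraction branches, and would need a stack again.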

look of time=lot of time

added on the 2009-09-29 19:53:01 by texel

texel:
sure, i have that in mind, but since this is a software-renderer i need all the speed-gain i can get. that means i am already assuming that the speed gain will not be that much, but if there is some, then its better than nothing.
anyway, i know something else that is taking the most time and that is the findQuadraticRoots function, and i am not in the mood to look at it. ive done so many times, but if i were to change it i would have to dig deeper into the maths, which i am not up for atm.

i dont think that 30 fps is good enough for 640x480 mode with 8x8 tracing and object-edge subsampling. thats currently where i am at.
thats with 3 spheres with reflection, diffuse and specular phong and a plane with texture. when i use the radius for the normalized colors the speed drops drastically, to about 1/4th. :S

added on the 2009-09-29 20:42:16 by rudi

ah and i dont have a kd-tree yet :P

added on the 2009-09-29 20:43:19 by rudi

No, rydi, it doesn't seem very fast. What computer do you have?
Let's see... 4 objects (3 spheres and 1 plane) multiplied by about 3 because of reflections and shadows... that is about 12 intersections per pixel. With the subsampling you might get about a 4x speedup, so it is roughly 20-30 million intersections per second, with shading and texturing...

Assuming you are using a single core and no SIMD, that is not extremely fast, but not extremely slow either. I believe you can speed it up 2 to 4x under the same conditions (no SIMD, one core, no kd-tree) if you have something like a Pentium D at 3 GHz or a Core Duo at 2 GHz... maybe.

And, do you know where the bottleneck is?

And... what exactly is "radius for the normalized colors"? Is it something from Phong maybe?

Good luck with it!

added on the 2009-09-29 22:28:32 by texel
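Spelled out, the estimate above goes like this, taking the 640x480 at 30 fps from rudi's earlier post and treating the 12 intersections per pixel and the 4x subsampling gain as the rough guesses they are:

```latex
\begin{align*}
640 \times 480 \times 30\ \text{fps} &= 9{,}216{,}000\ \text{pixels/s} \\
9{,}216{,}000 \times 12 &\approx 1.1 \times 10^{8}\ \text{intersections/s without subsampling} \\
\frac{1.1 \times 10^{8}}{4} &\approx 2.8 \times 10^{7}\ \text{intersections/s}
\end{align*}
```

That lands inside the 20-30 million range quoted above.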

texel:
I have an Esprimo Mobile laptop with these specs:
Intel Pentium Dual CPU T3400 @ 2.16GHz (its one physical core but it says its 2 logical ones, whatever that means)
CPU architecture: IA32 (Family 6; Model 15; Stepping 13;)
and i think its 1024 KB L2 Cache, it says also a 1024 KB L1 Cache.

i tried VTune for the first time, and it suggests that the bottleneck is in the Trace function, followed by LinearInterpolate(..), Intersection(..) and Render(..).. TraceBlock (which is my recursion) comes just before main().

the radius of the normalized colors is the value compared against the threshold, which i check every time the objID's of the corners are equal. with that turned off i still get only about 30 fps. it has nothing to do with phong. i removed the phong code i had earlier because this one is much more accurate in terms of the shadows and other stuff.

also, the Trace function exits after x number of traces in reflection or refraction.

i wont paste the code because its too big, but i can roughly tell what happens in pseudocode:

Code:


** in main **
(loop)

for i = 0 to size do
   clear lerpBuffer[i].trace & colors
   clear pixelbuffer[i]
end

for x = 0 to width, x += cellSize do
   for y = 0 to height, y += cellSize do
      TraceBlock(x, y, cellSize, lerpBuffer)
   end
end

** end main **

** function TraceBlock **
calculate or init some values
calculate directionVectors on each corner
if lerpBuffer[upperLeftCorner] is not traced then Trace(Ray...., lerpBuffer)
if lerpBuffer[upperRightCorner] is not traced then Trace(Ray...., lerpBuffer)
if lerpBuffer[lowerLeftCorner] is not traced then Trace(Ray...., lerpBuffer)
if lerpBuffer[lowerRightCorner] is not traced then Trace(Ray...., lerpBuffer)

if all object id's are -1 then
   do nothing
else if all object id's are equal then
   LinearInterpolate(pixel, x, y, x + blockSize, y + blockSize, ...)
   if colors are over the threshold then //subdivide into four smaller squares
      TraceBlock(...) //small upperleft
      TraceBlock(...) //small upperright
      TraceBlock(...) //small lowerleft
      TraceBlock(...) //small lowerright
   end
else //some object id's are different
   TraceBlock(...) //small upperleft
   TraceBlock(...) //small upperright
   TraceBlock(...) //small lowerleft
   TraceBlock(...) //small lowerright
end
** end function **

that's basically it.

added on the 2009-09-30 11:30:43 by rudi

There is something I don't understand:

Quote:

LinearInterpolate(pixel, x, y, x + blockSize, y + blockSize,
check colors with threshold if true //subdivide into four smaller squares
then
TraceBlock(...) //small upperleft
TraceBlock(...) //small upperright
TraceBlock(...) //small lowerleft
TraceBlock(...) //small lowerright
end

Why not do it this way, so that a block is either subdivided or interpolated, never both:

if (threshold > whatever)
{
   tracecornerblocks();
}
else
{
   Interpolatecolor();
}

added on the 2009-09-30 17:31:37 by texel
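A compact C sketch of that ordering (test first, then either subdivide or interpolate) is given below. Everything in it is a stand-in: the `Sample` struct, the one-block `grid`, the toy scene with a vertical edge at `edge_x`, and the single-channel `color` are assumptions made so the example runs on its own, and `trace` replaces the real ray tracer.

```c
#include <string.h>

#define N 8              /* block size; the grid holds (N+1) x (N+1) samples */
#define THRESHOLD 0.1f   /* assumed color tolerance */

typedef struct { int traced; int obj_id; float color; } Sample;

Sample grid[N + 1][N + 1];
int trace_count;
int edge_x = 3;          /* toy scene: object 1 for x >= edge_x, object 0 otherwise */

/* Stand-in for the expensive ray trace; each sample is traced at most once. */
Sample *trace(int x, int y)
{
    Sample *s = &grid[y][x];
    if (!s->traced) {
        s->traced = 1;
        s->obj_id = (x >= edge_x) ? 1 : 0;
        s->color = s->obj_id ? 1.0f : 0.25f;
        ++trace_count;
    }
    return s;
}

/* Bilinear fill of every untraced sample in the block, corners included in the range. */
void interpolate(int x, int y, int size)
{
    float c00 = grid[y][x].color,        c10 = grid[y][x + size].color;
    float c01 = grid[y + size][x].color, c11 = grid[y + size][x + size].color;
    for (int j = 0; j <= size; ++j) {
        for (int i = 0; i <= size; ++i) {
            float tx = (float)i / size, ty = (float)j / size;
            float top = c00 + (c10 - c00) * tx;
            float bot = c01 + (c11 - c01) * tx;
            if (!grid[y + j][x + i].traced)
                grid[y + j][x + i].color = top + (bot - top) * ty;
        }
    }
}

void trace_block(int x, int y, int size)
{
    Sample *a = trace(x, y),        *b = trace(x + size, y);
    Sample *c = trace(x, y + size), *d = trace(x + size, y + size);
    if (size == 1)
        return;          /* all four samples are real traces, nothing to fill */

    const Sample *rest[3] = { b, c, d };
    float lo = a->color, hi = a->color;
    int same = 1;
    for (int k = 0; k < 3; ++k) {
        if (rest[k]->color < lo) lo = rest[k]->color;
        if (rest[k]->color > hi) hi = rest[k]->color;
        if (rest[k]->obj_id != a->obj_id) same = 0;
    }

    if (!same || hi - lo > THRESHOLD) {
        int h = size / 2;            /* subdivide into four smaller squares ... */
        trace_block(x, y, h);
        trace_block(x + h, y, h);
        trace_block(x, y + h, h);
        trace_block(x + h, y + h, h);
    } else {
        interpolate(x, y, size);     /* ... or interpolate, but never both */
    }
}

/* Render one block from scratch and return how many rays were traced. */
int render_block(int scene_edge_x)
{
    memset(grid, 0, sizeof grid);
    trace_count = 0;
    edge_x = scene_edge_x;
    trace_block(0, 0, N);
    return trace_count;
}
```

With no edge inside the block only the four corners are traced. With the edge inside, the blocks touching it are refined down to single pixels and the rest are interpolated, so the count stays well below the 81 samples of a brute-force pass.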

Also, it looks like you are using 2 nested recursive functions, one for the subsampling algorithm and another for the trace... that looks slow...

I suggest you try this and analyze the results:

I am assuming 640x480 resolution. Then:

Code:


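/* Pass 1: trace one ray at every corner of the 4x4 grid. */
for (y = 0; y < (480 / 4) + 1; y++)
{
   for (x = 0; x < (640 / 4) + 1; x++)
   {
      trace_ray(x * 4, y * 4);
   }
}

/* Pass 2: for each 4x4 cell, either trace every pixel or interpolate. */
for (y = 0; y < 480; y += 4)
{
   for (x = 0; x < 640; x += 4)
   {
      if (!all_objects_equal_in(x, y, x + 4, y, x, y + 4, x + 4, y + 4) || over_threshold())
      {
          for (y1 = y; y1 < y + 4; y1++)
          {
              for (x1 = x; x1 < x + 4; x1++)
              {
                   trace_ray(x1, y1);
              }
          }
      }
      else
      {
          interpolate();
      }
   }
}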

added on the 2009-09-30 17:45:07 by texel

yes, its quite likely that the recursion is causing the slowness. thats why i was talking about turning it into iteration a few messages back. your example can make a good reference for that :)
but when you interpolate, do you keep the position of the corners and maybe the block size in a buffer of their own? i was trying to remove that. well anyway, this will keep me a little busy i think. but it looks good :)

added on the 2009-09-30 18:46:28 by rudi

Antiplanet is a raytracing game I recently tried on my quad core. It has worlds with a huge number of spheres, and the speed impressed me. http://www.virtualray.ru

added on the 2009-09-30 19:29:13 by Optimus

texel: i only got 3 fps with your code. it traces way too many times.

added on the 2009-09-30 19:55:46 by rudi

ummm... then maybe your trace function is veeery slow :P

But you can do it the same iterative way at several levels: first 8x8, then 4x4, then 2x2, then 1x1.

added on the 2009-09-30 19:57:41 by texel
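One way to read the 8x8, 4x4, 2x2, 1x1 idea is a loop over block sizes with a flag per block saying whether it still needs work, so no recursion is involved. The sketch below assumes a single 8x8 block, a toy scene with a vertical edge, and object ids only; `sample`, `fill_block`, `active` and `refine_iteratively` are hypothetical names, and `fill_block` just copies the object id where a real renderer would interpolate colors.

```c
#include <string.h>

#define N 8                   /* coarsest block size; samples form an (N+1) x (N+1) grid */

int  obj_id[N + 1][N + 1];    /* object id per sample */
char traced[N + 1][N + 1];    /* 1 once a ray was traced for that sample */
char active[N + 1][N + 1];    /* 1 if the block whose top-left is here still needs work */
int  trace_count;

/* Toy scene (assumption): object 1 at or right of a vertical edge, object 0 left of it. */
int sample(int x, int y, int edge_x)
{
    if (!traced[y][x]) {
        traced[y][x] = 1;
        obj_id[y][x] = (x >= edge_x);
        ++trace_count;
    }
    return obj_id[y][x];
}

/* Flat stand-in for interpolation: copy the block's object id to untraced samples. */
void fill_block(int x, int y, int size)
{
    for (int j = 0; j <= size; ++j)
        for (int i = 0; i <= size; ++i)
            if (!traced[y + j][x + i])
                obj_id[y + j][x + i] = obj_id[y][x];
}

/* Refine level by level: size 8, then 4, then 2, then 1. Returns the ray count. */
int refine_iteratively(int edge_x)
{
    memset(traced, 0, sizeof traced);
    memset(active, 0, sizeof active);
    trace_count = 0;
    active[0][0] = 1;         /* start with the single coarsest block */

    for (int size = N; size >= 1; size /= 2) {
        for (int y = 0; y < N; y += size) {
            for (int x = 0; x < N; x += size) {
                if (!active[y][x])
                    continue;
                active[y][x] = 0;
                int a = sample(x, y, edge_x);
                int b = sample(x + size, y, edge_x);
                int c = sample(x, y + size, edge_x);
                int d = sample(x + size, y + size, edge_x);
                if (size == 1)
                    continue; /* nothing left to fill at pixel level */
                if (a != b || a != c || a != d) {
                    int h = size / 2;   /* flag the four children for the next pass */
                    active[y][x] = active[y][x + h] = 1;
                    active[y + h][x] = active[y + h][x + h] = 1;
                } else {
                    fill_block(x, y, size);
                }
            }
        }
    }
    return trace_count;
}
```

The children are only flagged in the current pass and visited in the next, smaller pass, so the flag array plays the role that the call stack plays in the recursive version.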

Optimus Knight: looks cool from screenshot.

added on the 2009-09-30 19:58:28 by rudi

texel: yep

added on the 2009-09-30 19:58:59 by rudi

shiva: are you still among us?

added on the 2009-10-01 17:50:52 by rudi

I now have a traceBuffer that is 9x9 in size. its structure takes 8 bytes. I use 9x9 and not 8x8 because i need one extra element for the right and bottom sides. if i used 8x8 the wrapping would have caused two traces on the same corner. Is 9x9 a bad size for a buffer like this? also, i clear color and traceFlag on the 9x9 buffer before moving on to the next block, which is quite slow. i just thought of maybe swapping the last corners with the new ones.

added on the 2009-10-02 13:47:35 by rudi
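One possible reading of "swapping the last corners with the new ones" is to carry the shared edge over when moving to the block on the right, instead of clearing and re-tracing it. The sketch below assumes an 8-byte sample with made-up fields (`TraceSample`) and a hypothetical helper `advance_right`; the real traceBuffer layout is not shown in the thread.

```c
#include <string.h>

#define CELL 8
#define SIDE (CELL + 1)   /* one extra sample for the right and bottom edges */

/* Hypothetical 8-byte sample. */
typedef struct {
    unsigned char r, g, b;
    unsigned char traced;
    short obj_id;
    short pad;
} TraceSample;

/* Prepare the buffer for the block immediately to the right. */
void advance_right(TraceSample buf[SIDE][SIDE])
{
    for (int y = 0; y < SIDE; ++y) {
        buf[y][0] = buf[y][CELL];                        /* shared edge becomes the new left column */
        memset(&buf[y][1], 0, CELL * sizeof buf[y][1]);  /* every other sample starts untraced */
    }
}
```

At 9 x 9 x 8 = 648 bytes the buffer is small next to any L1 data cache, so the odd size by itself is unlikely to be the problem. The bottom row is a different matter: it belongs to the block one row of blocks further down, so reusing it would need a separate row-wide buffer, which this sketch does not attempt.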

AntiPlanet is butt-ugly and so is AntiPlanet2. Worst advertisement for raytracing ever.

added on the 2009-10-02 14:14:22 by raer
