pouët.net

chunky to planar

category: code [glöplog]
YES! this is what coding was about in the golden ages, working around all the little annoying technical limitations :) \o/
added on the 2006-09-15 17:57:50 by Oswald Oswald
yeah, nowadays coding is completely different... *yawn*
added on the 2006-09-15 18:13:24 by Hatikvah Hatikvah
burburry: Doing C2P to fastmem and subsequently copying to chipmem while rendering is easy for effects like rotozoomers and tunnels, but doesn't it get impossible to manage if most of the rendering is done by say a texturemapper? And wouldn't it mess up and get messed up by the cache?

winden: Depriving people of sleep is my number one priority. Anyway, here's something resembling specs:

The code is huge. Really. It's big. It's not fully implemented, because there's so much code to write. So it's not suited for intros, because the compiled routine will be larger than 64k, and it won't crunch all that well either.

Speed will be about 10-30% above copyspeed, and the speed relative to regular C2P will be highest on the 060.

Enough hints for today, though. Stingray will disassemble one of my sources soon enough and then you'll know :D
added on the 2006-09-15 18:22:14 by doomdoom doomdoom
Well well, I've never disassembled sources yet, I should give it a try. :D
added on the 2006-09-15 18:34:33 by StingRay StingRay
\o/ Even my sources are just animations. Try it. Really, disassemble one and you'll find an animplayer and a lot of data.
added on the 2006-09-15 18:41:06 by doomdoom doomdoom
actually all of gosub's sourcecode is compiled into sourcecode
added on the 2006-09-15 18:47:58 by Hatikvah Hatikvah

So your new & improved c2p method is to encode an anim into move.l #$12345678,(a0)+ at full speed??? ;)

I lately got a rotozoom + floyd-steinberg dither + c2p to chipmem routine going all cached, by drawing rotozoomed scanlines and dithering + c2ping just after 32 pixels are calced...
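
(For illustration only, not winden's actual routine: a rough C sketch of what "c2ping 32 pixels" means. It assumes 8 bitplanes and one byte per chunky pixel, and it is a naive bit-by-bit reference rather than a fast merge-based conversion. The name `c2p_32px` is made up for this example.)

```c
#include <stdint.h>

/* Convert 32 chunky pixels (one byte each) into 8 planar longwords.
   planes[p] collects bit p of every pixel; the leftmost pixel ends up
   in bit 31, which matches Amiga bitplane order.
   Naive reference version, not a fast merge-based routine. */
void c2p_32px(const uint8_t chunky[32], uint32_t planes[8])
{
    for (int p = 0; p < 8; p++) {
        uint32_t word = 0;
        for (int x = 0; x < 32; x++) {
            /* shift earlier pixels left, append bit p of pixel x */
            word = (word << 1) | ((chunky[x] >> p) & 1u);
        }
        planes[p] = word;
    }
}
```

A 32-pixel group is a natural unit here because it fills exactly one longword per bitplane, so each plane gets a single 32-bit write.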

two versions available:

256->16 gray http://www.speakerpixel.com/gallery/showphoto.php/photo/128/sort/1/cat/500/page/1

256->2 gray, but expanding from 320 to 1280 pixels at dither time: http://www.speakerpixel.com/gallery/showphoto.php/photo/129/sort/1/cat/500/page/1

(gotta promote network' bg-party with quality stuff, you know)
added on the 2006-09-15 18:50:31 by winden winden
Quote:
So your new & improved c2p method is to encode an anim into move.l #$12345678,(a0)+ at full speed??? ;)


This kind of reminds me of all the new-school c64 effects since Oxyron invented 4x4 ;)
added on the 2006-09-15 18:55:40 by kb_ kb_
winden: :D That's a GREAT idea. \o/
added on the 2006-09-15 19:06:53 by doomdoom doomdoom
hehehe, anyways about cache trashing... I recall toying on 030 with disabling datacache for texture-pages by using mmu, so that they would not displace other stuff... maybe the same would work ok on 060? it's not like the OS is going to mess our stuff, so we better take advantage of it...
added on the 2006-09-16 00:43:30 by winden winden
I've never used the MMU for anything, so I don't know if that's even possible. But it sounds interesting.
added on the 2006-09-16 01:08:24 by doomdoom doomdoom
Quote:
Doing C2P to fastmem and subsequently copying to chipmem while rendering is easy for effects like rotozoomers and tunnels, but doesn't it get impossible to manage if most of the rendering is done by say a texturemapper? And wouldn't it mess up and get messed up by the cache?


Funny you should use the word 'impossible'. ;-)

For rotozoomers, tunnels and other one-scanline-at-a-time effects, a better strategy is to interleave the effect and the c2p, rendering one scanline (or some such) at a time. This has the added benefit of keeping the chunky buffer inside the data cache. I guess this is the only way to go for oneframe "full"-screen chunky effects.
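
(A minimal sketch of the interleaving idea in C, under stated assumptions: `render_scanline` is a hypothetical stand-in for the effect, the c2p is a naive reference version, and the `screen` array stands in for chip memory. The sizes are illustrative only.)

```c
#include <stdint.h>
#include <stddef.h>

enum { WIDTH = 320, HEIGHT = 4, PLANES = 8, WORDS_PER_LINE = WIDTH / 32 };

/* Hypothetical stand-in for a scanline effect (rotozoomer, tunnel, ...). */
static void render_scanline(uint8_t *line, int y)
{
    for (int x = 0; x < WIDTH; x++)
        line[x] = (uint8_t)(x ^ y);
}

/* Naive c2p for one scanline: planar[p][w] holds bit p of 32 pixels,
   leftmost pixel in bit 31. */
static void c2p_scanline(const uint8_t *line,
                         uint32_t planar[PLANES][WORDS_PER_LINE])
{
    for (int w = 0; w < WORDS_PER_LINE; w++) {
        for (int p = 0; p < PLANES; p++) {
            uint32_t word = 0;
            for (int x = 0; x < 32; x++)
                word = (word << 1) | ((line[w * 32 + x] >> p) & 1u);
            planar[p][w] = word;
        }
    }
}

/* Render and convert one scanline at a time. The only chunky storage is
   a single 320-byte line, which is reused for every scanline. */
void draw_frame(uint32_t screen[HEIGHT][PLANES][WORDS_PER_LINE])
{
    uint8_t line[WIDTH];
    for (int y = 0; y < HEIGHT; y++) {
        render_scanline(line, y);       /* effect writes chunky pixels   */
        c2p_scanline(line, screen[y]);  /* converted while still "hot"   */
    }
}
```

The point of the structure is that the chunky data never grows beyond one line, so on a real machine it has a good chance of still being in the data cache when the c2p reads it back.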

The fast-to-fast + copy approach is actually well suited for more complex effects, since all it needs is three registers allocated for two pointers and a counter, and then you just scatter copy instructions throughout your code. Very bold coders can do without the counter. :)
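
(A hedged C sketch of the "two pointers and a counter" bookkeeping. The struct `ChipCopy`, the function names and the one-copy-per-4-pixels spacing are all invented for this example; in real 68k code these would be three registers and hand-placed move instructions.)

```c
#include <stdint.h>
#include <stddef.h>

/* The three "registers": source pointer, destination pointer, counter. */
typedef struct {
    const uint32_t *src;  /* already-converted planar data in fast RAM */
    uint32_t *dst;        /* destination, standing in for chip RAM     */
    size_t remaining;     /* longwords still to be copied              */
} ChipCopy;

/* One scattered copy instruction: move a single longword if any remain. */
static inline void copy_step(ChipCopy *c)
{
    if (c->remaining) {
        *c->dst++ = *c->src++;
        c->remaining--;
    }
}

/* Hypothetical renderer: does its own work and sprinkles copy steps
   in between, so chip writes overlap with rendering. */
void render_with_copy(uint8_t *chunky, size_t pixels, ChipCopy *c)
{
    for (size_t i = 0; i < pixels; i++) {
        chunky[i] = (uint8_t)(i * 7u);  /* stand-in for texturemapper work */
        if ((i & 3) == 0)
            copy_step(c);               /* one chip write every 4 pixels   */
    }
    while (c->remaining)                /* flush whatever is left over     */
        copy_step(c);
}
```

Dropping the counter would mean the number of copy steps executed per frame has to match the buffer size exactly by construction, since nothing stops the pointers from running past the end.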

The main cache issue is probably that if you get a cache miss shortly after a chip copy instruction has been executed, you might run into a quite nasty stall waiting for the write buffer to finish the chip write before the cache miss can be resolved. Careful placement of the copy instructions can amend this somewhat.
added on the 2006-09-19 20:11:21 by Blueberry Blueberry
also keep in mind that god invented s-buffers for a reason.
added on the 2006-09-19 21:22:20 by kusma kusma
Using a Superhires HAM8 screen (1280 wide) to produce a 1x1 (320) truecolour mode will cause the chipmem to go 3 times slower. To speed up the chip writes, it's possible to render the effect in fastram while the chipmem is busy displaying the screen. The c2p conversion is then done in the rasterlines that don't display the screen.

As far as I can remember, for a Hires HAM8 screen (640) chipmem writes were around 2 times slower.

I made a 12bit and 15bit scrambled c2p that went copyspeed on the 030. For more information:

http://membres.lycos.fr/amycoders/opt/fasttruec2p.html
added on the 2006-09-19 21:50:04 by sp^ctz sp^ctz
chunky to planar me beautiful =)
added on the 2006-09-19 21:51:09 by Oswald Oswald
ha! that's nothing! When *I* was young I made an
AmigaIFF viewer in ARM2 assembly that could render pics
with any number of bitplanes to 8bpp chunky with realtime
palette lookup at 300 fps -- while drinking tea in a shoebox in the middle of the road.

Now try telling this to the young coders of today, and they won't believe you.
added on the 2006-09-20 04:11:24 by mrhill mrhill
I'd guess that the code Doom talks about will resemble fullscreen coding on the st :)
added on the 2006-09-20 10:23:36 by すすれ すすれ
Quote:
The code is huge. Really. It's big. It's not fully implemented, because there's so much code to write. So it's not suited for intros, because the compiled routine will be larger than 64k, and it won't crunch all that well either.

So actually all the time is spent in instruction cache misses? :)
added on the 2006-09-20 10:38:59 by Blueberry Blueberry
mrhill: remember.... palette lookups? :)
added on the 2006-09-20 11:57:17 by ryg ryg
Burlbarry: yes, I basically just unrolled a regular C2P routine. To make it faster than copyspeed I replaced all the MOVEs with FMOVE (fast move). Of course then you have to use floating pointers, and I'm having some trouble with that now.
added on the 2006-09-20 18:51:14 by doomdoom doomdoom
Dumiris: I would rather not try to imagine what your bit merge operations on floating pointers look like. :-O
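
(For anyone who hasn't met the term: a bit merge, as commonly used in fast C2P routines, swaps a masked group of bits between two registers. A small C sketch follows; the function name `merge` and its parameters are just for illustration, and a full C2P chains several such steps with different shifts and masks.)

```c
#include <stdint.h>

/* Swap the bits of *b selected by mask with the bits of *a selected by
   (mask << n). Three XORs per pair of registers, no temporaries beyond t. */
void merge(uint32_t *a, uint32_t *b, unsigned n, uint32_t mask)
{
    uint32_t t = ((*a >> n) ^ *b) & mask;  /* bits that differ */
    *b ^= t;                               /* b takes a's bits */
    *a ^= t << n;                          /* a takes b's bits */
}
```

With `n = 16` and `mask = 0x0000FFFF`, this exchanges the high word of `a` with the low word of `b`. Applying the same merge a second time restores the original values.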
added on the 2006-09-20 19:00:26 by Blueberry Blueberry
Can't you just *generate* that routine at runtime ?
added on the 2006-09-20 20:11:06 by Moerder Moerder
killer: no, not really.
added on the 2006-09-20 20:32:52 by doomdoom doomdoom
hmmm one thing that should speed up cpu-based c2p on 000 machines is to fullscreen-unroll the c2p loop and convert the data loading from pointer addressing to immediate loads... then the screen draw routine would paint into this buffer instead of a separate screen buffer... not really usable on 060 though...
added on the 2006-09-20 23:52:43 by winden winden
how about just using an optimizing compiler, lol?
added on the 2006-09-21 02:03:25 by kusma kusma