chunky to planar
category: code [glöplog]
YES! this is what coding was about in the golden ages, working around all the little annoying technical limitations :) \o/
yeah, nowadays coding is completely different... *yawn*
burburry: Doing C2P to fastmem and subsequently copying to chipmem while rendering is easy for effects like rotozoomers and tunnels, but doesn't it get impossible to manage if most of the rendering is done by say a texturemapper? And wouldn't it mess up and get messed up by the cache?
winden: Depriving people of sleep is my number one priority. Anyway, here's something resembling specs:
The code is huge. Really. It's big. It's not fully implemented, because there's so much code to write. So it's not suited for intros, because the compiled routine will be larger than 64k, and it won't crunch all that well either.
Speed will be about 10-30% above copyspeed, and the speed relative to regular C2P will be highest on the 060.
Enough hints for today, though. Stingray will disassemble one of my sources soon enough and then you'll know :D
Well well, I've never disassembled sources yet, I should give it a try. :D
\o/ Even my sources are just animations. Try it. Really, disassemble one and you'll find an animplayer and a lot of data.
actually all of gosub's sourcecode is compiled into sourcecode
So your new & improved c2p method is to encode an anim into move.l #$12345678,(a0)+ at full speed??? ;)
I lately got a rotozoom + Floyd-Steinberg dither + c2p to chipmem routine going all cached, by drawing rotozoomed scanlines and dithering + c2ping just after 32 pixels are calced...
two versions available:
256->16 gray http://www.speakerpixel.com/gallery/showphoto.php/photo/128/sort/1/cat/500/page/1
256->2 gray, but expanding from 320 to 1280 pixels at dither time: http://www.speakerpixel.com/gallery/showphoto.php/photo/129/sort/1/cat/500/page/1
(gotta promote network' bg-party with quality stuff, you know)
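For anyone following along, here's a rough C sketch of the dither stage only, assuming 256 -> 16 gray with the 16 output levels spaced 17 apart. All the names (dither_span_16, WIDTH, ERR_LEN) are made up for illustration and are not taken from the routine above, and the rotozoom and c2p steps are left out. It works on spans, so a line can be fed through 32 pixels at a time and give the same result as one pass over the whole line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH   320           /* assumed line width */
#define ERR_LEN (WIDTH + 2)   /* one guard entry on each side of the line */

/* Clear the buffer that collects error for the line below. */
static void begin_line(int *err_next)
{
    memset(err_next, 0, ERR_LEN * sizeof *err_next);
}

/* Floyd-Steinberg dither of pixels [x0, x0+n) from 8-bit gray to 4-bit gray.
   Errors are kept in 1/16ths of a gray step; pixel x lives at index x + 1
   so the x-1 and x+1 neighbours never need a bounds check. */
static void dither_span_16(const uint8_t *src, uint8_t *dst, int x0, int n,
                           int *err_cur, int *err_next)
{
    assert(x0 >= 0 && n >= 0 && x0 + n <= WIDTH);
    for (int x = x0; x < x0 + n; x++) {
        int v = src[x] * 16 + err_cur[x + 1];   /* value in 1/16 units */
        int level = (v + 136) / 272;            /* 272 = 17 * 16, rounded */
        if (level < 0)  level = 0;
        if (level > 15) level = 15;
        int e = v - level * 272;                /* quantisation error */
        err_cur[x + 2]  += e * 7 / 16;          /* right */
        err_next[x]     += e * 3 / 16;          /* below left */
        err_next[x + 1] += e * 5 / 16;          /* below */
        err_next[x + 2] += e * 1 / 16;          /* below right */
        dst[x] = (uint8_t)level;
    }
}
```

Per line you'd call begin_line on the "next" buffer, run dither_span_16 once per 32-pixel block, then swap the two error buffers before the following line.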
Quote:
So your new & improved c2p method is to encode an anim into move.l #$12345678,(a0)+ at full speed??? ;)
This kind of reminds me of all the new-school c64 effects since Oxyron invented 4x4 ;)
winden: :D That's a GREAT idea. \o/
hehehe, anyways about cache trashing... I recall toying on 030 with disabling datacache for texture-pages by using mmu, so that they would not displace other stuff... maybe the same would work ok on 060? it's not like the OS is going to mess with our stuff, so we better take advantage of it...
I've never used the MMU for anything, so I don't know if that's even possible. But it sounds interesting.
Quote:
Doing C2P to fastmem and subsequently copying to chipmem while rendering is easy for effects like rotozoomers and tunnels, but doesn't it get impossible to manage if most of the rendering is done by say a texturemapper? And wouldn't it mess up and get messed up by the cache?
Funny you should use the word 'impossible'. ;-)
For rotozoomers, tunnels and other one-scanline-at-a-time effects, a better strategy is to interleave the effect and the c2p, rendering one scanline (or some such) at a time. This has the added benefit of keeping the chunky buffer inside the data cache. I guess this is the only way to go for oneframe "full"-screen chunky effects.
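To make the interleaving concrete, here is a minimal C sketch, assuming a 320-pixel-wide, 8-bitplane screen with one separate buffer per plane. The names (render_frame, c2p_line, xor_effect, planar_pixel) and the XOR pattern are placeholders for illustration. The c2p here is a naive bit-gather, nothing like an optimised merge-based routine; the point is only the loop structure, where the chunky buffer is a single scanline.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH      320
#define PLANES     8
#define LINE_BYTES (WIDTH / 8)

typedef void (*render_line_fn)(uint8_t *chunky, int y);

/* Naive c2p of one scanline: gather bit p of 8 pixels into one byte,
   leftmost pixel in the most significant bit. */
static void c2p_line(const uint8_t *chunky, uint8_t *planes[PLANES], int y)
{
    for (int bx = 0; bx < LINE_BYTES; bx++) {
        for (int p = 0; p < PLANES; p++) {
            uint8_t bits = 0;
            for (int i = 0; i < 8; i++)
                bits = (uint8_t)((bits << 1) | ((chunky[bx * 8 + i] >> p) & 1));
            planes[p][y * LINE_BYTES + bx] = bits;
        }
    }
}

/* Render and convert one scanline at a time, so the only chunky
   buffer is WIDTH bytes and can stay in the data cache. */
static void render_frame(render_line_fn render, uint8_t *planes[PLANES],
                         int height)
{
    uint8_t line[WIDTH];
    for (int y = 0; y < height; y++) {
        render(line, y);
        c2p_line(line, planes, y);
    }
}

/* Stand-in effect: a plain XOR pattern. */
static void xor_effect(uint8_t *chunky, int y)
{
    for (int x = 0; x < WIDTH; x++)
        chunky[x] = (uint8_t)((x ^ y) & 255);
}

/* Read a pixel back from the bitplanes, to check the conversion. */
static int planar_pixel(uint8_t *planes[PLANES], int x, int y)
{
    int pixel = 0;
    for (int p = 0; p < PLANES; p++) {
        uint8_t byte = planes[p][y * LINE_BYTES + x / 8];
        pixel |= ((byte >> (7 - (x & 7))) & 1) << p;
    }
    return pixel;
}
```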
The fast-to-fast + copy approach is actually well suited for more complex effects, since all it needs is three registers allocated for two pointers and a counter, and then you just scatter copy instructions throughout your code. Very bold coders can do without the counter. :)
The main cache issue is probably that if you get a cache miss shortly after a chip copy instruction has been executed, you might run into a quite nasty stall waiting for the write buffer to finish the chip write before the cache miss can be resolved. Careful placement of the copy instructions can amend this somewhat.
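The "two pointers and a counter" scheme might look something like this in C. It is a sketch under assumptions: chip_copier, copy_step, copy_flush and render_next_frame are hypothetical names, the "effect" is a dummy fill, and the one-copy-per-four-pixels spacing is arbitrary. In real 68k code the struct would be three registers and copy_step a single move plus a counter update dropped between other instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* State for the background copy: two pointers and a counter. */
typedef struct {
    const uint32_t *src;   /* converted planar data in fast memory */
    uint32_t *dst;         /* destination (chip memory on a real Amiga) */
    size_t remaining;      /* longwords still to copy */
} chip_copier;

/* Copy one longword if any are left; meant to be scattered through
   the rendering code. */
static inline void copy_step(chip_copier *c)
{
    if (c->remaining) {
        *c->dst++ = *c->src++;
        c->remaining--;
    }
}

/* Copy whatever the scattered steps did not get to. */
static void copy_flush(chip_copier *c)
{
    while (c->remaining)
        copy_step(c);
}

/* Dummy effect that renders the next frame's chunky buffer while the
   previous frame's planar data trickles out, one longword per 4 pixels. */
static void render_next_frame(uint8_t *chunky, size_t n, unsigned frame,
                              chip_copier *c)
{
    for (size_t i = 0; i < n; i++) {
        chunky[i] = (uint8_t)(i * 3 + frame);  /* stand-in for real work */
        if ((i & 3) == 3)
            copy_step(c);
    }
    copy_flush(c);
}
```

Dropping the counter, as suggested above for the very bold, would mean guaranteeing that exactly the right number of copy steps run per frame.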
also keep in mind that god invented s-buffers for a reason.
Using a Superhires ham8 screen (1280 wide) to produce a 1x1 (320) truecolour mode will cause the chipmem to go 3 times slower. To speed up the chipwrites it's possible to render the effect in fastram while the chipmem is busy displaying the screen. c2p conversion is done in the rasterlines that don't display the screen.
.
As far as I can remember, for Hires screen ham8 (640) chipmemwrites were around 2 times slower.
.
I made a 12bit and 15bit scrambled c2p that went copyspeed on the 030. For more information:
http://membres.lycos.fr/amycoders/opt/fasttruec2p.html
chunky to planar me beautiful =)
ha! that's nothing! When *I* was young I made an
AmigaIFF viewer in ARM2 assembly that could render pics
with any number of bitplanes to 8bpp chunky with realtime
palette lookup at 300 fps -- while drinking tea in a shoebox in the middle of the road.
Now try telling this to the young coders of today, and they won't believe you.
I'd guess that the code Doom talks about will resemble fullscreen coding on the st :)
Quote:
The code is huge. Really. It's big. It's not fully implemented, because there's so much code to write. So it's not suited for intros, because the compiled routine will be larger than 64k, and it won't crunch all that well either.
So actually all the time is spent in instruction cache misses? :)
mrhill: remember.... palette lookups? :)
Burlbarry: yes, I basically just unrolled a regular C2P routine. To make it faster than copyspeed I replaced all the MOVEs with FMOVE (fast move). Of course then you have to use floating pointers, and I'm having some trouble with that now.
Dumiris: I would rather not try to imagine what your bit merge operations on floating pointers look like. :-O
Can't you just *generate* that routine at runtime ?
killer: no, not really.
hmmm one thing that should speed up cpubased c2p on 000 machines is to fullscreen-unroll the c2p loop and convert the data loading from pointer addressing to immediate loads... then the screen draw routine would paint into this buffer instead of a separate screen buffer... not really usable on 060 though...
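A rough illustration of that layout in C, purely as a sketch: it builds a byte buffer of 68k code in which every chunky longword is the immediate operand of a move.l #imm,d0 ($203C), and hands out pointers into those operands for the draw routine to paint into. The two NOPs after each load are placeholders for wherever the real merge and store instructions would go, and the names (emit_unrolled, pixel_slot, STRIDE) are invented for this example. The buffer is only built here, not executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_MOVE_L_IMM_D0 0x203Cu  /* 68k: move.l #imm32,d0 */
#define OP_NOP           0x4E71u  /* 68k: nop (placeholder for real work) */
#define OP_RTS           0x4E75u  /* 68k: rts */
#define STRIDE           10       /* opcode + imm32 + two placeholder words */

/* Store one 16-bit word big-endian, as the 68k expects. */
static size_t put16(uint8_t *code, size_t at, unsigned w)
{
    code[at]     = (uint8_t)(w >> 8);
    code[at + 1] = (uint8_t)(w & 0xFF);
    return at + 2;
}

/* Emit one immediate load per chunky longword, then an rts.
   Returns the number of bytes written. */
static size_t emit_unrolled(uint8_t *code, size_t cap, size_t n_longs)
{
    size_t at = 0;
    assert(cap >= n_longs * STRIDE + 2);
    for (size_t i = 0; i < n_longs; i++) {
        at = put16(code, at, OP_MOVE_L_IMM_D0);
        at = put16(code, at, 0);          /* immediate, high word */
        at = put16(code, at, 0);          /* immediate, low word */
        at = put16(code, at, OP_NOP);     /* merge/store would go here */
        at = put16(code, at, OP_NOP);
    }
    return put16(code, at, OP_RTS);
}

/* Address of chunky pixel i inside the code: pixel 0 of each longword
   is the highest byte of the immediate, right after the opcode word. */
static uint8_t *pixel_slot(uint8_t *code, size_t i)
{
    return code + (i / 4) * STRIDE + 2 + (i % 4);
}
```

The draw routine would then write *pixel_slot(code, i) instead of chunky[i], so the "screen buffer" and the c2p code are the same memory.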
how about just using an optimizing compiler, lol?