Amiga 500: New blitter c2p routine implemented. Need help to speedtest

category: general [glöplog]
For more info:

http://ada.untergrund.net/forum/index.php?action=vthread&forum=4&topic=2 17
Expect a 7 MHz release from me at Breakpoint. :-D
Come on old coders! 2 months to go.. enough time to make something interesting.
added on the 2007-02-17 12:32:14 by sp^ctz sp^ctz
Remove the space in "2 17" at the end of the URL when you cut and paste it.
added on the 2007-02-17 12:33:50 by sp^ctz sp^ctz
Great. I just got an A1200 here; my A600 is broken. I'll take a look. I've read the thread: Amiga leet coders are back!

added on the 2007-02-17 13:01:53 by krabob krabob

If you want to code an entire demo, a real A500 may be a good investment..

50SEK on ebay..
added on the 2007-02-17 15:48:08 by Stelthzje Stelthzje
Fantastic! I'll check it on an A500 one of these days.
Newskool style demos on oldskool hardware? ;)
added on the 2007-02-17 17:22:32 by ham ham
@sp_: any chance you can post that assembler code somewhere where you don't have to have a (yet another unwanted) account on a (yet another unneeded) file-sharing server?

I just finished repairing the low-byte octal transceiver on my A4000 (I hate SMDs) so I have my fastmem back again (yay!) and may have some time to dig out one of my A500s and try that code.
added on the 2007-02-17 23:59:18 by jima jima
Newschool :-) I watched some of the newest Atari ST demos, which were pretty cool. It should be possible to do better on the Amiga.
I live in Thailand, 50SEK is 5 days of food here :-D WinUAE has an A500 speed option. If the timing on the real hardware is close to the WinUAE timings, I won't buy one.


I've included a CIA timer to make timing simple. I've set the blitter nasty bit. If possible, I want timings without blitter nasty, and with 4 bpl (not 5 as in the source).
added on the 2007-02-18 02:39:05 by sp^ctz sp^ctz
sp^ctz ... WinUAE timing is quite ok compared to a real a500... in the last newschool thing i did, the difference between a real a500 and WinUAE was just a few rasterlines...

btw, if you are interested... have a look at: http://www.pouet.net/prod.php?which=26920 there i'm using atari st c2p...
added on the 2007-02-18 04:34:54 by ultra ultra
WinUAE timings
(use the A500-speed CPU setting, no JIT):

320*256 (2x1) res in 4 bpl (scroll to emulate 4 bpl):

196 rasterlines (ABCD read:320*256*4/8, write:320*256*2/8 bytes)


101 rasterlines.(read: 320*256*4/8, write:320*256*2/8 bytes)


196+101 = 297 rasters.

One frame in PAL is around 312 rasterlines.
According to the UAE timings, a "table effect" like a tunnel, bump etc. can be done at 50 fps in a
160*256 resolution, 2x1, 4 bpl.
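
The budget above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the two rasterline figures, the 312-line PAL frame and the byte formulas are taken from this post, and the variable names are made up for illustration.

```python
# Rasterline budget check, using only the figures quoted in the post.
PAL_RASTERLINES = 312          # "around 312" per the post
pass_lines = [196, 101]        # the two measured timings reported above

used = sum(pass_lines)         # rasterlines consumed per frame
spare = PAL_RASTERLINES - used # what is left before the frame is over

# Byte counts from the formulas given next to the timings.
read_bytes = 320 * 256 * 4 // 8
write_bytes = 320 * 256 * 2 // 8

print(used, spare, read_bytes, write_bytes)
```

With these numbers the two passes fit into one frame with 15 rasterlines to spare, which is why the result would mean 50 fps if the emulator timing holds on real hardware.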

Sounds too good to me :-D
Ready to break some world records guys?
added on the 2007-02-18 09:39:41 by sp^ctz sp^ctz
I uploaded a new version now. This version renders a 160*256 4bpl chunky buffer and performs a blitter c2p. I move red into $DFF180 at the start and black when finished. If the effect runs at 50 fps, the red color will not be blinking. (In WinUAE with the A500 speed setting it is not blinking, indicating 50 fps.)
added on the 2007-02-18 10:06:15 by sp^ctz sp^ctz
hm that sounds a bit too fast... i mean
when you need for example 12 clocks for a point (the offset access without c2p) then you have
160*256 points to calc... == 40960 * 12 = 491520 clockcycles...
491520 / 456 (456 cycles per rasterline as far as i remember) == 1077,89 rasterlines == around 3,4 vbl...
so a tunnel is for sure 50 fps in that size...
and this is without c2p !
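
The estimate above is easy to reproduce. A minimal Python sketch, assuming the figures quoted in the thread (12 cycles per pixel, 456 cycles per rasterline, 312 rasterlines per PAL frame); the function name is hypothetical and no bus or blitter contention is modelled.

```python
# Rough CPU-only frame cost, using the constants quoted in the thread.
CYCLES_PER_RASTERLINE = 456   # as remembered in the post
RASTERLINES_PER_FRAME = 312   # PAL, as quoted earlier in the thread

def frames_needed(pixels, cycles_per_pixel):
    """Return (total cycles, rasterlines, frames) for a per-pixel cost."""
    total_cycles = pixels * cycles_per_pixel
    rasterlines = total_cycles / CYCLES_PER_RASTERLINE
    return total_cycles, rasterlines, rasterlines / RASTERLINES_PER_FRAME

total, lines, frames = frames_needed(160 * 256, 12)
print(total, round(lines, 2), round(frames, 2))
```

Anything above 1.0 frames means the effect cannot run at 50 fps under these assumptions.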
added on the 2007-02-18 10:24:37 by ultra ultra
so a tunnel is for sure not 50 fps in that size... i mean ;)
added on the 2007-02-18 10:26:17 by ultra ultra
I agree that the WinUAE timings sound too good to be true.
How many cycles? This is an SMC loop that will render 8 pixels into a scrambled nibblechunky buffer.

    lea txture1,a0
    lea txture2,a1
    lea chunky,a2

    REPT 160*256/8
    ; 0000 = placeholder displacement, presumably patched by the SMC setup
    move.b 0000(a0),d6
    or.b 0000(a1),d6
    move.b d6,(a2)+
    move.b 0000(a0),d7
    or.b 0000(a1),d7
    move.b 0000(a0),d6
    or.b 0000(a1),d6
    move.b d6,(a2)+
    move.b d7,(a2)+
    move.b 0000(a0),d6
    or.b 0000(a1),d6
    move.b d6,(a2)+
    ENDR

added on the 2007-02-18 10:37:05 by sp^ctz sp^ctz
hm, the loop needs 132 cycles, if my clock cycle list is correct...
for 256*160 that would be
20*256*132 = 675840 / 456 == 1482,10 / 312 == 4,7 vbl
unless i made a mistake...

but the loop can be sped up a bit by using 4 textures and doing .w accesses... oeh would save 18 clocks or so
added on the 2007-02-18 10:54:23 by ultra ultra
ok of course the blitter leeches some time too.. 2 sources and stuff...
added on the 2007-02-18 10:58:09 by ultra ultra
I get the loop to be 128 cycles (when unrolled).
If I shrink the texture from 256*256 to 16*16, I can calculate two pixels per move
and double the speed. A tunnel effect can be 2 times faster by using the copper
and modulo (-80) ---> Tunnel 50fps 4bpl 160*200 (160x100) possible? I think so...
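
The 128-cycle figure can be tallied from the instruction mix of the byte loop (four reads, four ORs, four writes). This Python sketch assumes the per-instruction costs annotated in the listing in this post (12 cycles for a displacement read or OR, 8 for a post-increment write) and ignores chip-RAM and blitter contention; the names are made up for illustration.

```python
# Tally of the unrolled byte loop, per 8 pixels, then scaled to a full buffer.
# Costs are the ones annotated in the thread; bus contention is NOT modelled.
CYCLES = {"load": 12, "or": 12, "store": 8}

def loop_cycles(loads, ors, stores):
    """Sum the cycle cost of one unrolled pass."""
    return (loads * CYCLES["load"]
            + ors * CYCLES["or"]
            + stores * CYCLES["store"])

byte_loop = loop_cycles(4, 4, 4)      # one pass = 8 pixels
iterations = 160 * 256 // 8           # REPT count from the listing
total = byte_loop * iterations
frames = total / 456 / 312            # 456 cycles/line, 312 lines/frame

print(byte_loop, iterations, total, round(frames, 2))
```

Under these assumptions the byte loop alone costs more than four frames for a full 160*256 buffer, which is the motivation for packing two pixels per move.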

For 16x16 textures (8 pixels, 64 cycles):
REPT 160*256/8
move.w 0000(a0),d6 ;12
or.w 0000(a0),d7 ;12
move.w 0000(a1),d6 ;12
or.w 0000(a1),d7 ;12
move.w d6,(a2)+ ;8
move.w d7,(a2)+ ;8

I did a small test in WinUAE by moving a colour into the colour register: 456 cycles per
rasterline seems correct on the A500.
Btw, I checked out your intro. Pretty nice. I liked that you used the copper to make it look
like you used more bitplanes. 18 years :D welcome back!!
added on the 2007-02-18 13:26:09 by sp^ctz sp^ctz
Oops, there is an error in the loop above. Too quick there. :D
added on the 2007-02-18 13:35:24 by sp^ctz sp^ctz
a 320*200 tunnel, 16*16 texture... 2*2 should be possible at 50fps...

but you can maybe do it a bit faster... in my tunnel i'm using a small trick... the tunnel could be 256 lines but i ran into memory problems so it's smaller...

when you look at the tunnel offsets... often the offsets are the same (when you use the right tunnel offset map, i mean). for identical offsets you use a different texture... that saves some moves in the inner loop. ok, it only works with unrolled code...

ok, maybe it will not give you as big a speedup as in mine... because you already have 2 pixels combined in one texture... but at the tunnel borders pixels are often the same... maybe that helps a bit... ;)
added on the 2007-02-18 14:11:17 by ultra ultra
No, I mean a 4bpl 2*1 tunnel in 320*200 at 50 fps. A hint: I will not be able to rotate, only move along one axis. Cheating: yeah. (But no color cycling, he-he.)
Table effects are boring; I'm working on converting my 256-byte texture mapper to plot in scrambled nibblechunky.
An intro for Breakpoint.

Come on coders, challenge me there :D
added on the 2007-02-18 14:24:23 by sp^ctz sp^ctz
hm just calc ;)
64 * 20 * 200 / 456 / 312 == 1,79 vbl...

i planned to convert my mapper from st to amiga ... but hm nope, i'll not do something for breakpoint... too little time ;)

but i'm looking forward to your bp thing ;)
added on the 2007-02-18 14:30:08 by ultra ultra
I'm looking forward to it too. This is REAL demo coding!
added on the 2007-02-18 18:21:56 by draft draft
yay for hardware banging! looking forward to bp. will you be there, sjeggtryne?
added on the 2007-02-19 14:39:43 by kusma kusma
While searching for newschool A500 demos I found this one:

Rout by Potion 1999


Running it in WinUAE with the A500 speed setting gives a really decent framerate.
The resolution is probably 128*100, using a blitter c2p similar to mine.
Pretty impressive code.
Does it run on 1 meg Amigas (512k chip / 512k fast)?

added on the 2007-02-20 06:42:27 by sp^ctz sp^ctz
hm @decent framerate..
with my WinUAE config it runs at 4-5 vbls (as far as i could tell), sometimes much more... and only 8 colors... but anyway, this is from 1999, cool for its time...
added on the 2007-02-20 13:12:27 by ultra ultra
"Does it run on 1meg 512k chip / 512k fast amigas? "

sp^ctz: no, needs 1.5Mb minimum to run - can be all chipmem.
added on the 2007-02-20 17:00:54 by jima jima