pouët.net

Amiga 500: New blitter c2p routine implemented. Need help to speedtest

category: general
A500 = 512k + 512k
added on the 2007-02-20 18:47:21 by Zonkham
A500+ = 1024K chipmem + 512K trapdoor (mine is chipmem).

It didn't run in 512K chip + 512K chip on an A500.
added on the 2007-02-20 21:17:53 by jima
I just tried it on an A500. It ran really slowly.
added on the 2007-02-20 22:14:52 by Hatikvah
karlzzon: It's the 10-12.5 fps commonly known as "Amiga speed"
Hmm, has TBL-rate been standardized?
added on the 2007-02-20 22:43:18 by Hatikvah
stefan: yes.
added on the 2007-02-21 10:51:35 by kusma
I rewatched the Potion A500 demo, and it looked much slower this time.
I found out that running WinUAE with A500 speed and the JIT CPU setting switched on produces very inaccurate results. :D (JIT has to be switched off.)
I entered the disassembler in WinUAE (by pressing Page Up) and found out that this demo doesn't use SMC (self-modifying code) at all. The table effects were, per pixel:
        REPT    xxx
        move.w  (a0)+,d0
        move.b  (a1,d0.w),(a2)+
        ENDR
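The per-pixel table loop above can be modeled in Python to make the data flow explicit. This is a minimal sketch, assuming `offsets` stands for the word table read through a0, `texture` for the texture a1 points into, and `chunky` for the byte buffer written through a2; the function name and parameters are hypothetical. One detail the model keeps from the real instruction: `(a1,d0.w)` sign-extends the 16-bit offset, so words of $8000 and above index backwards from a1.

```python
def table_effect(offsets, texture, base, chunky):
    """Model of the REPT block: one table word and one texel per pixel.

    offsets -- 16-bit words read from the offset table, like (a0)+
    texture -- flat list of texels; index `base` is where a1 points
    chunky  -- output list, one byte per pixel, written like (a2)+
    """
    for word in offsets:                                   # move.w (a0)+,d0
        disp = word - 0x10000 if word & 0x8000 else word   # (a1,d0.w) sign-extends d0.w
        chunky.append(texture[base + disp])                # move.b (a1,d0.w),(a2)+
    return chunky
```

Since the whole effect lives in the table, changing the effect means generating a different table; the per-pixel code stays at two instructions.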
The texture mapper had a 5-instruction addx inner loop, probably with 16:8 precision.
The buffer is a normal, unscrambled chunky buffer with one byte per pixel, probably converted with a pure blitter c2p (3 merges), which would be about 3 times slower than mine...
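For readers new to merge-based c2p: for 8-bit pixels, chunky-to-planar conversion is a transpose of an 8x8 bit matrix per group of 8 pixels, and it can be done in 3 merge passes that exchange 4-, 2- and 1-bit groups. The Python sketch below models those passes on a single group of 8 pixels. Each merge is written as a masked combine of a shifted source, the kind of operation a blitter pass performs with its shifter and a constant mask, but this is a generic illustration with hypothetical names, not the Potion routine or sp's c2p, and a real blitter version processes whole rows of 16-bit words per pass.

```python
# Each entry: (shift n, mask selecting the low n bits of every 2n-bit group)
MERGES = ((4, 0x0F), (2, 0x33), (1, 0x55))


def merge(a, b, n, mask):
    """Swap a's low n-bit groups with b's high n-bit groups (masked, shifted combine)."""
    new_a = (a & ~mask & 0xFF) | ((b >> n) & mask)
    new_b = (b & ~(mask << n) & 0xFF) | ((a << n) & (mask << n))
    return new_a, new_b


def c2p_8(pixels):
    """Convert 8 chunky pixels (bytes) to 8 plane bytes; plane k holds bit k of each pixel.

    Pixel 0 ends up in bit 7 (the leftmost pixel) of every plane byte.
    """
    rows = list(pixels)
    for n, mask in MERGES:
        for i in range(8):
            if (i & n) == 0:  # pair row i with row i + n
                rows[i], rows[i + n] = merge(rows[i], rows[i + n], n, mask)
    # After the transpose, row r holds bit (7 - r) of every pixel
    return [rows[7 - k] for k in range(8)]


def c2p_reference(pixels):
    """Naive bit-by-bit conversion, used to check the merge version."""
    return [
        sum(((p >> k) & 1) << (7 - i) for i, p in enumerate(pixels))
        for k in range(8)
    ]
```

The three passes are independent of the pixel values, which is what makes the conversion suitable for fixed blitter passes in the first place.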
The resolution is 128*80 with 2x2 pixels: the scroll register is used on the x axis, and the picture is stretched on the y axis with copper modulos.
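Copper modulo stretching can be sketched as follows. The sketch assumes a non-interleaved bitplane layout in which each display line fetches exactly `row_bytes` bytes per plane: a modulo of `-row_bytes` makes the next line fetch the same row again, while `0` moves on to the next row. The helper names are hypothetical. It computes one modulo per display line for any source-to-display ratio and emits copper words: a WAIT per line (`$vv07,$FFFE`) followed by MOVEs to BPL1MOD ($108) and BPL2MOD ($10A). It assumes the MOVEs execute before that line's bitplane fetch ends, which is when the modulo is added. It also stops at line 255, because PAL lines beyond that need the usual extra WAIT on $FFDF, which is omitted here.

```python
BPL1MOD = 0x108  # odd-plane modulo register offset
BPL2MOD = 0x10A  # even-plane modulo register offset


def stretch_modulos(src_rows, dst_lines, row_bytes):
    """Modulo to add after each display line so src_rows rows fill dst_lines lines."""
    mods = []
    for y in range(dst_lines):
        this_row = y * src_rows // dst_lines
        next_row = (y + 1) * src_rows // dst_lines
        # -row_bytes repeats the current row, 0 advances by one, +k*row_bytes skips rows
        mods.append((next_row - this_row - 1) * row_bytes)
    return mods


def copper_words(first_line, mods):
    """Copper list words: per line WAIT, MOVE BPL1MOD, MOVE BPL2MOD; then END."""
    if first_line + len(mods) - 1 > 255:
        raise ValueError("lines above 255 need an extra $FFDF wait (not handled here)")
    words = []
    for i, mod in enumerate(mods):
        line = first_line + i
        value = mod & 0xFFFF  # negative modulos as 16-bit two's complement
        words += [(line << 8) | 0x07, 0xFFFE, BPL1MOD, value, BPL2MOD, value]
    words += [0xFFFF, 0xFFFE]  # end of copper list
    return words
```

For an exact 2x vertical stretch, the modulos simply alternate between `-row_bytes` and `0`. Non-integer ratios fall out of the same integer arithmetic.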
.
I've developed my c2p further now, but I won't release the source until I release a prod. :D By swapping bits 2 and 3 in the textures and adjusting the colormap, the garbage pixel now has 2-bit precision, which makes the picture brighter and smoother. One 2x2 pixel now consists of 2 "correct" pixels, 1 pixel with the 2 most significant bits of the pixel to the right, and 1 pixel with the 2 most significant bits of the pixel to the left.

one pixel (2x2):

cr
lc

c= correct pixel
r= 2 most significant bits of the right pixel.
l= 2 most significant bits of the left pixel.
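The bit swap and colormap adjustment can be sketched like this, under one plausible reading of "adjusting the colormap": every texel has bits 2 and 3 exchanged, and the palette is permuted with the same swap, so a correct pixel still shows its original colour while the bit order seen by the c2p changes. The palette is assumed to be a plain list indexed by pixel value, and the helper names are hypothetical.

```python
def swap_bits_2_3(value):
    """Exchange bits 2 and 3 of a pixel value; all other bits are unchanged."""
    b2 = (value >> 2) & 1
    b3 = (value >> 3) & 1
    return (value & ~0b1100) | (b2 << 3) | (b3 << 2)


def remap_texture(texels):
    """Apply the bit swap to every texel."""
    return [swap_bits_2_3(t) for t in texels]


def remap_palette(palette):
    """New palette such that new[swap(c)] == old[c] for every colour index c."""
    if len(palette) % 16:
        raise ValueError("palette size must be a multiple of 16")
    new = [None] * len(palette)
    for c, rgb in enumerate(palette):
        new[swap_bits_2_3(c)] = rgb
    return new
```

The swap is its own inverse, so applying it to both the texels and the palette indices leaves every displayed colour unchanged for correctly converted pixels.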
.
As for timing in WinUAE: adding the 5th bitplane doesn't seem to change the speed of my effect, so I would appreciate it if someone could check the speed of the source I supplied with 4 and 5 bitplanes (change bplcon0 in the copper list from $100,5... to $4... to change the number of bitplanes).
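For reference, the bitplane count is held in bits 14-12 (BPU) of BPLCON0, register offset $100. With HIRES off, changing the leading hex digit of the copper move's value from 5 to 4 therefore drops the 5th bitplane. The helper below is a small sketch with hypothetical names. It assumes an OCS/ECS screen with the COLOR bit ($0200) set, as is common in copper lists, and leaves all other control bits at 0.

```python
BPLCON0 = 0x100   # register offset used in copper MOVEs
HIRES = 0x8000    # bit 15
COLOR = 0x0200    # bit 9, commonly set


def bplcon0_value(planes, hires=False, color=True):
    """BPLCON0 value for an OCS/ECS screen; the plane count goes in bits 14-12."""
    if not 0 <= planes <= 6:
        raise ValueError("OCS/ECS supports 0-6 bitplanes")
    if hires and planes > 4:
        raise ValueError("OCS/ECS hires supports at most 4 bitplanes")
    value = planes << 12
    if hires:
        value |= HIRES
    if color:
        value |= COLOR
    return value


def copper_move_bplcon0(planes):
    """The two copper words for MOVE value,BPLCON0."""
    return [BPLCON0, bplcon0_value(planes)]
```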

added on the 2007-02-24 06:42:30 by sp^ctz
Quote:
I found out that running WinUAE with A500 speed and the JIT CPU setting switched on produces very inaccurate results. :D (JIT has to be switched off.)


So you are using a 68020+ CPU setting in UAE? That's not very A500-like... :) If you select the 68000 CPU, you can't enable JIT.

Quote:

So I would appreciate it if someone could check the speed of the source I supplied with 4 and 5 bitplanes (change bplcon0 from $100,5... to $4... in the copper list to change the number of bitplanes).


I will do that today, but first I have to "fix" your source, as there is not even the slightest bit of system kill/restore to be seen in it. Just banging in your copperlist without saving the system copperlist and leaving the system fully on? WTF! :) Upon exit, I get nice garbage on screen for the reasons already mentioned, so it's a bit hard to check the speed. ;) I'll fix it and let you know the speed then.
added on the 2007-02-24 12:00:28 by StingRay
hehe, ASM-One has system restore built in. And yes, the source is pretty ugly :D Hardware banging, he-he.
.
I use WinUAE with an A1200 setup, because I'm used to working in a WB3.0 environment.
When testing, I switch from AGA to ECS in the chipset menu and use A500 speed in the CPU menu.
The SMC table renderer now runs at 25 fps on 320*220 (2x2 pixels), including the c2p, with a 256*256 texture, in my setup, with blitter nasty switched off.
.
Thanks for testing :D
added on the 2007-02-24 12:56:40 by sp^ctz
If you give me your email, I can send you the newest source with a table effect and system restore.
added on the 2007-02-24 13:02:19 by sp^ctz
yes, I know that current ASM-One versions have system restore built in. ;) But that's not very helpful on my trustworthy A500, since these newer versions won't work on 1.3. :D I have now fixed/adapted the source so it can be assembled with Trash'm-One 2.0, because that's the only assembler I know of that supports accessing local labels outside their scope (bsr bla\.bla). However, I couldn't get Trash'm-One to run from disk yet; it crashes for a reason I've yet to find. :D My current source can be found here:
http://stingray.untergrund.net/c2p/c2p2x1_c0b1_scr_000_fixed.s

send your source to: stingray(at)scarab-amiga(dot)com
so I can test it on the real thing too then. :)
It's fun switching that good old A500 on again. :D
added on the 2007-02-24 13:45:55 by StingRay
ok, update: I gave up trying to get Trash'm-One 2.0 to run on my A500, so fuckings to Deftronic for not giving proper error messages. ;D Instead, I adapted the local label stuff so it can be assembled with Trash'm-One 1.6. Now I get a "workspace memory full" error on my 1MB chip-only A500, so I'll have to adapt the source a bit more. ;)
added on the 2007-02-24 15:34:41 by StingRay
Ok, I can give results now, after adding a nifty little "show number of used frames on screen" routine so I could just run the exe on my A500. :)

5bpl version: 135/136 rasterlines
4bpl version: 128/129 rasterlines

blitter nasty was off, and I did test it on my chipmem-only A500. ;)
The updated source with the rasterline display can still be found at http://stingray.untergrund.net/c2p/c2p2x1_c0b1_scr_000_fixed.s

Oh, and of course I only timed the actual c2p, not the SMC table stuff. =)
added on the 2007-02-24 21:20:05 by StingRay
"show number of used rasterlines on screen" ;)
added on the 2007-02-24 21:20:57 by StingRay
Greetings to all the guys involved, bring the real democoding back! yeaaahh
cheers
added on the 2007-02-25 07:39:37 by apricot
StingRay: I've mailed you another version of the source, with the table effect, modulo stretching etc. I need to put the blitter passes in a blitter interrupt list, since right now only the last pass runs in parallel with the CPU.
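The idea of a blitter interrupt list can be sketched as a queue whose next job is started from the blitter-finished interrupt (level 3, the BLIT bit in INTREQ) instead of the CPU busy-waiting between passes. The Python below is a toy model with hypothetical names. The timing function ignores both bus contention between CPU and blitter and interrupt overhead, so it only shows the best-case overlap.

```python
from collections import deque


class BlitQueue:
    """Toy model of an interrupt-driven blitter queue (hypothetical design)."""

    def __init__(self):
        self.pending = deque()
        self.busy = False
        self.started = []  # order in which blits were started

    def submit(self, job):
        self.pending.append(job)
        if not self.busy:
            self._start_next()

    def _start_next(self):
        if self.pending:
            self.busy = True
            self.started.append(self.pending.popleft())  # "start" the blit
        else:
            self.busy = False

    def blit_done_interrupt(self):
        # Stands in for the level-3 blitter-finished interrupt handler
        self._start_next()


def frame_time(pass_times, cpu_time, queued):
    """Best-case time for the blitter passes plus independent CPU work."""
    if queued:
        return max(sum(pass_times), cpu_time)
    # CPU waits for every pass except the last, which overlaps the CPU work
    return sum(pass_times[:-1]) + max(pass_times[-1], cpu_time)
```

With three equal passes of 10 units and 25 units of CPU work, the model gives 45 units when the CPU waits between passes and 30 when the passes are queued.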
.
btw: nice text plotter :D
added on the 2007-02-25 11:25:38 by sp^ctz
sp: Got your mail, and after removing the FPU code and reducing the memory usage I could finally test it on my A500. Here is what my text plotter wrote. :)

c2p: 46 rasterlines
tab: 1309/1310 rasterlines
tot: 1355/1356 rasterlines

So your effect needs about 4 VBLs. :) I'll send you a mail. =)
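Converting the measured rasterlines into frames is simple arithmetic. A minimal sketch, assuming a non-interlaced PAL frame of 313 lines and a 50 Hz refresh (the line count the measuring routine uses is not stated in the thread):

```python
import math

PAL_LINES = 313  # assumed lines per non-interlaced PAL frame


def frames_used(rasterlines, lines_per_frame=PAL_LINES):
    """Fraction of frames spent, e.g. 0.5 means half a frame."""
    return rasterlines / lines_per_frame


def fps_when_synced(rasterlines, refresh_hz=50, lines_per_frame=PAL_LINES):
    """Frame rate if every update waits for the next vertical blank."""
    vbls = max(1, math.ceil(rasterlines / lines_per_frame))
    return refresh_hz / vbls
```

Under these assumptions, the 1355-line total spans a little over 4 frames, so a VBL-synced loop would run at 10 fps. The 46-line c2p pass alone takes about 15% of a frame.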
added on the 2007-02-25 15:37:02 by StingRay
oops. I sent you the wrong source code. This source was made to generate tables for export, so there's no SMC here. I've sent you a new version now, with your text plotter. Thanks for testing!!
added on the 2007-02-25 17:17:45 by sp^ctz
Today I converted my MC68020+ texture mapper object viewer to 68000 code. The polygon filler doesn't fit in the 030 cache anymore, because I had to convert a lot of move.b (a5,d0.w*4),d1 etc. (scaled indexing only exists on the 68020 and up) to lsl.l #2,d0 / move.b (a5,d0.l),d1.
I managed to remove most of the divs instructions by using a division table, and now I only have 2 divs per polygon. I can probably precalculate these two as well, since my MC68000 cycle table really tells me I should remove them :D
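One common way to replace divs with a table is to precompute reciprocals and multiply instead. The thread doesn't say exactly what sp's table contains, so treat the following as a generic sketch with hypothetical names: unsigned operands only, and reciprocals in 16.16 fixed point. On the 68000, the multiply would be a mulu/muls, which is considerably cheaper than divs.

```python
MAX_DIVISOR = 320  # hypothetical table size

# Reciprocals in 16.16 fixed point: RECIP[n] is 65536 / n rounded down
RECIP = [0] + [(1 << 16) // n for n in range(1, MAX_DIVISOR + 1)]


def div_by_table(x, n):
    """Approximate x // n for 0 <= x < 65536 and 1 <= n <= MAX_DIVISOR.

    Because RECIP is rounded down, the result can be one less than x // n.
    """
    return (x * RECIP[n]) >> 16
```

For example, 1000 / 10 comes out as 99 because 65536 / 10 was truncated. Rounding the reciprocals up instead trades this for results that are occasionally one too high. Also note that RECIP[1] = 65536 does not fit in a 16-bit mulu operand, so a real table would special-case n = 1 or use fewer fraction bits.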

Stupid work really.
added on the 2007-02-25 17:28:12 by sp^ctz
Quote:
oops. I sent you the wrong source code. This source was made to generate tables for export, so there's no SMC here. I've sent you a new version now, with your text plotter. Thanks for testing!!


You're welcome, no worries. :) Good job btw! :)

Quote:
since my MC68000 cycle table really tell me I should remove them :D


hehehe yeah :P go for it. :D tables are your friend :D
added on the 2007-02-25 17:35:44 by StingRay
Well, I must be doing something wrong; I've been beating my head against this for days. I can't get a suitable assembler to work on my A500 or A500+ (Asm Pro 1.16 and Asm-One 1.48 just fritz out, and Asm-One 1.20 runs but doesn't have enough memory to assemble fully). I tried assembling under WinUAE (Asm Pro) and on my A4K with Asm-One 1.48, then moving the binary across. Everything assembles OK, but on a real A500 it just flicks to 'USED RASTERLINES = 000 MAX=256' and then sits there. Devpac 2 runs OK and assembles and runs a sawn-off Menace source OK, so the hardware seems good.

StingRay: is the plot display supposed to be that hard to read?

If either of you can help I'd appreciate it, or if you like, just email me a binary and I'll run it and add another data point to your tests.
added on the 2007-02-25 21:35:52 by jima
jima: you did nothing wrong; in the source I uploaded, the c2p routine is actually commented out. Sorry, I forgot to remove that before uploading. Look for the "just for testing ;)" line in the source, then move 5 lines down and remove the semicolon, and you will see something on screen. :)

The text is a bit hard to read because of the way the c2p works; there's not much I can do about it. It was just a quick hack to easily see the number of used rasterlines. :)

Assembling on the A500 with 1MB didn't work for me either, due to lack of memory, so I also assembled on my A4k or in UAE and then copied the exe to a disk that I booted on the A500. :) Hope that helps. If you still have problems, just ask. :)
added on the 2007-02-25 22:21:16 by StingRay
StingRay: thanks for that. Duh, I should have spotted it; too much time spent focusing on the assemblers. Then I had a brainwave: I dug out one of my A1500s, which has KS1.3, 1MB chip and 2MB fast, and found that it will run Asm-One 1.20, so I can assemble and run natively now.

sp: Some tests are now done on WinUAE A500 emulation (exact speed, exact blitter timing), an A500+ (KS2/1MB chip) and an A1500 (KS1.3/1MB chip/2MB fast), as follows:

5 bit-plane

WinUAE BlitterC2P - 135/135
WinUAE smctable - 008/008 (not sure if I have this part of the code right)

A500+ BlitterC2P - 135/136 (then flips to 135/391 after approx 1 second)
A500+ smctable - not yet tested

A1500 BlitterC2P - 135/135 (then flips to 135/391 after approx 1 second)
A1500 smctable - 008/008 (not sure if I have this part of the code right)

4 bit-plane

WinUAE BlitterC2P - 128/128
WinUAE smctable - 007/007 (not sure if I have this part of the code right)

A500+ BlitterC2P - not yet tested
A500+ smctable - not yet tested

A1500 BlitterC2P - 128/129
A1500 smctable - not yet tested

I probably won't bother with any more A500+ tests, as the A1500 seems to behave exactly the same and I can assemble on it. Happy to carry out further tests if you care to post or email the code.

Jim
added on the 2007-02-26 02:44:55 by jima
edit:

4 bit-plane

WinUAE BlitterC2P - 128/128
WinUAE smctable - 007/007 (not sure if I have this part of the code right)

A500+ BlitterC2P - not yet tested
A500+ smctable - not yet tested

A1500 BlitterC2P - 128/129
A1500 smctable - 007/007
added on the 2007-02-26 15:44:43 by jima
Thanks for testing, guys. Next time I will supply an exe file which takes arguments from the command line; the argument will be the height of the chunky buffer. It seems like the 5th bitplane doesn't steal as much DMA as I thought it would.
.
I'm back from the "bush", where I don't have internet (my gf's farm here in Thailand). I've converted my texture mapper to plot in nibble chunky, but it still has some bugs. Here is the 48-byte texture-mapping polygon filler from before I changed it to self-modifying code. I am using a 5-instruction rol inner loop, which is not optimal on the 000, but it frees 2 registers which I need in the SMC loop. In the SMC loop I interpolate once every 8 pixels.
.
I managed to remove many bytes from the mapper I wrote 10 years ago, so now the whole loop (polygon/outer/inner loop) is 230 bytes :D
.
CODEPORN:
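.ytre                           ; outer loop, one scanline per pass
        move.l  d5,d0           ; d5: high word = start x, low word = end x (8.8)
        move.l  d5,d7
        swap    d0
        asr.w   #8,d0           ; d0.w = integer start x
        asr.w   #8,d7           ; d7.w = integer end x
        sub.w   d0,d7           ; d7 = span length - 1 (loop counter)
        lea     (a0,d0.w),a1    ; a1 = line start + start x

        move.l  a7,d0           ; a7 borrowed as a spare register: packed u/v at span start
.indre                          ; inner loop, one texel per pass
        move.l  d0,d1
        lsr.w   #8,d1
        rol.l   #8,d1           ; d1.w = (bits 15-8 of d0) << 8 | (bits 31-24 of d0)
        add.l   d3,d0           ; step both texture coordinates with one add
        move.b  (a6,d1.w),(a1)+ ; fetch texel, write chunky byte
        subq.w  #1,d7
        bpl.b   .indre

        add.l   a4,d5           ; step both edge x positions
        add.l   a5,a7           ; step packed u/v at span start
        add.w   #512,a0         ; next chunky line (512-byte stride)
        subq.w  #1,d2           ; d2 = lines left
        bgt.b   .ytre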
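To make the packed-coordinate trick in the inner loop easier to follow, here is a Python model of `.indre`; the function names are hypothetical. Write d0 as four bytes [A B C D], from most to least significant. The `lsr.w #8` / `rol.l #8` pair yields the 16-bit offset C*256 + A, so A and C act as the integer parts and B and D as the fractions of the two coordinates. `add.l d3,d0` steps both coordinates with a single 32-bit add, which means a carry out of the low coordinate leaks one LSB into B. The model treats the offset as unsigned, whereas on the 68000 `(a6,d1.w)` sign-extends it.

```python
MASK32 = 0xFFFFFFFF


def texel_offset(d0):
    """move.l d0,d1 / lsr.w #8,d1 / rol.l #8,d1; returns d1.w."""
    d1 = (d0 & 0xFFFF0000) | ((d0 & 0xFFFF) >> 8)  # lsr.w #8 only shifts the low word
    d1 = ((d1 << 8) | (d1 >> 24)) & MASK32         # rol.l #8
    return d1 & 0xFFFF


def draw_span(texture, dest, x, d7, d0, d3):
    """Model of .indre: writes d7 + 1 texels from dest[x] on; returns the final d0."""
    while True:
        offset = texel_offset(d0)
        d0 = (d0 + d3) & MASK32    # add.l d3,d0 steps both coordinates at once
        dest[x] = texture[offset]  # move.b (a6,d1.w),(a1)+
        x += 1
        d7 -= 1                    # subq.w #1,d7
        if d7 < 0:                 # bpl.b .indre falls through once d7 goes negative
            break
    return d0
```

Like the subq/bpl pair, the loop body always runs at least once, and a counter of n produces n + 1 pixels.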
added on the 2007-02-28 12:49:43 by sp^ctz
