pouët.net

Amiga 68060 no FPU, demo running success?

category: general [glöplog]
 
Just got a TF1260 68060 from eBay. It seemed cheap at around £225, so I bought it on impulse without thinking about it too much. Previously I had a 68030 with FPU.

First problem: I need to replace the Kickstart ROMs, going from 3.0 to 3.1. FastRam showed as zero, and SysInfo showed a speed almost like an unexpanded A1200, but reported a 68040. I installed MMUlibs; then it showed a 68060 but was still slow. I found out how to use AddMem and somehow managed to make the fast RAM appear. Initially I couldn't find enough help on what start/end address to use, but I plugged my 68030 back in, saw that it had 0c000000 to 0fffffff, used the same range and voila, it worked! Now SysInfo also shows the speed as double that of a 68040 at 25 MHz. Is the 68060 similar to a 68040 with extra instructions, or should it be way faster?

Problem two: many modern demos (even older demos I could run on the 68030) need an FPU. I installed the SoftIEEE lib. Are there other, faster ones? If I understand correctly, a demo that uses FPU opcodes will trigger an illegal instruction, which will be intercepted by the lib so it can do its own thing, so even that trap after the error will make things exceedingly slow. Or are there better solutions?

Anyway, my success rate might be 10-20%. There are a few later demos that I know run like a slideshow on the 030 but managed to run smoothly here. There are others with long precalcs at startup, much longer than even with the FPU on my 68030 board, unfortunately. And when I waited 10-15 minutes for some of these demos to load, there were glitches, frozen parts or resets, meaning some demos also do a bit of FPU work between the parts, not just at precalc.

So, am I thinking correctly, 68060 without FPU is mostly useless for most demos? Or are there better solutions, such as different libs? (Or maybe the 3.0 ROMs don't help, but I have ordered 3.1 ROMs to install soon.) And why FPU is so expensive? The same board I bought costs 299 euros in a store without FPU. With FPU it's like 600-700 euros iirc.

p.s. Is the PiStorm solution better? Or is it too fake or incompatible? (In a YouTube video I saw, SysInfo shows numbers no normal 68060 would ever reach.)
added on the 2023-05-23 11:35:12 by Optimus
Quote:
So, am I thinking correctly, 68060 without FPU is mostly useless for most demos?

Indeed, for demos targeting the 68060 you really do want a full 68060 with FPU.
added on the 2023-05-23 11:54:19 by britelite
Quote:
And why FPU is so expensive?

Supply and demand ...
Quote:
Is the PiStorm solution better?

Depends on your definition of better. Going a bit farther in that direction, is WinUAE better?
added on the 2023-05-23 12:39:59 by absence
Try my AGA stuff, Optimus. It runs without FPU. For instance, Electromathesis, Antelogium or Sensorium Hyperaesthesia.
added on the 2023-05-23 14:12:38 by ham
There is a software FPU emulator that I tested successfully on an FPGA-based Amiga (MIST).
It helped me launch some FPU demos and Lightwave.

If your 060 starts performing fast, then that solution can at least make most demos runnable.

I don't remember the URL, but the Amiga community on Discord will surely help you find it. I think it was somewhere on Aminet.

Good luck!
added on the 2023-05-23 19:03:02 by hollowone
I read that some of dodke's demos, like Origins/Unique, Pt 2 Horizons, et al., work quite well with the TF1260.

Actually, I was thinking of getting a TF1260 myself and coding some fixed-point stuff.
In my eyes, the 060EC/LC without FPU deserves more support and could be a viable higher-end Amiga demo platform that doesn't cost a fortune and is actually available to buy.
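
For anyone wondering what "fixed-point stuff" means in practice, here is a minimal sketch, assuming a 16.16 format. The format and all names (`to_fixed`, `fx_mul`, `rotate`) are illustrative assumptions and not taken from any demo mentioned in this thread. Python is used only for readability; real code would be 68k assembly or C.

```python
# Hypothetical 16.16 fixed-point helpers: 16 integer bits, 16 fractional bits.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS


def to_fixed(x):
    """Convert a float to 16.16 fixed point (stored in a plain int)."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 16.16 fixed-point value back to a float."""
    return f / ONE


def fx_mul(a, b):
    """Multiply two 16.16 values.

    The raw product carries 32 fractional bits, so shift 16 of them away.
    Python ints never overflow; on real hardware the intermediate product
    needs more than 32 bits and has to be handled explicitly.
    """
    return (a * b) >> FRAC_BITS


def rotate(x, y, cos_a, sin_a):
    """Rotate the point (x, y) using precomputed fixed-point cos/sin."""
    return (fx_mul(x, cos_a) - fx_mul(y, sin_a),
            fx_mul(x, sin_a) + fx_mul(y, cos_a))
```

The cos/sin values would normally come from a table precalculated once, so no floating-point instruction is needed at runtime.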
added on the 2023-05-23 20:41:18 by arm1n
How limited is the Amiga by its chip-ram, really? Just curious, even though I am planning at some point to try Kalms' C2P routines on both my 030 and 060.

I recently bought the PiStorm32. OK, in SysInfo it scores close to 800. The 060 at 60 MHz would score close to 40. That's quite fast; in theory it's like a 68060 at 1.2 GHz. But there are a few problems: some demos display horrible bugs and crash, so maybe I need to slow it down somehow. I haven't found a tool or information on how to set up the PiStorm32 to run at roughly the speed of a standard accelerator.

The other curiosity is that certain demos still don't feel like a smooth 1VBL. The speeds are good, but sometimes it's choppy, or maybe hovering around 1-2VBL (vsync hiccups). It could be the timing of the movement, or really hitting some chip-ram bottleneck.

For the demo Hottstyle Takeover, there is a comment, taken from a diskmag article, claiming that you can't do full-frame chunky effects even on a 68060, as the chip-ram dictates the speed. Of course I believe you can do certain things with tricks and clever techniques, or by not rendering all bitplanes or all areas of the screen at the same time, but is there any truth in that statement? So if you did something in full 320*256*8bpl, would the fastest C2P on its own struggle to hit 1VBL? Or would it manage, but not leave enough time to also render something good into the backbuffer?
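
To make clear what the C2P step has to produce, here is a naive reference chunky-to-planar conversion. It is only a sketch to show the data layout: it is not Kalms' routine and nowhere near fast, and the function name `c2p`, the non-interleaved plane layout and the "leftmost pixel is the most significant bit" ordering are assumptions made for the illustration.

```python
def c2p(chunky, width, height, depth=8):
    """Naive reference chunky-to-planar conversion (hypothetical helper).

    chunky: one byte per pixel, row by row.
    Returns a list of `depth` bytearrays, one per bitplane, each holding
    one bit per pixel with the leftmost pixel in the top bit of each byte.
    """
    if width % 8 != 0:
        raise ValueError("width must be a multiple of 8")
    if len(chunky) != width * height:
        raise ValueError("chunky buffer size does not match width * height")

    plane_size = (width // 8) * height
    planes = [bytearray(plane_size) for _ in range(depth)]

    for i, pixel in enumerate(chunky):
        byte_index = i >> 3          # 8 pixels share one byte in each plane
        mask = 0x80 >> (i & 7)       # leftmost pixel goes into bit 7
        for p in range(depth):
            if pixel & (1 << p):     # bit p of the pixel belongs to plane p
                planes[p][byte_index] |= mask
    return planes
```

For 320*256*8bpl this yields 8 planes of 10240 bytes each, so 81920 bytes have to end up in chip-ram every frame no matter how clever the conversion itself is.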
added on the 2023-06-19 11:14:05 by Optimus
If I remember correctly, just copying 64k of memory (fast mem -> chip mem) for a 320x200 buffer takes about 15 ms, which doesn't leave much room for rendering (5 ms left).
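
The arithmetic behind those numbers, as a small sketch. It assumes a 50 Hz PAL frame (20 ms, which is what "5 ms left" implies), takes the from-memory figure of 15 ms for 320x200 bytes at face value, and assumes copy time scales linearly with buffer size. The function names are made up for the example.

```python
FRAME_MS = 1000 / 50            # one frame at 50 Hz PAL: 20 ms (assumption)
MEASURED_BYTES = 320 * 200      # 64000 bytes, the "64k" buffer
MEASURED_MS = 15.0              # approximate copy time quoted above


def copy_ms(num_bytes):
    """Estimated fast -> chip copy time, assuming linear scaling."""
    return num_bytes * MEASURED_MS / MEASURED_BYTES


def render_budget_ms(num_bytes):
    """Time left in a 1-frame budget after the copy."""
    return FRAME_MS - copy_ms(num_bytes)
```

With these assumptions the implied copy rate is roughly 4.3 MB/s. Scaling the same figure to a full 320x256 8-bit screen (81920 bytes) gives about 19.2 ms for the copy alone, leaving under 1 ms of a 20 ms frame.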

What I do in Rift is try to keep things in cache as much as possible by doing tiled rendering and then doing C2P of a tile. Half of the writes go to chip mem and the other half go to fast mem. Then during the next frame, when I do rotations / calculations, I do the remaining writes to chip mem, as these can happen in the "background" while other calculations happen (as long as you don't touch the bus).
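
The split-write idea can be sketched roughly like this. It is a toy model under stated assumptions, not the Rift code: the class name `SplitChipWriter`, splitting each tile's data in the middle, and draining a fixed number of bytes between calculation steps are all invented for the illustration.

```python
class SplitChipWriter:
    """Toy model: write half of each tile now, the other half later."""

    def __init__(self, chip_size):
        self.chip = bytearray(chip_size)  # stands in for chip mem
        self.deferred = []                # stands in for a fast mem staging area

    def write_tile(self, offset, planar):
        """Write the first half of a converted tile to chip, stash the rest."""
        half = len(planar) // 2
        self.chip[offset:offset + half] = planar[:half]
        self.deferred.append((offset + half, bytes(planar[half:])))

    def drain(self, max_bytes):
        """Write up to max_bytes of stashed data to chip; return bytes written.

        Meant to be called in between chunks of other work (rotations etc.)
        so the chip writes overlap with calculations.
        """
        written = 0
        while self.deferred and written < max_bytes:
            offset, data = self.deferred[0]
            take = min(len(data), max_bytes - written)
            self.chip[offset:offset + take] = data[:take]
            written += take
            if take == len(data):
                self.deferred.pop(0)
            else:
                self.deferred[0] = (offset + take, data[take:])
        return written
```

In a frame loop, `write_tile` would be called right after the C2P of each tile, and `drain` would be sprinkled through the next frame's calculation code.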

I also do other things such as interleaving FPU calculations (such as rotations) with the C2P, so the FPU is doing heavy operations (such as FDIVs) while the C2P of a tile is ongoing.

Writing fast stuff for the A1260 (when you need to write to chip-mem) is for sure limited by its speed, but with careful (and somewhat annoying) setup it's possible to get more speed out of it; one always has to keep it in mind.
added on the 2023-06-19 12:35:49 by emoon
Sounds like the TBL engine is pretty extreme by now :D
added on the 2023-06-20 16:29:49 by rloaderro
loaderror: :) I did write a new "engine" for Rift and the way I describe above is a PITA way to write it, but it does improve performance.

If I ever go back to making A1260 stuff again I have some more ideas on how to improve it more, but we will see when that day comes :)
added on the 2023-06-20 17:25:02 by emoon
Interesting, thanks
added on the 2023-06-21 07:27:41 by Optimus
np!

So the TL;DR here is that you really need to care about the writes to chipmem. You can for sure do full frame-rate effects on A1260, but you need to think about those writes.

The key here is to do the writes while you can do other calculations (this is how C2P on 060 reaches copy speed: you can convert while writing to chipmem).

If you are writing 4 bitplanes, then the C2P and the writes to chipmem will overlap without stalls (according to my old tests at least), but as soon as you have more bitplanes you will end up waiting for writes to complete and wasting cycles. That is the reason why I write half of my data to chipmem and then write the other half later, where I have some more calculations going on.
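
A first-order way to picture this is the toy model below. It assumes the CPU work and the chip writes for a chunk of pixels run fully in parallel, so the chunk takes as long as the slower of the two. The function name and the cycle counts used with it are hypothetical, picked only so that break-even lands at 4 bitplanes as described above; they are not measurements.

```python
def chunk_time(convert_cycles, write_cycles_per_plane, planes):
    """Toy overlap model for one chunk of pixels.

    Returns (total_cycles, stall_cycles). The CPU stalls only when the
    chip writes take longer than the conversion work they overlap with.
    """
    write_cycles = write_cycles_per_plane * planes
    total = max(convert_cycles, write_cycles)
    stall = max(0, write_cycles - convert_cycles)
    return total, stall
```

With made-up values of 100 conversion cycles and 25 write cycles per plane, 4 planes give (100, 0), meaning no stall, while 8 planes give (200, 100): half the time is spent waiting. Those stall cycles are what moving extra calculations next to the writes tries to fill.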

The way I think about performance for 060 is (in order of importance):

1. chip writes
2. cache misses (from fast)
3. instruction pairing.

Of course, there are always cases where instruction pairing may be more important than cache misses, depending on your code, but I have found the list above to be a good outline at least.

The easiest way to reduce chip writes is of course to lower the number of bitplanes you have and/or reduce the resolution. Otherwise you will have to interleave them in a smart way, which is what I would recommend. It makes it more painful to write the code, but also more satisfying :)
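
How much each of those options saves is simple arithmetic, sketched here with a hypothetical helper (one bit per pixel per bitplane, no padding assumed):

```python
def chip_bytes_per_frame(width, height, planes):
    """Bytes written to chipmem for one full planar frame."""
    return width * height * planes // 8
```

For example, 320x256 in 8 bitplanes is 81920 bytes per frame, 4 bitplanes halves that to 40960, and 160x128 in 8 bitplanes is 20480.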
added on the 2023-06-21 09:07:27 by emoon
