pouët.net

Tiny Intro Toolbox Thread

category: code [glöplog]
I was imagining things. :) The one-write version is not any faster.

Is there something that can be assumed in general about which registers are preserved by DOS calls?
added on the 2019-04-17 12:34:04 by Blueberry
@Blueberry: you could look here and here, though I'm not sure it really preserves the registers that are *not* mentioned in the "return values" list.
added on the 2019-04-17 12:45:05 by HellMood
On a real DOS you can just trace into the interrupts and check whether the service routines explicitly preserve registers (pusha, pushf, push *s). I never tried that on DOSBox, though ^^
added on the 2019-04-17 12:46:34 by HellMood
I was curious because evidently the WRITE call (int 21h, AH=40h) preserves BX. But this doesn't seem to be mentioned in any of those docs.
added on the 2019-04-17 13:04:25 by Blueberry
@Blueberry: I debugged into int 21h for AH=3Ch (VirtualBox, MS-DOS 6.22, debug.exe, "t" & "p"), and from what I saw, it pushed/popped everything it changes internally, apart from the return values. Of course I can't cover every case, but there is also this: "If calls are made by the approved method the contents of all registers are preserved through calls". That's also what one would intuitively assume, right?
added on the 2019-04-17 19:34:13 by HellMood
Also, a German source: "In general, register contents do not need to be saved before calling INT 21h functions, because they are preserved. This of course does not apply to the registers in which the respective INT 21h function returns values."
added on the 2019-04-17 19:38:26 by HellMood
You can't rely on this behavior because different versions of DOS can do slightly different things. Don't assume what you see in DOS 6.22 works in DOSBox, or FreeDOS, or DOS 3.3, or MS-DOS 7.x (i.e. what comes with Win9x).

Most DOS calls are supposed to preserve all regs that don't return results, but many DOSes trash BP. Ralf Brown's Interrupt List has a few known gotchas.
added on the 2019-04-22 17:09:02 by trixter
Do you want High Resolution and PC speaker in 32 bytes?
> here you go <
added on the 2019-04-26 10:56:33 by HellMood
While tinkering with "secret modes" I found the not-so-secret mode 0x6A, which I had given too little attention since it didn't work in DOSBox. Apparently it allows 800x600 resolution with no overhead; I applied it here to "dragon fade".
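One plausible reason why 800x600 comes "with no overhead", assuming 6Ah is the VESA-defined 800x600 16-colour planar mode: each of the four bit planes stores one bit per pixel, so a single plane needs 800/8 × 600 = 60000 bytes. That fits in the one 64 KB window at segment A000h, so no bank switching is required. The arithmetic as a small Python sketch (the mode layout is the stated assumption, not something checked against the intro):

```python
# Assumption: mode 6Ah is the VESA 800x600 16-colour planar mode, where each
# of the 4 bit planes stores 1 bit per pixel (8 pixels per byte).
WIDTH, HEIGHT, PLANES = 800, 600, 4
WINDOW = 0x10000  # one 64 KB real-mode segment, e.g. A000h

bytes_per_line = WIDTH // 8
bytes_per_plane = bytes_per_line * HEIGHT
fits_in_window = bytes_per_plane <= WINDOW

# The planes share one address range, so total memory is larger than
# what the CPU sees through the window at any one time.
total_vram = PLANES * bytes_per_plane

# For contrast: a linear 256-colour (8 bpp) 800x600 screen.
bytes_8bpp = WIDTH * HEIGHT

print(bytes_per_line, bytes_per_plane, fits_in_window)  # 100 60000 True
print(total_vram)                                       # 240000
print(bytes_8bpp, bytes_8bpp <= WINDOW)                 # 480000 False
```

An 8 bpp screen of the same size would need more than seven 64 KB windows, which is where bank switching (and its code size) would come in.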
added on the 2019-05-01 00:51:55 by HellMood
It might be overlooked, but if you are interested in coding your 256-byter with SSE (I haven't seen much of that in tiny intros yet...) to get a serious speedup over an FPU version, you can check out Frigo's lovely Kali-set and my SSE implementation (SSE4.1 required) in that product's thread. Thanks also to TomCat for additional help.

The speedup of the double-pixel interleaved loop is more than 400% at the same size, so SSE really pays off for that kind of algorithm. For other algorithms, the size overhead may be too much.

What still puzzles me is how to optimize the conversion of an xmm register holding RGB values as single-precision floats into an RGB dword. Especially when some floats may be too big, I needed an extra minps (plus setting up xmm2 as the 255.0 mask, which costs maybe another 10 bytes) to avoid artefacts:
Code:
;xmm1 = R|G|B|...
minps    xmm1,xmm2  ;clamp to a maximum of 255.0 (xmm2 = 255.0|255.0|255.0|255.0)
cvtps2dq xmm1,xmm1  ;int conversion
packuswb xmm1,xmm1
packuswb xmm1,xmm1  ;dword to byte => 2 times needed
movd     eax,xmm1
stosd               ;plot pixel

Any other ideas to do this shorter? Maybe it's even better to do it in non-SSE code, as it's not speed-critical anymore...
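To see where the artefacts come from, here is a per-lane Python model of the sequence above. It is a sketch based on the documented behaviour of the instructions, not Kuemmel's code. After the first packuswb each dword has become two bytes, and the second packuswb re-reads those two bytes as one signed word. For a converted value of 40000 (low word 0x9C40, negative as a signed word) or 16777216 (high word 0x0100, which saturates to 0xFF and makes the re-read word 0xFF00 negative), the result collapses to 0 instead of 255. Clamping to 255.0 with minps first avoids that.

```python
import struct

def f32(x: float) -> float:
    """Round to IEEE single precision, as the value would sit in an xmm lane."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def cvtps2dq(x: float) -> int:
    """One lane of cvtps2dq with default MXCSR rounding (nearest, ties to even).
    Out-of-range inputs and NaN give the 'integer indefinite' 0x80000000."""
    x = f32(x)
    if not -2.0**31 <= x < 2.0**31:  # also true for NaN
        return -2**31
    return round(x)  # Python's round() on floats also ties to even

def minps_255(x: float) -> float:
    """minps against 255.0 (minps's NaN handling is not modelled here)."""
    return min(f32(x), 255.0)

def sat_u8(v: int) -> int:
    return max(0, min(255, v))

def s16(bits: int) -> int:
    """Reinterpret the low 16 bits as a signed word."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits

def packuswb_twice(d: int) -> int:
    """Byte k after two 'packuswb xmm1,xmm1' depends only on dword k."""
    u = d & 0xFFFFFFFF
    lo = sat_u8(s16(u))                  # first packuswb: low word of the dword
    hi = sat_u8(s16(u >> 16))            # first packuswb: high word of the dword
    return sat_u8(s16(lo | (hi << 8)))   # second packuswb: both bytes as one word

for x in (34.0, 1.75, 40000.0, 16777216.0, -1.0):
    raw = packuswb_twice(cvtps2dq(x))
    clamped = packuswb_twice(cvtps2dq(minps_255(x)))
    print(f"{x:>12}: without minps {raw:3}, with minps {clamped:3}")
```

Lane independence holds because the first packuswb turns dword k into bytes 2k and 2k+1, and the second turns that byte pair into byte k, so each colour channel can be checked on its own.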
added on the 2019-09-27 21:18:42 by Kuemmel
After a while of digging, it seems I solved it myself. The following sequence seems to do the correct conversion with no artefacts, avoiding the minps and the mask setup. Using packssdw first does the trick :-)
Code:
;xmm1 = R|G|B|...
cvtps2dq xmm1,xmm1  ;int conversion
packssdw xmm1,xmm1  ;dword to word (signed saturation)
packuswb xmm1,xmm1  ;word to byte (unsigned saturation)
movd     eax,xmm1
stosd               ;plot pixel
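Why packssdw first works can be argued lane by lane: packssdw saturates each dword to the signed word range -32768..32767, and packuswb then saturates to 0..255. Since 0..255 lies inside the word range, the composition is simply a clamp of the dword to 0..255. The only special input is cvtps2dq's "integer indefinite" 0x80000000 (for NaN or out-of-range floats such as 3e9), which is negative and therefore ends up as 0. This Python sketch models the two instructions (it is not Kuemmel's code) and checks the clamp identity on edge cases plus random samples:

```python
import random

def sat_s16(v: int) -> int:
    """packssdw: signed dword -> signed word with saturation."""
    return max(-0x8000, min(0x7FFF, v))

def sat_u8(v: int) -> int:
    """packuswb: signed word -> unsigned byte with saturation."""
    return max(0, min(0xFF, v))

def packssdw_then_packuswb(d: int) -> int:
    # With 'packssdw xmm1,xmm1' and 'packuswb xmm1,xmm1', byte k of the
    # result comes from dword k, so one lane can be modelled on its own.
    return sat_u8(sat_s16(d))

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
random.seed(1)  # reproducible sample
samples = [INT32_MIN, -65536, -1, 0, 1, 255, 256, 32767, 32768,
           40000, 65536, 2**24, INT32_MAX]
samples += [random.randint(INT32_MIN, INT32_MAX) for _ in range(10_000)]

assert all(packssdw_then_packuswb(d) == max(0, min(255, d)) for d in samples)
print("packssdw + packuswb == clamp(d, 0, 255) on", len(samples), "samples")
```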
added on the 2019-09-29 10:09:15 by Kuemmel
I can confirm this. I wrote a test for this: DUMP6.ASM

Here is the result:
Code:
Input         CVTT/PACKUSWB/USWB   CVT/PACKUSWB/USWB   CVT/PACKSSDW/USWB
0.5           0                    0                   0
1.75          1                    2                   2
34.0          34                   34                  34
255.5500+     255                  255                 255
256.0         255                  255                 255
256.1399+     255                  255                 255
65536.0       255                  255                 255
16777216.0    0                    0                   255
-1.0          0                    0                   0
-257.0        0                    0                   0
-0.01         0                    0                   0
-65536.0      0                    0                   0
-16777216.0   0                    0                   0
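The table can be cross-checked with a small Python model of the three sequences. It is a sketch, not TomCat's DUMP6.ASM: the per-lane formulas follow the documented saturation of packssdw/packuswb, the default round-to-nearest-even mode for cvtps2dq and truncation for cvttps2dq, and 255.55 and 256.14 are assumed stand-ins for the truncated "255.5500+" and "256.1399+" inputs.

```python
import struct

def f32(x: float) -> float:
    """Round to IEEE single precision (the value as stored in an xmm lane)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

INDEFINITE = -2**31  # 0x80000000, returned for NaN / out-of-range inputs

def in_range(x: float) -> bool:
    return -2.0**31 <= x < 2.0**31  # False for NaN as well

def cvtps2dq(x: float) -> int:
    x = f32(x)
    return round(x) if in_range(x) else INDEFINITE  # nearest, ties to even

def cvttps2dq(x: float) -> int:
    x = f32(x)
    return int(x) if in_range(x) else INDEFINITE    # truncate toward zero

def sat_u8(v: int) -> int:
    return max(0, min(255, v))

def sat_s16(v: int) -> int:
    return max(-32768, min(32767, v))

def s16(bits: int) -> int:
    """Reinterpret the low 16 bits as a signed word."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits

def packuswb_twice(d: int) -> int:
    u = d & 0xFFFFFFFF
    lo, hi = sat_u8(s16(u)), sat_u8(s16(u >> 16))
    return sat_u8(s16(lo | (hi << 8)))

def row(x: float) -> tuple:
    return (packuswb_twice(cvttps2dq(x)),   # CVTT/PACKUSWB/USWB
            packuswb_twice(cvtps2dq(x)),    # CVT/PACKUSWB/USWB
            sat_u8(sat_s16(cvtps2dq(x))))   # CVT/PACKSSDW/USWB

inputs = [0.5, 1.75, 34.0, 255.55, 256.0, 256.14, 65536.0, 16777216.0,
          -1.0, -257.0, -0.01, -65536.0, -16777216.0]
for x in inputs:
    print(f"{x:>12} {row(x)}")
```

The only row where the sequences disagree on saturation is 16777216.0, where the high word 0x0100 breaks the double packuswb; the 1.75 row differs only because of truncation versus rounding.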
I put together some information which is useful if you wanna do some sizecoding under DOS
https://www.abaddon.hu/usbdos/
