Tiny Intro Toolbox Thread

category: code [glöplog]
Your jokes are bad and you should feel bad. ;)
added on the 2012-06-06 02:18:11 by rrrola rrrola
Oh fuck! With the pseudo-code syntax I didn't realize how short rrrola's 0xCCCD trick was in ASM. REALLY NEAT!
added on the 2012-06-08 13:58:37 by p01 p01
Is this any good? DropBox Turbo for Android platform

I really didn't bother to test it myself ;)
added on the 2012-06-08 19:11:22 by maytz maytz
kkrunchy source code
kkrunchy_k7 source code

both include older but reasonably size-optimized versions of disfilter; the newer version described in my blog post generally works better, but it was done after i was actively working on kkrunchy.
added on the 2012-06-09 06:30:58 by ryg ryg
oops, wrong thread :) sorry about that.
added on the 2012-06-09 06:31:28 by ryg ryg
...and of course I meant DosBox Turbo, not DropBox. ...silly me, *GG*, *SS*, giggle, tee-hee, and Anders, you just shut up!
added on the 2012-06-09 09:25:34 by maytz maytz
Small sine approximations (19 bytes Taylor for 2*Pi and 9 bytes parabola for Pi/2)
because smallest homogenized sine from FPU must be around 30 bytes

Code:
; 19 bytes sine table approx from 0 to 2*Pi : 255 values amplitude=53 [-26;0;+26]
_sin:
mov bx,ax ; al=bl=x
imul bl ; ax=x*x
mov al,ah ; ax=x*x*256+x%256
imul bl ; ax=x*(x*x*256+x%256)
mov al,ah ; ax=(x*(x*x*256+x%256))*256+(x*(x*x*256+x%256))%256
shr bx,2 ; bx=x/4
add al,13 ; ax=13+(x*(x*x*256+x%256))*256+(x*(x*x*256+x%256))%256
sub ax,bx ; ax=-x/4+100+(x*(x*x*256+x%256))*256+(x*(x*x*256+x%256))%256
db 0d4h,64 ; ax=(-x/4+100+(x*(x*x*256+x%256))*256+(x*(x*x*256+x%256))%256)%64
ret

Code:
; 9 bytes parabola
; sin(x)=(x*4/pi)-x*x*(4/pi*pi) if x>0, from http://www.coranac.com/2009/07/sines/
; in deg : sin(x)=(10*x-x*x)/6000*scale
; sin(x)=(x*4/pi)+x*x*(4/pi*pi) if x=<0
_sin:
mov bx,ax ; al=bl=x
mul al ; ax=x*x
xchg ax,bx ; ax=x bx=x*x
aad ; ax=10*x (2 bytes smaller than mov dl,10 + mul dl)
sub ax,bx ; ax=10*x-x*x if x>0 else branch/replace by add ax,bx or use absolute value trick
mov al,ah
ret ; ah=6000*scale, using scale=23,4375 ah=256*al
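If you want to see how rough the parabola is before spending bytes on it, here is a quick floating-point sketch in Python. It is not the 9-byte routine itself, only the formula from its comment, sin(x) ~ (4/pi)*x - (4/pi^2)*x^2 on 0..pi, and the name `parabola_sin` is made up for this example.

```python
import math

def parabola_sin(x):
    """Parabolic approximation of sin(x), meant for 0 <= x <= pi."""
    return (4.0 / math.pi) * x - (4.0 / math.pi ** 2) * x * x

# Scan 0..pi in 1000 steps and record the largest absolute error.
max_err = max(
    abs(parabola_sin(i * math.pi / 1000) - math.sin(i * math.pi / 1000))
    for i in range(1001)
)
print(f"max abs error on [0, pi]: {max_err:.4f}")
```

The curve is exact at 0, pi/2 and pi, and the worst error stays below 0.06, which is usually invisible at the amplitudes tiny intros work with.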

Download pics and sourcecode
Corrected link to output pics : view.

There is also an old 40-byte fixed-point sine generator gem by KarL/NoooN, but I believe it is bigger than the FPU version.

Code:
in al,60h
das
jp loop

is 1 byte shorter than :

Code:
in al,60h
dec al
jnz loop

although it will exit on all keypresses and not just the <esc> key.
Abductee uses this.
absolute JP vs conditional branch? hmmm.
added on the 2014-10-15 10:43:36 by g0blinish g0blinish
JP = "Jump if parity"
added on the 2014-10-15 16:34:32 by bartman bartman

Code:
in al,60h
das
jp loop

exits immediately when started from the console on XP (same for abductee).

It is sad when microcode for a mnemonic behaves differently from one x86 cpu to another. This code works as expected here either on ntvdm or DosBox-0.74 on Intel(R) Core(TM)2 Duo :

Code:
org 100h
esc:in al,60h
dec al
jnz esc
ret


"Description: This instruction incorrectly documented in Intel's materials. See description field. [src : http://asm.inightmare.org/opcodelst/index.php?op=AAA]"

I would assume this applies to aaa, aas, das and daa microcode interpretation as well.

A wild guess would be that the microcode was indeed wrongfully referenced by Intel at the beginning and emulators started implementing that theoretical wrongful interpretation without testing it, quickly followed by a correction of the chip makers themselves. If that was the case it could explain the behaviour difference between old and new x86 cpus.

But of course this is only pure speculation (unlike Intel's wrong initial documentation which is proven).
In code above, I pasted wrong snippet.

This is the code tested that works here physically and is one byte less than dec al/jnz (other 1 byte BCD opcode combinations exist) - although it does not work in "emulated" DosBox-0.74 :


; works physically for Intel(R) Core(TM)2 Duo
org 100h
esc:in al,60h
jz esc
Guys, you shouldn't assume anything about sign, zero, parity or overflow after aaa/aas/daa/das (so no jp/jz/js). They only predictably affect carry and auxcarry (and their operation depends on auxcarry, which "in al,60h" doesn't affect).

DAA: if (A0 > 9) AF = 1; if (AF) AL += 0x06; if (A1 > 9) CF = 1; if (CF) AL += 0x60;
DAS: if (A0 > 9) AF = 1; if (AF) AL -= 0x06; if (A1 > 9) CF = 1; if (CF) AL -= 0x60;
AAA: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 += 0x06, AH++; }
AAS: if (A0 > 9) AF = 1; CF = AF; if (CF) { A0 -= 0x06, AH--; }
where A0 is AL&0x0F and A1 is AL&0xF0.
added on the 2014-10-16 13:13:52 by rrrola rrrola
In Intel Instruction Set Reference for DAA, DAS all flags except OF are actually defined.
added on the 2014-10-16 15:43:28 by frag frag
In that case, please carry on.
added on the 2014-10-16 18:14:59 by rrrola rrrola


Frag And Řrřola : it is a relief to learn first-hand I am not totally crazy ;)
Started as a mail to Optimus, but i thought sharing this could help others


Just look at the code of "Hypnoteye"
The tricks are :

1.) Horizontal segment shift (rrrola)
Adding 1 to the segment adds 16 to the horizontal pixel offset. One 320 pixel line is 20 segment units,
so a segment offset of n * 20 + 10 (Hypnoteye uses 0xa000 - 70) centers the image horizontally.
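A quick way to check the numbers is to compute which DI value addresses a given screen pixel once ES has been moved. The Python sketch below takes the `0xa000 - 70` value from the Hypnoteye source; the helper name `di_for_pixel` is made up for this example.

```python
VRAM_BASE = 0xA000 * 16        # linear address of the first mode 13h pixel
WIDTH = 320

def di_for_pixel(es, x, y):
    """DI value that addresses screen pixel (x, y) through segment es."""
    return VRAM_BASE + y * WIDTH + x - es * 16

es = 0xA000 - 70                      # value pushed in the Hypnoteye source
shift = di_for_pixel(es, 0, 0)        # DI of the top-left pixel
print(shift, divmod(shift, WIDTH))    # 1120 (3, 160)
center = di_for_pixel(es, 160, 100)   # DI of the screen center
print(center, divmod(center, WIDTH))  # 33280 (104, 0)
```

So with this segment the screen center sits at column 0 of row 104 in DI terms, and 104 is exactly the 0x68 that the vertical centering trick subtracts.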

2.) The 0xCCCD trick (rrrola)
Multiplying DI (the current pixel position) by 0xCCCD puts the X and Y
coordinates into DX and DL:AH respectively.

For example (5,5) = (5*320+5) = (1605) = (0x645)
0x645 * 0xCCCD = 0x05040141

X 05.04
Y 04.01

Note : these values are now between -32768 and 32768 ;)
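The multiplication is easy to replay outside of DOS. The Python sketch below emulates `mov ax,0xcccd` / `mul di` and splits DX:AX into bytes; `cccd` is a made-up helper name. As far as I can tell it works because 0xCCCD/65536 is almost exactly 256/320, so DH ends up holding DI / 320 (the row of the offset) and DL the column scaled by 256/320.

```python
def cccd(di):
    """Emulate `mov ax,0xcccd` / `mul di`; return (dh, dl, ah, al)."""
    product = (di * 0xCCCD) & 0xFFFFFFFF      # 32-bit result in DX:AX
    dx, ax = product >> 16, product & 0xFFFF
    return dx >> 8, dx & 0xFF, ax >> 8, ax & 0xFF

x, y = 5, 5
dh, dl, ah, al = cccd(y * 320 + x)
print(f"dh={dh:02X} dl={dl:02X} ah={ah:02X} al={al:02X}")   # dh=05 dl=04 ah=01 al=41
```

For every offset on a 320x200 screen, DH equals the row and DL equals column*4/5 rounded down, so no division is needed in the intro itself.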

3.) Centering vertically
By a funny coincidence, if you start your code with "Push <word>" there is the opcode
0x68 at the start, which is what [SI] points to if you don't touch that register. That's about half of
the screen height, so you can subtract that from the Y coordinate, which saves a byte
over "sub dh,100" ;) For finding such coincidences, this map is very helpful

4.) Stack Addressing (rrrola and frag)
If you keep the register BX at zero, or make it zero by the time you use this trick,
you can efficiently get your registers onto the FPU. That's possible because the stack
pointer SP is almost always the same on startup http://www.fysnet.net/yourhelp.htm
Just do a "pusha/popa" instead of several pushes/pops and address via [BX-??]
See the stack order here : http://x86.renejeschke.de/html/file_module_x86_id_270.html
You also get easy access to "<byte> << 8" and "<byte> >> 8" terms, without the need
to actually perform something like "mov al,ah" or "shr ax,8" which saves (1-2) bytes.
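To see which registers the [BX-??] offsets hit, the stack layout can be modeled in a few lines of Python. This sketch assumes SP = 0xFFFE at the moment of the `pusha` (the usual .COM startup value once pushes and pops are balanced) and BX = 0; the register contents are made-up test values.

```python
import struct

mem = bytearray(0x10000)                 # the 64 KB segment of a .COM file
regs = dict(ax=0x11AA, cx=0x22CC, dx=0x33DD, bx=0x0000,
            sp=0xFFFE, bp=0x0000, si=0x0100, di=0x0000)   # made-up values

def pusha(mem, regs):
    """Push AX, CX, DX, BX, original SP, BP, SI, DI in PUSHA order."""
    sp = regs["sp"]
    for name in ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di"):
        sp = (sp - 2) & 0xFFFF
        struct.pack_into("<H", mem, sp, regs[name])
    return sp

def word_at(mem, bx, disp):
    """Read the 16-bit word at [bx+disp] with 16-bit address wraparound."""
    addr = (bx + disp) & 0xFFFF
    return mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)

pusha(mem, regs)
print(hex(word_at(mem, 0, -8)))   # 0x33dd: DX
print(hex(word_at(mem, 0, -9)))   # 0xdd00: DL in the high byte, BH in the low byte
print(hex(word_at(mem, 0, -5)))   # 0xaa22: AL in the high byte, CH in the low byte
```

The odd offsets are where the free "shift by 8" comes from: [bx-9] reads DL already moved into the high byte, and a word stored to [bx-5] lands in AL and CH after `popa`.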

5.) Constant Overlap
Instead of using 4 bytes of precision for floats, use 2! Append your constant right after
your code and leave the lower 2 bytes out. Calculate the hexadecimal representation
of your float HERE : http://www.h-schmidt.net/FloatConverter/IEEE754.html
That's almost always precise enough.
Advanced : Find places in your code where the first two bytes of the constant already
exist, and refer to those places. Saves 4 bytes if done right. I pulled this off once in
Hypnoteye ;) (label mm) Again, to know where your desired numbers - or
good estimations of them - are, use this map
Protip : Sometimes switching the sign of the constant helps ;)
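How much precision is lost can be checked quickly in Python. The sketch below uses the `num: db 0xa2,0x42` constant from the Hypnoteye source, which is read as a dword starting two bytes earlier, so the low word is whatever two bytes happen to precede it. The helper name is made up, and the comparison target 256/pi is my reading of the "arc/pi*256" comment.

```python
import math
import struct

def float_from_words(hi_word, lo_word=0x0000):
    """Interpret (hi_word << 16) | lo_word as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", (hi_word << 16) | lo_word))[0]

target = 256 / math.pi                        # the "arc/pi*256" factor
lowest = float_from_words(0x42A2, 0x0000)     # low word all zeros
highest = float_from_words(0x42A2, 0xFFFF)    # low word all ones
print(target, lowest, highest)
```

Whatever ends up in the low word, the value stays between 81.0 and just under 81.5, well within one percent of 256/pi.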

6.) Black Color
You might know this one. If you want a black pixel, do a SALC. No matter what the flag
states are, AL is 0x00 or 0xFF afterwards, which both map to black when you use the
default palette. One byte is one byte! ^^

7.) Dithering to make smooth animations
This one is funky. Since "stosb" doesn't affect any flags on overflow
(see here : http://x86.renejeschke.de/html/file_module_x86_id_306.html)
you need to add something yourself anyway. I found that two "inc di" work well
enough. It basically draws every third pixel and then repeats this 2 times at
a shifted offset. After three frames, the zero flag finally triggers and increments
the REAL framecounter by one. 2 bytes for triple diagonal interlacing, so to speak ^^
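The three-pass behaviour can be replayed with a tiny Python simulation of `stosb / inc di / inc di / jnz X`. It assumes DI starts at 0 and only tracks the 16-bit offset; the function name is made up.

```python
def passes_until_wrap():
    """Simulate stosb / inc di / inc di / jnz X until the last inc leaves DI == 0."""
    di = 0
    writes = [0] * 0x10000
    passes = 1
    while True:
        writes[di] += 1              # stosb writes one byte ...
        di = (di + 1) & 0xFFFF       # ... and advances DI without touching flags
        di = (di + 1) & 0xFFFF       # first inc di
        di = (di + 1) & 0xFFFF       # second inc di; only this one feeds jnz
        if di == 0:
            return passes, writes
        if di < 3:                   # DI wrapped past 0xFFFF: a new pass begins
            passes += 1

passes, writes = passes_until_wrap()
print(passes, min(writes), max(writes))   # 3 1 1
```

Because 65536 is not a multiple of 3, each pass starts one byte off from the previous one, and after three passes every offset has been written exactly once before the loop falls through to `inc bp`.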

8.) Alternative palette generation
Something everybody should look at : the interrupt 10h alternative to setting colors
via OUT (0x3C8,0x3c9). Since it uses the interrupt 10h, you can save two bytes
here and weave the color generation into setting the video mode.
See : http://webpages.charter.net/danrollins/techhelp/0144.HTM

That's it for now. You should be able to integrate your raytracing loop into the
space between "pusha" and "popa". Loading the coordinates is already there
(remember, between -32768 and +32768). Storing the result back onto the stack, and
then fetching it, is also there (to AL and CH, see, I already used an implicit shift
by 8 there). I will copy this into the tiny tool box thread as well, for everyone ;)

Greets, HellMood

Code:
org 100h
push 0xa000 - 70 ; rrrolas trick I
pop es ; (advanced)
mm:
or al,0x13 ; 320x200 in 256 colors
cwd
L: ; custom palette
add cl,1
int 0x10
mov ax,0x1010
add ch,4
add dh,8
inc bx
jnz L
X:
mov ax,0xcccd ; rrrolas trick II
mul di
sub dh,byte [si]
pusha
fninit
fild word [byte bx-8] ;<- dh+dl ; x
fst st1 ; x x
fmul st0 ; x*x x
fild word [byte bx-9] ;<- dl+bh ; y x*x x
fst st3 ; y x*x x y
fmul st0 ; y*y x*x x y
faddp st1 ; y*y+x*x x y
fsqrt ; r x y
fidivr dword [byte si+mm-3-0x100] ; invr x y
fistp word [byte bx-5] ;-> al+ch ; x y
fpatan ; arc
fmul dword [byte si+num-2-0x100] ; arc/pi*256
fistp word [byte bx-8] ;-> arc -> dx ;
popa
add dx,bp ; time influence
shr dx,2
test al,0x80 ; inner bound
jnz colskip
cmp al,0x1e ; outer bound
jg F2
salc
jmp short colskip
F2:
add ax,bp ; time influence
xor ax,dx ; arc influence
colskip:
stosb ; write
inc di ; dither
inc di
jnz X
inc bp ; next frame ...
in al,0x60 ; check for ESC
dec al
jnz X
ret ; exit
num: db 0xa2,0x42
added on the 2015-09-14 20:16:51 by HellMood HellMood
Phoenix/Hornet reminded me of this thread :)

Some creative shorter loops for specific cases on x86 (might come in handy)

;5 bytes :
mov cx,2
twice:... /w non-CF
loop twice

;4 bytes (assumes ch=0) :
mov cl,2
twice:... /w non-CF
loop twice

;3 bytes :
twice:... /w non-CF
jc twice

For a 3-times loop one can also inc the accumulator and use PF with jnp, for example:
00000001b PF=0 first run
00000010b PF=0 second run
00000011b PF=1 third run
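The parity counter is easy to sanity-check in Python. This sketch assumes the accumulator starts at 0 and that nothing in the loop body touches AL or PF after the `inc`; the function names are made up.

```python
def parity_flag(value):
    """x86 PF: 1 when the low byte has an even number of set bits."""
    return 1 if bin(value & 0xFF).count("1") % 2 == 0 else 0

def count_iterations(start=0):
    """Run `body / inc al / jnp body` and count how often the body executes."""
    al, runs = start, 0
    while True:
        runs += 1                    # loop body
        al = (al + 1) & 0xFF         # inc al sets PF from the result
        if parity_flag(al):          # jnp falls through once PF = 1
            return runs

print(count_iterations(0))   # 3
```

With another start value the count changes, so the trick only pays off when AL is known at that point.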
Code:
mov al,13h
int 10h
push 0a000h
pop es
;pop ds ;ds (backbuffer) contains randomness
mov ds, [0]
fnstcw [w]
or dword [w], 0xc00 ;truncate
fldcw [w]
;32160
AGAIN:
xor di,di
xor si,si
mov byte [ds:32160], 15 ;plot in center
fld dword [a]
fsincos
fmul dword [p]
fstp dword [c]
fstp dword [s]
; xor si,si
; xor di,di
mov word [y],199
Y
mov ax,width
mul word [y]
mov word [yaddr],ax ;yaddr = y*width
mov word bx,[y]
sub bx,100
mov word [y_],bx
mov word [x],319
X
mov word bx,[x]
sub bx,160
mov word [x_],bx
fld dword [c]
fimul word [x_]
fld dword [s]
fimul word [y_]
fsubp st1,st0
fistp word [u]
fld dword [c]
fimul word [y_]
fld dword [s]
fimul word [x_]
fsubp st1,st0
fistp word [v]
mov dx,r
add dx,3
xor bx,dx
rol bx,5
xchg dx,bx
mov [r],word dx
mov cl,dh
and dx,1
and cl,1
add dl,cl
dec dx
add word [u],160
add word [u],dx
add dh,bl
and dh,1
add dh,cl
dec dh
add byte [v],100
add byte [v],dh
xor ax,ax
mov al,[v]
mul word [width]
mov [vaddr],ax ;vaddr=v*width
;buffer[x+yaddr]=backbuffer[u+v*width]
;lodsb ;al=ds:si
;stosb ;es:di=al
mov si,[x]
add si,[yaddr]
mov al,[ds:si]
mov di,[u]
add di,[vaddr]
mov [es:di],al
dec word [x]
jns X
dec word [y]
jns Y
;buffer = es:di
;backbuffer = ds:si
;for i=0 to size backbuffer[i]=buffer[i]
xor si,si
xor di,di
mov cx,64000
L
mov al,[es:di]
mov [ds:si], al
inc si
inc di
loop L
jmp AGAIN
ret
ret
w: dw 0
a: dw 0 ;angle
p: dw 0 ;amplitude
c: dd 0.0 ;cos
s: dd 0.0 ;sin
r: dw 0 ;rand
x: dw 0
y: dw 0
x_: dw 0
y_: dw 0
u: dw 0
v: db 0
width: dw 320
yaddr: dw 0
vaddr: dw 0

unfinished and borked fractal rotozoomer. if anyone fixes/finishes it, give me a peep. creds are due. i'm too lazy to finish it..
added on the 2016-01-16 22:37:57 by rudi rudi
Things that I would love to have when coding small intros (but that don't exist) :

- FPU instruction for loading a float from a 16-bit value to the stack (Half float and minifloat are cool!)
- CALL rel8 (similar to short jumps)
added on the 2016-12-11 01:29:30 by Tigrou Tigrou
to create your custom palette the way (almost?) any source tells:

Code:
palette:
mov dx,3c8h ;index port
out dx,al ;al = #of your color
inc dx ;data port is at 3c9h
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]

but if you want to change all 256 colors you can skip

Code:
mov dx,3c8h
out dx,al ;al = #of your color
inc dx

and use those bytes for your effect. just access the data port and set your 256 colors:

Code:
mov dx,3c9h
palette:
[calculate al]
out dx,al ;red 0..63
[calculate al]
out dx,al ;green 0..63
[calculate al]
out dx,al ;blue 0..63
[loop stuff]