Tiny Intro Toolbox Thread

category: code [glöplog]
Here is what I have used in the past (it's pretty long but allow precise control over each RGB value):

Code:
    salc            ; clear AL (assumes CF = 0)
    mov dx, 0x3c8
    out dx, al
    inc dx
P:                  ; assume CX = 255
    mov bl, 5
    call PAL
    mov bl, 2
    call PAL
    mov bl, 3
    call PAL
    loop P

Code:
PAL:
    mov al, cl
    not al
    mul bl
    shr ax, 2
    cmp ax, 63      ; compare all of AX: with BL = 5 the result can exceed 255
    jbe clamp
    ; clamp
    mov al, 63
clamp:
    out dx, al
    ret

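This is roughly the equivalent of (same 2.5 : 1 : 1.5 ratio between the channels, each value clamped to 63):
Code:
    for (i = 0; i < 255; i++) {      // CX = 255 gives 255 iterations
        output(min(63, i * 5 / 4));  // red
        output(min(63, i * 2 / 4));  // green
        output(min(63, i * 3 / 4));  // blue
    }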

Of course, if you want the RGB values to be simple multiples of AL and the result never exceeds 63, there are simpler ways (just shift AL right or left).
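
As a rough cross-check of the routine above, here is a small Python model of the arithmetic it appears to aim for: each channel is `(i * BL) >> 2`, clamped to the 6-bit DAC maximum of 63, with `i = NOT CL` as CL counts down from 255. The function name `ramp` and the tuple layout are only for illustration and are not part of the original code.

```python
def ramp(multipliers=(5, 2, 3)):
    """Model of the palette loop: one (r, g, b) triple per LOOP iteration.

    Assumes CX starts at 255 (as stated in the post), so CL runs 255..1
    and i = NOT CL runs 0..254.
    """
    palette = []
    for cl in range(255, 0, -1):
        i = ~cl & 0xFF                       # mov al, cl / not al
        rgb = tuple(min(63, (i * bl) >> 2)   # mul bl / shr ax, 2 / clamp to 63
                    for bl in multipliers)
        palette.append(rgb)
    return palette
```

In this model red saturates at i = 51, blue at i = 84 and green at i = 126, so the upper half of the palette is plain white.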

There should be articles about palette on http://www.sizecoding.org/. No good 256b without good palette.
added on the 2016-12-11 15:06:21 by Tigrou Tigrou
In fact, I'm procrastinating heavily with said article ;) Just as a teaser, "Hypnoteye" uses INT 10h with AX = 1010h (function 10h, subfunction 10h: set one DAC register), which allows for very short palette generation code (reuse of "int 10h")

Code:
    (mov al,0x13)
L:
    add cl,1
    int 0x10
    mov ax,0x1010
    add ch,4
    add dh,8
    inc bx
    jnz L

added on the 2016-12-11 17:27:06 by HellMood HellMood
@HellMood : what is the shortest way to check whether the value in ST0 (FPU stack) is greater than zero, and jump if it is?

I know about FTST, but it requires a bunch of other instructions to make it work (too many, actually). I have also tried FCOMI, but without success.
added on the 2016-12-11 17:56:08 by Tigrou Tigrou
FCOMI is known to not work with DosBox, i'd check here ->
added on the 2016-12-11 19:56:44 by HellMood HellMood
This jumps when ST0 > 2^-133 and is shorter than ftst / fstsw ax / sahf / jg. It works by treating the more significant half of a float32 as a signed int16:
Code:
    ; needs di=-2 and ax=0 (or some other regs)
    fst dword[bx+di]
    cmp word[bx],ax
    jg Positive
You can compare with any simple float this way: you get the full sign/exponent and 1+7 bits of mantissa.
Code:
    ; needs di=-3 and al=0
    fst dword[bx+di]
    cmp byte[bx],al
    jg Positive
When using just the most significant byte you can compare with any 2^(2n+1).
added on the 2016-12-14 10:16:45 by rrrola rrrola
Negative floats are flipped, so they need to be tested with ja/jb.
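
A minimal Python sketch of what these comparisons see, assuming the usual little-endian IEEE 754 float32 layout that `fst dword` writes. The helper names `hi16`, `hi8` and `jg_positive` are made up for illustration.

```python
import struct

def hi16(x):
    """Upper 16 bits of x stored as float32, read as a signed int16
    (what `cmp word[bx],ax` sees after `fst dword[bx+di]` with di = -2)."""
    raw = struct.pack('<f', x)               # little-endian float32
    return struct.unpack('<h', raw[2:4])[0]

def hi8(x):
    """Most significant byte read as a signed int8 (the di = -3 variant)."""
    raw = struct.pack('<f', x)
    return struct.unpack('<b', raw[3:4])[0]

def jg_positive(x):
    """True when `cmp word[bx],ax` with ax = 0 followed by `jg` would jump."""
    return hi16(x) > 0
```

For example, `hi16(1.0)` is 0x3F80 and `hi8(x) > 63` holds exactly from x = 2.0 upwards. For negative inputs the order is reversed: `hi16(-2.0) > hi16(-1.0)`, which is the "flipped" behaviour mentioned above.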
added on the 2016-12-14 10:23:57 by rrrola rrrola
@rrrola : thanks for the trick. I also tried the following: fistp into a 16-bit variable, then test it with cmp (it's pretty short). It works, but there is some imprecision that produces visual glitches.

Any idea how this palette code works ? (it's from quatro)

Code:
    push 0xA000         ; Start of VGA video memory
    pop es              ; into ES
    xor bp,bp           ; BP addressing, uses SS, frees DS, no extra segment needed
    mov al,0x13         ; mode 13h, 320x200 in 256 colors
    mov dh,0x80         ; high byte of offscreen memory, low byte not important
    mov ds,dx           ; no palette influence (later) when DH = 0x80
    inc cx              ; align color components / color number / color count
palette_loop:
    int 0x10            ; shared int 10h ! (palette entry , set mode)
    sub ch,2            ; adjust green value
    sub dh,4            ; adjust red value
    dec bx              ; next color
    mov ax,0x1010       ; sub function to change palette
    loop palette_loop   ; adjust blue value & loop

It produces a nice palette with 4 gradients.
I looked at the docs but couldn't find anything. AFAIK it's continuously calling int 10h with ah = 0x10 and al = 0x10.
added on the 2016-12-15 22:54:54 by Tigrou Tigrou
Tigrou, that's basically the routine from "Hypnoteye"
mentioned above, but a bit optimized ;)
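
For anyone who wants to trace the quatro loop by hand, here is a rough Python simulation of its register arithmetic. INT 10h with AX = 1010h sets one DAC register: BX = color number, DH = red, CH = green, CL = blue. Everything else is an assumption of this sketch: the usual .COM start values CX = 0x00FF, BX = 0 and AH = 0, a BIOS that only looks at BL, and a DAC that keeps only the low 6 bits of each component. The function name `quatro_palette` is made up.

```python
def quatro_palette():
    """Simulate the palette loop; returns {color: (r, g, b)} in 6-bit values."""
    ax, bx, cx, dh = 0x0013, 0x0000, 0x00FF, 0x80   # assumed start state
    cx = (cx + 1) & 0xFFFF                          # inc cx -> 0x0100
    palette = {}
    while True:
        # int 0x10: the first call (AX = 0013h) sets the video mode,
        # every later call (AX = 1010h) sets DAC register BL.
        if ax == 0x1010:
            palette[bx & 0xFF] = (dh & 0x3F, (cx >> 8) & 0x3F, cx & 0x3F)
        ch = ((cx >> 8) - 2) & 0xFF                 # sub ch,2
        cx = (ch << 8) | (cx & 0xFF)
        dh = (dh - 4) & 0xFF                        # sub dh,4
        bx = (bx - 1) & 0xFFFF                      # dec bx
        ax = 0x1010                                 # mov ax,0x1010
        cx = (cx - 1) & 0xFFFF                      # loop: dec cx, exit on 0
        if cx == 0:
            break
    return palette
```

Under these assumptions the loop makes 256 calls (one mode set plus colors 255 down to 1), and color c ends up as (4c, 2c, c) with each component taken modulo 64. Blue therefore ramps up four times across the palette, which would explain the 4 gradients, and the start value DH = 0x80 has no effect because its low 6 bits are zero.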


You can also load a whole palette at once. If you load
your screen as palette you can achieve very very short
interesting effects.

Popshades 15b
added on the 2016-12-15 23:09:39 by HellMood HellMood
Do'h! I remember I saw something about that palette trick somewhere but couldn't remember where exactly. I searched all the sizecoding.org tutorials and forgot to check this topic.
Thanks for the links btw.
added on the 2016-12-15 23:22:14 by Tigrou Tigrou
folks, how is the grayscale palette obtained in Megapole by Baudsurfer? I've looked at the code but can't find it
added on the 2017-10-19 11:24:36 by gorgh gorgh
He is using the 16 gray shades already existing in the standard VGA palette (offset +16)

Critical code before writing to the screen (stosb)

Code:
    mov al,16       ; normalize with dithering add overlap ah=color/18+16
    aad 1           ; dithering normalized and prepare for next frame
    cwd
    test di,di      ; test for all pixels plotted overrunning vga segment
    jp o            ; preserve zf flag and test if absolute beam position
    inc ax          ; parity even augmenting lighting for odd meta-pixels
o:  stosb           ; write screen pixel & advance absolute beam position
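
A small Python model of how this snippet picks the pixel value, as far as it can be read from the instructions alone. `aad 1` computes AL = AL + AH * 1 and clears AH, and `jp` looks at the parity of the low byte of the `test di,di` result. The function name `megapole_pixel` and the idea that AH holds a shade in the range 0..15 are assumptions of this sketch.

```python
def megapole_pixel(ah, di):
    """Return the byte that stosb would write for shade `ah` at offset `di`."""
    al = 16                                  # mov al,16: first gray entry
    al = (al + ah * 1) & 0xFF                # aad 1: AL = AL + AH*1, AH = 0
    ones = bin(di & 0xFF).count("1")         # test di,di: PF from the low byte
    if ones % 2 == 1:                        # jp o skips the inc on even parity
        al = (al + 1) & 0xFF                 # inc ax: one shade brighter
    return al
```

So the result is simply 16 plus the shade, bumped by one for offsets whose low byte has an odd number of set bits, which gives the dithering.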
added on the 2017-10-19 11:35:30 by HellMood HellMood
added on the 2017-10-19 11:40:30 by gorgh gorgh
hello again, is it safe to assume that a variable declared at the end of the code as
Code:yvar dw ?
will have zero value at the start?
added on the 2017-10-25 17:14:27 by gorgh gorgh
no, although it's almost safe to assume it works in dosbox. i'd suggest to only place vars outside the code, when you don't rely on any defined starting value. but you could reuse initial code as variables, so you would know the starting value ;)
added on the 2017-10-25 18:56:24 by HellMood HellMood
some other thingy... gave me a headache recently. As I tried to squeeze one byte out of
Code:
    mov bx,ax
    shl bx,2
by replacing it with
Code:
    shld bx,ax,18
I realized that this worked only on DOSBox and my old AMD Sempron, but not on an Intel Core i5 or i7... seems what's written in the x86 manual "...If the count is greater than the operand size, the result in the destination operand is undefined." is true for some CPUs... oh well, the funny obstacles in sizecoding :-p ...just wanted to share this in case you try the same...
added on the 2017-11-04 00:02:06 by Kuemmel Kuemmel
off the top of my head,

Code:imul bx,ax,byte 4

should do it in 3 bytes
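
A quick sanity check in Python that the two forms agree for every 16-bit input, modelling the registers as integers masked to 16 bits. The helper names `to_signed16`, `mov_shl` and `imul4` are made up for this sketch.

```python
def to_signed16(v):
    """Interpret a 16-bit value as a signed integer, as IMUL does."""
    return v - 0x10000 if v & 0x8000 else v

def mov_shl(ax):
    """mov bx,ax / shl bx,2 -> resulting BX."""
    return (ax << 2) & 0xFFFF

def imul4(ax):
    """imul bx,ax,byte 4 -> resulting BX (low 16 bits of the signed product)."""
    return (to_signed16(ax) * 4) & 0xFFFF
```

The signed multiply does not matter here, because only the low 16 bits of the product are kept.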
added on the 2017-11-04 00:12:51 by HellMood HellMood
...it does ! Thanks, didn't pop up in my head. I'll see if there's a speed penalty on this...
added on the 2017-11-04 00:46:46 by Kuemmel Kuemmel
Of course there is a speed penalty, we keep shifting bytes for a reason!
But in size-limited intros it's always either speed or size... size wins in 90% of all cases!

But there's something in this case which makes the IMUL the best for both: MOV/SHL takes 5 cycles together, SHLD takes 4 cycles and the IMUL just 3 cycles.

I wonder if my first sentences still make sense. I assumed x86 MUL would execute as slowly as on some 8/16-bit machines I coded for in the past, but it seems this isn't the case!
With (I)DIV at 17-41 cycles, I guess SHR (1 cycle) or SHRD (4 cycles) is still to be preferred, though! :D
Hardy, I thought so too, you would want to go for shifts usually. It seems to be also dependent on the CPU architecture.

For my routine there's not much of a difference, as the bottleneck is actually elsewhere. But when I look at Agner's instruction tables here, for example for the Intel Skylake architecture, it looks like MOV+SHL have a latency of 1+1, SHLD has 3 and IMUL 4. I think you can't rely on those tables in the end, though, and have to test it anyway, as it also depends on the instructions before and after.
added on the 2017-11-04 11:05:07 by Kuemmel Kuemmel
...due to learning that stuff for myself and the bytebeat achievements in the last few years in tiny intros I wrote a tutorial to do Advanced PC Speaker and COVOX sound via interrupt.

This section is derived from a talk with TomCat, who provided 99% of the code, and it should give you a nice start to get your bytebeat into a nice 256 byte intro.

Of course there's lots more to add and talk about bytebeat. Any comments/additions/corrections welcome as always.

Now you don't have any excuse for a "soundless" tiny intro ;-)
added on the 2018-11-15 18:59:49 by Kuemmel Kuemmel
that's right :) Now the wiki covers MIDI, pcspeaker (PWM and normal) and COVOX

nice work dudes :)
added on the 2018-11-16 11:00:18 by HellMood HellMood
...due to learning that stuff for myself and the bytebeat achievements in the last few years in tiny intros I wrote a tutorial to do Advanced PC Speaker and COVOX sound via interrupt.
Oei! That is some nice stuff. Thanks!
added on the 2018-11-16 13:35:31 by numtek numtek
btw: no fcomi in DOSBox.

Supported in DOSBox-X for more than 1 year. And I hope it will be supported in vanilla DOSBox one day. patch

This is the best thread on pouet (at least for me). I'd like to update it with some recent information. So I will respond to some old comments (sry).
Operating system
Windows XP.

Editor / IDE
EditPlus - The only disadvantage is that it's not free.

The Netwide Assembler - NASM, the most handy one.

NDISASM provided with NASM is enough.


Viewing executable in hexadecimal mode is useful, for example for checking if some code parts can be used as constants. HxD is good and free.

My way:

Operating system
Native DOS from USB, created by Rufus.
DPMI extension: HX DPMI v2.17
mouse driver: CuteMouse v2.1

Editor / IDE
FASMIDE - Comes with FASM assembler for DOS.

DEBUG clone v1.32b - redirecting text output to a file :-)

CodeView v2.2 by MS

HIEW v6.50 DOS - examine code and search for long instructions
Code:
    ; st0 st1 on fpu stack - leaves the maximum in st0
_max:
    fcomi st0, st1
    fstsw ax
    jbe _max0
_max1:
    fxch st1
_max0:
    fstp st0
    ret

Any suggestions for a smaller version?

Code:
    FCMOVB ST(0),ST(1)
    FSTP ST(1)
fast and 4 bytes only and yes, it's PPro