Fractals all the way down by CRTC

           ______              _        _             _ _
          |  ____|            | |      | |           | | |
          | |__ _ __ __ _  ___| |_ __ _| |___    __ _| | |
          |  __| '__/ _` |/ __| __/ _` | / __|  / _` | | |
          | |  | | | (_| | (__| || (_| | \__ \ | (_| | | |
          |_|  |_|  \__,_|\___|\__\__,_|_|___/  \__,_|_|_|
  _   _                                      _
 | | | |                                    | |
 | |_| |__   ___  __      ____ _ _   _    __| | _____      ___ __
 | __| '_ \ / _ \ \ \ /\ / / _` | | | |  / _` |/ _ \ \ /\ / / '_ \
 | |_| | | |  __/  \ V  V / (_| | |_| | | (_| | (_) \ V  V /| | | |
  \__|_| |_|\___|   \_/\_/ \__,_|\__, |  \__,_|\___/ \_/\_/ |_| |_|
                                  __/ |
                                 |___/

-----------------------------------------------------------------------------

   An FPGA demo by doz from crtc, music by mr_lou, font by tunk.

       Presented at Sundown 2013, 2nd place in Wild Compo.

-----------------------------------------------------------------------------

Info
----

The demo ran on my custom FPGA board, which I designed in order to create
an Amstrad CPC emulator. The board isn't yet available to anyone else, so
there's little point in releasing just a binary version. Instead, I'm making
the source VHDL available, along with the script that creates the data file.

More information on the emulator board at http://cpc2013.com and there's a
development blog at http://cpcfpga.com

The source is (c) 2013 Ranulf Doswell

If you have any questions, please e-mail doz@ranulf.net

-----------------------------------------------------------------------------

Included Bitstream
------------------

If you have an XC3S400 and want to try the bit file, the pin assignments are:

P128	48 MHz clock

P77	Composite sync
P53	V sync
P55	H sync
P77	Red 1
P73	Red 0
P78	Green 1
P74	Green 0
P79	Blue 1
P76	Blue 0
P80	Audio left
P82	Audio right

P130	Data in clock
P129	Data in
P125	Data in allowed

P24,P23,P21,P20,P18,P17,P15,P14,P4,P2,P35,P1,P13,P5,P12,P8,P11,P7,P10
	SRAM address lines
P25,P26,P27,P28,P30,P31,P32,P33
	SRAM data lines
P36	SRAM OE
P6	SRAM WE

The address lines can be in any order, although P10 is the high bit and is
always 0 in this demo. The required RAM size is 256KB.

Data is fed in byte-wise using an SPI-compatible interface. P125 is high when
there is room in the ring buffer for more data.

The video is output in NTSC format and all of CSYNC, HSYNC and VSYNC are
generated. The video output is nominally 2 bit although internally it's 4-bit
and dithered down to 2-bit.

The audio is output as a 16 MHz 1-bit stream per channel. You just need a low
pass filter to get good audio from this.

-----------------------------------------------------------------------------

VHDL source
-----------

If you want to run this on a generic board, I've included the main file
fractmain.vhd, specified as:

entity fractmain is port(
	clk16				: in	std_logic;
	clk96				: in	std_logic;

	sram_address			: out	std_logic_vector(18 downto 0);
	sram_data			: inout std_logic_vector(7 downto 0);
	sram_we				: out	std_logic;
	sram_oe				: out	std_logic;

	red, green, blue		: out	std_logic_vector(3 downto 0);
	hsync, vsync			: out	std_logic;

	din_nreset			: in    std_logic;
	din				: in	std_logic_vector(7 downto 0);
	din_latch			: in	std_logic;
	din_can_accept			: out	std_logic;

	audio_left			: out	std_logic;
	audio_right			: out	std_logic);
end fractmain;

If you've read the previous section, the above should all be very
self-explanatory, although note that the data input is byte-wise rather than
SPI and the video output is undithered.

If you need composite sync, it's just hsync nor vsync.

-----------------------------------------------------------------------------

Format of the data
------------------

Internally, there's a command address bus and a 16-bit command data bus. Most
of the locations behind those addresses are actually only 8 bits wide, however.

The addresses are mapped as follows:

0000-03FF	Transformation registers (128 * 8 words)
			offset 0	IFS a
			offset 1	IFS b
			offset 2	IFS e
			offset 3	(unused)
			offset 4	IFS c
			offset 5	IFS d
			offset 6	IFS f
			offset 7	(unused)

0400-07FF	Control registers
		0400	Probability RAM write address
		0401	Probability RAM write data
		0402	Probability RAM write length
		0403	Delay until vsync
		0404	Draw intensity (0-31)
		0408	Tone A pitch
		0409	Tone B pitch
		040A	Tone C pitch
		040B	Noise control
		040C	Tone A amplitude
		040D	Tone B amplitude
		040E	Tone C amplitude
		040F	PAL/NTSC select (1=PAL, 0=NTSC)

0800-0FFF	Probability RAM data

1000-17FF	Text RAM
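
As a rough illustration of the map above, the sketch below computes command
addresses in Python. The function and constant names are made up for this
example and are not taken from the included scripts; the only facts used are
the address ranges and the 8-word slot layout listed above.

```python
# Address ranges copied from the map above; all names here are hypothetical.
TRANSFORM_BASE = 0x0000
CONTROL_BASE = 0x0400
PROB_RAM_BASE = 0x0800
TEXT_RAM_BASE = 0x1000
TEXT_RAM_END = 0x17FF

# Word offsets of the six IFS coefficients inside one 8-word slot.
COEFF_OFFSET = {"a": 0, "b": 1, "e": 2, "c": 4, "d": 5, "f": 6}


def transform_reg(index, coeff):
    """Command address of one coefficient of transformation 0..127."""
    if not 0 <= index < 128:
        raise ValueError("transformation index must be 0..127")
    return TRANSFORM_BASE + index * 8 + COEFF_OFFSET[coeff]


def region(addr):
    """Name the region of the map that a command address falls in."""
    if not 0 <= addr <= TEXT_RAM_END:
        raise ValueError("address outside the documented map")
    if addr < CONTROL_BASE:
        return "transform"
    if addr < PROB_RAM_BASE:
        return "control"
    if addr < TEXT_RAM_BASE:
        return "probability"
    return "text"
```

For example, transform_reg(1, "c") is 0x000C and region(0x0404) is "control"
(the draw intensity register).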

-----------------------------------------------------------------------------

Theory of operation
-------------------

The master clock is 96MHz, although this is divided by 6 everywhere to form
a 16MHz clock which drives most operation because this is the maximum operating
frequency of the SRAM chip.

All SRAM access occurs in pairs - one cycle to read the current value, another
to write the modified value, which ties in neatly with an 8MHz pixel clock.
512 pixel clocks create an entire line, causing a horizontal sync at 15.625kHz
(8MHz / 512), which suits both PAL and NTSC. The horizontal syncs are counted
to the appropriate amount to generate the desired PAL or NTSC signal.

For the visible portion of the screen (defined as 46*8 by 26*8 pixels for NTSC
or 46*8 by 32*8 pixels for PAL) all memory bandwidth is used for the video
display and fade. Overall, this gives us a draw rate of:
	50 * (312*512 - 46*8*32*8) = 3.2768 Mpixels/s for PAL
and	60 * (262*512 - 46*8*26*8) = 3.4560 Mpixels/s for NTSC
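
The arithmetic above can be checked with a few lines of Python. This is only
a restatement of the two formulas; the function name is made up for the
example.

```python
PIXELS_PER_LINE = 512      # pixel clocks per scan line
VISIBLE_WIDTH = 46 * 8     # visible pixels per line


def draw_rate(fields_per_second, lines_per_field, text_rows):
    """Pixel slots per second left over for drawing (outside the visible area)."""
    total = lines_per_field * PIXELS_PER_LINE
    visible = VISIBLE_WIDTH * text_rows * 8
    return fields_per_second * (total - visible)


pal = draw_rate(50, 312, 32)    # 3276800
ntsc = draw_rate(60, 262, 26)   # 3456000
```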

In the blanking periods, the video signal is always black and the memory
bandwidth is used to draw pixels (read, increment intensity, write). Due to
this, we need to generate a new IFS position at 16MHz to obtain maximum
bandwidth.

There are 6 operations in an IFS transform, which maps nicely to our 96MHz
input clock. Those transformations are:
	x' = ax+by+e
	y' = cx+dy+f
They have been additionally transformed to include the projection into screen
co-ordinates and back.
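
For readers new to IFS rendering, here is a minimal floating-point sketch of
the iteration in Python. It is not the code from the included scripts: the
Sierpinski coefficients, the function names and the use of Python's random
module are all assumptions made for illustration, and the screen projection
mentioned above is left out.

```python
import random

# Each map is (a, b, c, d, e, f) for x' = a*x + b*y + e, y' = c*x + d*y + f.
# Example data only: a Sierpinski triangle, not a fractal from the demo.
SIERPINSKI = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]


def apply_map(m, x, y):
    """Apply one affine IFS map to the point (x, y)."""
    a, b, c, d, e, f = m
    return a * x + b * y + e, c * x + d * y + f


def chaos_game(maps, weights, iterations, seed=0):
    """Pick a map at random each step and return every point visited."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(iterations):
        m = rng.choices(maps, weights=weights)[0]
        x, y = apply_map(m, x, y)
        points.append((x, y))
    return points


points = chaos_game(SIERPINSKI, [1, 1, 1], 1000)
```

In the hardware, each visited point becomes one read-increment-write on the
SRAM rather than an entry in a list.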

The transformation RAM output is wired directly to one of the inputs of an
18x18->36 multiplier and the lower order address bits are cycled through
000, 001, 010, 100, 101, 110 such that the multiplier input cycles through
a, b, e, c, d, and f before repeating.  The other input is updated each cycle
to be x, y, or 1.0 (converted into fixed point). This whole process is
pipelined so it can execute at 96MHz.
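
The six-step schedule can be modelled in Python as below. This is a sketch
under assumptions: the 12 fractional bits are chosen arbitrarily (the actual
fixed-point format isn't stated here), the names are hypothetical, and the
pipelining is ignored. The slot layout follows the register map offsets
(0=a, 1=b, 2=e, 4=c, 5=d, 6=f).

```python
FRAC_BITS = 12            # assumed fixed-point format, not from the VHDL
ONE = 1 << FRAC_BITS      # 1.0 in fixed point

# (low address bits, other multiplier input) for each of the six steps.
SCHEDULE = [(0, "x"), (1, "y"), (2, "one"), (4, "x"), (5, "y"), (6, "one")]


def to_fixed(value):
    """Convert a float to the assumed fixed-point format."""
    return int(round(value * ONE))


def ifs_step(slot, x, y):
    """One IFS transform as six multiply-accumulates over an 8-word slot."""
    operand = {"x": x, "y": y, "one": ONE}
    acc = [0, 0]  # acc[0] builds x', acc[1] builds y'
    for step, (offset, name) in enumerate(SCHEDULE):
        acc[step // 3] += slot[offset] * operand[name]
    return acc[0] >> FRAC_BITS, acc[1] >> FRAC_BITS


# a=0.5, b=0, e=0.25, (unused), c=0, d=0.5, f=0.5, (unused)
slot = [to_fixed(v) for v in (0.5, 0.0, 0.25, 0.0, 0.0, 0.5, 0.5, 0.0)]
new_x, new_y = ifs_step(slot, ONE, ONE)
```

With x = y = 1.0 this gives x' = 0.75 and y' = 1.0 in fixed point.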

The higher order address bits of the transformation RAM come from the output
of the probability RAM. The lower 8 bits of the probability RAM address are
provided by two LFSRs (63 bits and 47 bits long, providing a combined cycle
length of roughly 2^110 bits). The LFSRs are shifted at the 96MHz clock so
that no bits are in common with the previous usage when latched at 16MHz.
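
A generic LFSR is easy to sketch in Python. The tap positions used in the
VHDL are not listed here, so the example below only demonstrates the idea on
a 4-bit register (taps 4 and 3, a standard maximal-length choice) and then
works out the combined period for two maximal-length registers of 63 and 47
bits. The function names are made up for the example.

```python
from math import gcd, log2


def lfsr_step(state, width, taps):
    """Advance a Fibonacci LFSR by one bit; taps are 1-based bit positions."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)


def period(width, taps, seed=1):
    """Count steps until the register returns to its seed (small widths only)."""
    state, count = lfsr_step(seed, width, taps), 1
    while state != seed:
        state = lfsr_step(state, width, taps)
        count += 1
    return count


small_period = period(4, (4, 3))   # 15, i.e. 2^4 - 1

# Two maximal-length LFSRs with coprime periods repeat together only after
# the product of their periods.
p63 = 2**63 - 1
p47 = 2**47 - 1
combined = p63 * p47 // gcd(p63, p47)
combined_bits = log2(combined)     # just under 110
```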

The higher order 3 bits of the probability RAM address are used to support having
multiple IFS images on the screen at once. By having 8 different sets of
probability data, you can have 8 distinct fractals rendered on top of each
other. Alternatively, you can use the same probability data repeated to
provide extra rendering time to a large fractal or provide increased
probability resolution.

The high order bits are cycled every line in the visible area or every 8 lines
in the non-visible area. It takes a few iterations for a fractal to converge
after a change, and in the non-visible area those iterations cost drawing
time. In the visible area the SRAM is in constant use for video anyway, so the
change comes without penalty and we can do it every line.

The probability RAM data is also used to provide the colour for the pixel,
the lower 3 bits providing a simple full intensity RGB set of 7 colours.
Obviously, this means that 1/8 of the transformations are pretty useless
as they will be rendered black.
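
One way to picture the probability RAM is as a 256-entry lookup table per
fractal set, indexed by a random byte. The Python sketch below fills such a
table from relative weights. It assumes each entry simply holds the
transformation index, and that the set number sits directly above the random
byte in the address; the function names and the 0x0800 base used for the
command address are illustrative, not taken from the included scripts.

```python
PROB_RAM_BASE = 0x0800   # from the address map


def build_probability_table(weights):
    """weights: {transformation index: relative weight} -> 256 table entries."""
    total = sum(weights.values())
    table = []
    cumulative = 0
    for index, weight in sorted(weights.items()):
        cumulative += weight
        target = round(256 * cumulative / total)
        table.extend([index] * (target - len(table)))
    return table


def colour_bits(index):
    """Lower 3 bits select the colour; 0 means the pixel is drawn black."""
    return index & 7


def prob_ram_address(fractal_set, random_byte):
    """Command address of one entry: 3 set bits above 8 random bits."""
    return PROB_RAM_BASE + (fractal_set << 8) + random_byte


table = build_probability_table({1: 1, 2: 1, 4: 2})
```

Here transformation 4 is picked half the time and transformations 1 and 2 a
quarter of the time each. Indices that are multiples of 8 are avoided because
their colour bits are all zero.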

-----------------------------------------------------------------------------

Text buffer
-----------

There is an 8x8 character text screen overlaid on top of the main screen, fixed
to a brightish white with a hint of the underneath showing through.

The text mode resolution is 46*32 for PAL and 46*26 for NTSC, addressed with
64 characters per line so that the address of a character is (y<<6)+x.

The output of the text RAM is fed to the high bits of the font RAM address;
the lower 3 bits come from the low order bits of the pixel line counter, so
that an 8*8 character is produced.

In this demo, tunk wanted to make a 16*8 font, so we split it into 2 halves -
characters 00-7F contain the top half and 80-FF contain the bottom half. The
python script handles this automatically and draws both halves of a string.
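
The addressing and the split font can be sketched in Python as follows. This
is not the code from the included script: the function names are made up,
adding the 0x1000 text RAM base is an assumption about how command addresses
are formed, and so is the idea that the bottom half of character c lives at
c + 0x80 and that characters are stored by ASCII code.

```python
TEXT_RAM_BASE = 0x1000   # from the address map


def text_address(x, y):
    """Command address of the character cell at column x, row y."""
    return TEXT_RAM_BASE + (y << 6) + x


def tall_string_writes(x, y, text):
    """List of (address, character code) writes for a double-height string."""
    writes = []
    for i, ch in enumerate(text):
        code = ord(ch) & 0x7F
        writes.append((text_address(x + i, y), code))             # top half
        writes.append((text_address(x + i, y + 1), code | 0x80))  # bottom half
    return writes


writes = tall_string_writes(2, 3, "A")
```

A double-height string therefore occupies two text rows, so only 16 such rows
fit on a PAL screen and 13 on NTSC.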

-----------------------------------------------------------------------------

Data streaming
--------------

The data is accepted via a byte interface and fed into a ring buffer. This ring
buffer is slightly unusual in that it relies on the dual-ported BRAM in a
Xilinx FPGA so that whilst data in is 8-bit, data out is 16-bit address and
data.

On my CPCFPGA board, I feed this via an SPI interface from an Atmega32u2.

-----------------------------------------------------------------------------

Data file generation
--------------------

The python files attached provide a reasonably object-based system for
defining and animating the fractals. However, the majority of the effects for
the demo were written in the last few hours on the day of the demo competition
and so the code in sundown.py got a bit hacky towards the end of the day!

Enjoy the source!

-----------------------------------------------------------------------------

Music
-----

I've not included the source data for the music, which was written by mr_lou,
originally for the Amstrad CPC. I used my proprietary AY-8192 core, written as
part of my Amstrad emulator to play this back.

Apologies to mr_lou - the version presented at the party was replayed at 60Hz
instead of the 50Hz it was authored at. Unfortunately, I realised my SCART to
HDMI adaptor would display a continual OSD when fed PAL and I didn't want this
for the demo presentation so I made the decision to reauthor the graphics for
the smaller vertical resolution of NTSC.

The tune seems a bit more upbeat at 60Hz, but it does lose some of the clarity
on the chirps and I personally prefer the 50Hz version. It also fits more with
the theme of the demo!

-----------------------------------------------------------------------------

Finally
-------

If you want to use any of this, please give me attribution. At the very least
I plan to release a "final" version of this demo so that the graphics and tune
line up better, as I couldn't really hear the music over the noise when in the
competition hall!

However, I fully expect to revisit some of this again - I started this demo
to explore animated fractals and didn't really have many of the transitions
I imagined (or even calculated coordinates for!)

If you want one of my CPC 2013 boards, I've got enough components to make a
few boards up. They're ideal for 8-bit emulators, which is what I designed
them for.